Senior Site Reliability Engineer

Yoh Services LLC

New York, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$150K

Job location

New York, United States of America

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Systems Engineering

Bash

Cloud Computing

Cloud Engineering

Computer Programming

Databases

DevOps

DNS

Monitoring of Systems

Python

MongoDB

Nagios

NoSQL

Reliability Engineering

Ansible

Prometheus

SQL Databases

Google Cloud Platform

Load Balancing

Delivery Pipeline

Snowflake

Grafana

GitLab

Kubernetes

Infrastructure Automation Frameworks

Deployment Automation

Vertica

Terraform

Oracle Cloud Infrastructure

Splunk

Data Pipelines

Automation Anywhere

Job description

This role is not focused only on Kubernetes, AWS, or infrastructure administration. The hiring manager is seeking a senior-level SRE/Infrastructure professional who understands the broader infrastructure ecosystem, production operations, observability, reliability engineering, and the business impact of the systems they support.

Key Responsibilities

Infrastructure & Reliability Engineering

Own and maintain production services and infrastructure.
Ensure platform availability, reliability, scalability, and performance.
Monitor and troubleshoot infrastructure across cloud and on-prem environments.
Take ownership of services end-to-end rather than only supporting individual technologies.
Participate in incident response and production issue management.

Observability & Monitoring

Design, build, and maintain monitoring solutions.
Create and manage:

SLIs (Service Level Indicators)
SLOs (Service Level Objectives)
SLAs (Service Level Agreements)

Build monitoring dashboards and reliability metrics in Grafana.
Measure system health, traffic, performance, error rates, and resource utilization.

Production Operations

Participate in an on-call rotation (every 3 weeks).
Respond to production incidents and service outages.
Coordinate with global teams during incidents.
Drive root-cause analysis and service improvements.

Cross-Functional Collaboration

Work independently across multiple teams.
Drive initiatives from inception to completion.
Coordinate with engineering, platform, infrastructure, and operations teams.
Operate in an agile/sprint-based environment with strong accountability.

AI & Automation

Demonstrate practical understanding of AI beyond basic prompting.
Understand:

AI-assisted automation
AI SDK deployment
MCP (Model Context Protocol)
AI workflows and operational use cases

Leverage AI to improve infrastructure automation and operational efficiency.

Requirements

4+ years in Site Reliability Engineering, DevOps, or related operational roles, with proven experience in Linux/Unix systems administration
Proficiency in scripting and programming languages such as Python, Bash, or Go for automation and tool development
Strong experience with cloud infrastructure and services across GCP, AWS, and OCI, as well as container orchestration tools like Kubernetes
Expertise in monitoring and observability tools such as Prometheus, Grafana, Splunk, and Nagios
Hands-on experience with Infrastructure-as-Code tools like Terraform, Ansible, or Helm
Proven ability to develop and track SLIs, SLOs, and SLAs to drive reliability improvements

Technical Knowledge

Deep understanding of networking, DNS, load balancing, and CDN technologies
Familiarity with databases (SQL, NoSQL, Vertica, MongoDB, Snowflake) and data pipeline technologies
Knowledge of CI/CD pipelines, GitLab, and deployment automation
Experience with workflow automation platforms is a strong plus

Benefits & conditions

Estimated Min Rate: $130,000.00
Estimated Max Rate: $150,000.00

What's In It for You?

We welcome you to join one of the largest and most legendary global staffing companies and meet your career aspirations. Yoh's network of client companies has been employing professionals like you for over 65 years in the U.S., UK and Canada. Join Yoh's extensive talent community to gain access to Yoh's vast network of opportunities, including this exclusive one. Benefit eligibility is in accordance with applicable laws and client requirements. Benefits include:

Medical, Prescription, Dental & Vision Benefits (for employees working 20+ hours per week)
Health Savings Account (HSA) (for employees working 20+ hours per week)
Life & Disability Insurance (for employees working 20+ hours per week)
MetLife Voluntary Benefits
Employee Assistance Program (EAP)
401(k) Retirement Savings Plan
Direct Deposit & weekly epayroll
Referral Bonus Programs
Certification and training opportunities
