Senior Site Reliability Engineer

Yoh Services LLC
New York, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$150K

Job location

New York, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Systems Engineering
Bash
Cloud Computing
Cloud Engineering
Computer Programming
Databases
DevOps
DNS
Monitoring of Systems
Python
MongoDB
Nagios
NoSQL
Reliability Engineering
Ansible
Prometheus
SQL Databases
Google Cloud Platform
Load Balancing
Delivery Pipeline
Snowflake
Grafana
Gitlab
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
Vertica
Terraform
Oracle Cloud Infrastructure
Splunk
Data Pipelines
Automation Anywhere

Job description

This role is not focused only on Kubernetes, AWS, or infrastructure administration. The hiring manager is seeking a senior-level SRE/infrastructure professional who understands the broader infrastructure ecosystem, production operations, observability, reliability engineering, and the business impact of the systems they support.

Key Responsibilities

Infrastructure & Reliability Engineering

  • Own and maintain production services and infrastructure.
  • Ensure platform availability, reliability, scalability, and performance.
  • Monitor and troubleshoot infrastructure across cloud and on-prem environments.
  • Take ownership of services end-to-end rather than only supporting individual technologies.
  • Participate in incident response and production issue management.

Observability & Monitoring

  • Design, build, and maintain monitoring solutions.
  • Create and manage:
      • SLIs (Service Level Indicators)
      • SLOs (Service Level Objectives)
      • SLAs (Service Level Agreements)
  • Build monitoring dashboards and reliability metrics in Grafana.
  • Measure system health, traffic, performance, error rates, and resource utilization.

Production Operations

  • Participate in an on-call rotation (every 3 weeks).
  • Respond to production incidents and service outages.
  • Coordinate with global teams during incidents.
  • Drive root-cause analysis and service improvements.

Cross-Functional Collaboration

  • Work independently across multiple teams.
  • Drive initiatives from inception to completion.
  • Coordinate with engineering, platform, infrastructure, and operations teams.
  • Operate in an agile/sprint-based environment with strong accountability.

AI & Automation

  • Demonstrate practical understanding of AI beyond basic prompting.
  • Understand:
      • AI-assisted automation
      • AI SDK deployment
      • MCP (Model Context Protocol)
      • AI workflows and operational use cases
  • Leverage AI to improve infrastructure automation and operational efficiency.

Requirements

  • 4+ years in Site Reliability Engineering, DevOps, or related operational roles, with proven experience in Linux/Unix systems administration

  • Proficiency in scripting and programming languages such as Python, Bash, or Go for automation and tool development

  • Strong experience with cloud infrastructure and services across GCP, AWS, and OCI, as well as container orchestration tools like Kubernetes

  • Expertise in monitoring and observability tools such as Prometheus, Grafana, Splunk, and Nagios

  • Hands-on experience with Infrastructure-as-Code tools like Terraform, Ansible, or Helm

  • Proven ability to develop and track SLIs, SLOs, and SLAs to drive reliability improvements

Technical Knowledge

  • Deep understanding of networking, DNS, load balancing, and CDN technologies
  • Familiarity with databases (SQL, NoSQL, Vertica, MongoDB, Snowflake) and data pipeline technologies
  • Knowledge of CI/CD pipelines, GitLab, and deployment automation
  • Experience with workflow automation platforms is a strong plus

Benefits & conditions

Estimated Min Rate: $130,000.00
Estimated Max Rate: $150,000.00

What's In It for You?

We welcome you to be part of one of the largest and most legendary global staffing companies as you pursue your career aspirations. Yoh's network of client companies has been employing professionals like you for over 65 years in the U.S., UK, and Canada. Join Yoh's extensive talent community, which gives you access to Yoh's vast network of opportunities, including this exclusive one. Benefit eligibility is in accordance with applicable laws and client requirements. Benefits include:

  • Medical, Prescription, Dental & Vision Benefits (for employees working 20+ hours per week)
  • Health Savings Account (HSA) (for employees working 20+ hours per week)
  • Life & Disability Insurance (for employees working 20+ hours per week)
  • MetLife Voluntary Benefits
  • Employee Assistance Program (EAP)
  • 401K Retirement Savings Plan
  • Direct Deposit & weekly epayroll
  • Referral Bonus Programs
  • Certification and training opportunities
