Senior Site Reliability Engineer II
Job description
We are hiring a hands-on Senior Site Reliability Engineer (SRE) to build, operate, and improve the reliability of our production systems. This is not a purely advisory role: you will be directly involved in designing infrastructure, writing Terraform, improving observability, and responding to real production incidents.
- Design, build, and operate highly available, scalable systems in AWS
- Write, maintain, and review Terraform to provision and manage infrastructure
- Own and improve monitoring, alerting, and observability using Grafana, Pingdom, and Uptrends
- Participate in a rotating on-call schedule, responding to production incidents and driving issues to resolution
- Lead incident response, root cause analysis, and post-incident reviews with a focus on prevention and automation
- Define and manage SLOs, SLIs, and error budgets
- Build and improve CI/CD pipelines and operational workflows using Azure DevOps and GitHub
- Work directly with application teams to improve reliability, performance, and deployability
- Automate manual operational tasks to reduce toil
- Maintain clear, actionable runbooks and documentation in Confluence
- Track work, incidents, and operational improvements using Jira and ServiceNow
- Mentor other engineers and help set SRE standards and best practices
Requirements
- 5+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering roles
- Strong production experience in AWS
- Required: Significant hands-on experience with Terraform in real-world environments
- Experience operating monitoring and uptime platforms such as Grafana, Pingdom, and Uptrends
- Strong Linux systems, networking, and troubleshooting skills
- Experience supporting production systems through incident response and on-call rotations
- Proficiency with GitHub and modern Git workflows
- Experience building or maintaining CI/CD pipelines with Azure DevOps
- Familiarity with ITSM and incident workflows using ServiceNow
- Strong written communication skills with experience documenting systems and processes in Confluence
- Ability to work independently in a remote or hybrid environment

- Experience defining and operating against SLOs and error budgets
- Infrastructure-as-Code best practices beyond Terraform (modules, testing, CI integration)
- Experience with containers and orchestration (Docker, Kubernetes)
- Experience supporting large-scale, high-availability production systems
- Prior experience mentoring engineers or serving as a technical lead
Benefits & conditions
- Competitive salary and comprehensive benefits
- Flexible work location with hybrid or fully remote options
- Real ownership of production systems and reliability outcomes
- A culture that values automation, learning, and continuous improvement
U.S. National Base Pay Range: $104,900 - $174,700. Geographic differentials may apply in some locations to better reflect local market rates.
Base pay ranges by location:
- CO: $104,900 - $174,700
- IL: $110,100 - $183,500
- Chicago, IL: $115,400 - $192,200
- MD: $110,100 - $183,500
- NY: $115,400 - $192,200
- New York City: $125,900 - $209,700
- Rochester, NY: $104,900 - $174,700
- OH: $99,700 - $166,000
- NJ: $123,816 - $197,784