Site Reliability Engineer
Role details
Job description
This role is responsible for ensuring the availability, performance, resiliency, and scalability of critical platforms supporting enterprise security, automation, orchestration, CI/CD pipelines, and cloud-based services. The Sr SRE will partner closely with Product Managers, Engineering Leads, platform teams, and other SREs to design, build, operate, and continuously improve highly reliable enterprise solutions.

Key Responsibilities
- Provide 24x7 production support (including on-call rotation) for multiple enterprise platforms and automation solutions.
- Ensure stability, availability, performance, and resiliency of enterprise security, automation, orchestration, and CI/CD platforms.
- Act as a senior escalation point for incident management, root cause analysis, and problem resolution.
- Lead and contribute to post-incident reviews, documenting root causes and driving corrective and preventive actions.
- Collaborate with Product, Engineering, and Architecture teams to influence design decisions that improve reliability, scalability, and operability.
- Automate repetitive operational tasks and implement self-healing and auto-remediation solutions using infrastructure-as-code and workflow automation.
- Support and operate enterprise tools including, but not limited to:
  o Security & Endpoint Platforms: Tanium, CrowdStrike
  o Automation & Configuration Management: Ansible, Ansible Tower, Terraform, BMC Bladelogic
  o Orchestration & Workflow: BMC TrueSight Orchestrator
  o CI/CD & Artifact Management: JFrog Artifactory and Xray
  o Endpoint Management: Microsoft SCCM
- Monitor system health and performance using enterprise monitoring, logging, and APM tools.
- Partner with service management teams to manage incidents, changes, and problem records using ServiceNow and BMC Remedy.
- Create and maintain operational documentation, runbooks, dashboards, and standard operating procedures.
- Contribute to capacity planning, performance tuning, and platform modernization efforts.
- Support cloud-based platforms and hybrid environments with a reliability-first mindset.
Requirements
Our client is seeking an experienced Senior Site Reliability Engineer (Sr SRE) to provide production support and reliability engineering for multiple enterprise-wide solutions within the Infrastructure Automation Solutions (IAS) organization.

- 5+ years of experience supporting enterprise-scale production systems with a focus on reliability, operations, and automation.
- Senior-level experience (5-10 years) supporting one or more of the following:
  o Enterprise security platforms
  o Automation and orchestration solutions
  o Workflow automation
  o CI/CD pipelines
  o Cloud platforms (public or private)
- Strong hands-on experience with:
  o Linux and Windows administration
  o Ansible / Ansible Tower
  o Terraform
- Proficiency in one or more programming or scripting languages:
  o Python
  o .NET
  o Java
- Experience supporting and troubleshooting enterprise logging and monitoring platforms, such as:
  o Tivoli ITM
  o SiteScope
  o Splunk
- Experience with Dynatrace Application Performance Monitoring (APM).
- Experience with ITSM tools, including ServiceNow and/or BMC Remedy.
- Strong troubleshooting, debugging, and root-cause analysis skills in complex enterprise environments.
- Proven ability to collaborate across teams and communicate effectively with both technical and non-technical stakeholders.

Preferred Qualifications
- Experience supporting large-scale financial services or regulated enterprise environments.
- Familiarity with cloud-native architectures and DevOps/SRE best practices.
- Experience developing dashboards and operational insights using Tableau.
- Exposure to SRE concepts including SLIs, SLOs, error budgets, and reliability metrics.
- Experience improving operational maturity through automation, standardization, and observability.
- Bachelor's degree in Computer Science, Engineering, or a related discipline (or equivalent experience).

Experience Level: Expert Level
Benefits & conditions
This is a contract position based out of Chandler, AZ.

Pay and Benefits
The pay range for this position is $70.24 - $70.24/hr. Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
- Medical, dental & vision
- Critical Illness, Accident, and Hospital
- 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
- Life Insurance (Voluntary Life & AD&D for the employee and dependents)
- Short and long-term disability
- Health Spending Account (HSA)
- Transportation benefits
- Employee Assistance Program
- Time Off/Leave (PTO, Vacation or Sick Leave)