Site Reliability Engineer

TEKSYSTEMS INC.

Chandler, United States of America

6 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$146K

Job location

Chandler, United States of America

Tech stack

Java

.NET

Application Performance Management

Cloud Computing

Cloud Engineering

Configuration Management

Computer Programming

Continuous Integration

Software Debugging

Linux

DevOps

HP SiteScope

Tivoli Management Framework

Python

System Center Configuration Manager

Performance Tuning

Reliability Engineering

Cloud Services

Ansible

Runbook

Tableau

Data Logging

Scripting (Bash/Python/Go/Ruby)

Cloud Platform System

BladeLogic

Software Troubleshooting

Infrastructure Automation Frameworks

Information Technology

Terraform

Splunk

Ansible Tower

Dynatrace

ServiceNow

Artifactory

Job description

This role is responsible for ensuring the availability, performance, resiliency, and scalability of critical platforms supporting enterprise security, automation, orchestration, CI/CD pipelines, and cloud-based services. The Sr SRE will partner closely with Product Managers, Engineering Leads, platform teams, and other SREs to design, build, operate, and continuously improve highly reliable enterprise solutions.

Key Responsibilities

- Provide 24x7 production support (including on-call rotation) for multiple enterprise platforms and automation solutions.
- Ensure stability, availability, performance, and resiliency of enterprise security, automation, orchestration, and CI/CD platforms.
- Act as a senior escalation point for incident management, root cause analysis, and problem resolution.
- Lead and contribute to post-incident reviews, documenting root causes and driving corrective and preventive actions.
- Collaborate with Product, Engineering, and Architecture teams to influence design decisions that improve reliability, scalability, and operability.
- Automate repetitive operational tasks and implement self-healing and auto-remediation solutions using infrastructure-as-code and workflow automation.
- Support and operate enterprise tools including, but not limited to:
  o Security & Endpoint Platforms: Tanium, CrowdStrike
  o Automation & Configuration Management: Ansible, Ansible Tower, Terraform, BMC BladeLogic
  o Orchestration & Workflow: BMC TrueSight Orchestrator
  o CI/CD & Artifact Management: JFrog Artifactory and Xray
  o Endpoint Management: Microsoft SCCM
- Monitor system health and performance using enterprise monitoring, logging, and APM tools.
- Partner with service management teams to manage incidents, changes, and problem records using ServiceNow and BMC Remedy.
- Create and maintain operational documentation, runbooks, dashboards, and standard operating procedures.
- Contribute to capacity planning, performance tuning, and platform modernization efforts.
- Support cloud-based platforms and hybrid environments with a reliability-first mindset.

Requirements

Our client is seeking an experienced Senior Site Reliability Engineer (Sr SRE) to provide production support and reliability engineering for multiple enterprise-wide solutions within the Infrastructure Automation Solutions (IAS) organization.

- 5+ years of experience supporting enterprise-scale production systems with a focus on reliability, operations, and automation.
- Senior-level experience (5-10 years) supporting one or more of the following:
  o Enterprise security platforms
  o Automation and orchestration solutions
  o Workflow automation
  o CI/CD pipelines
  o Cloud platforms (public or private)
- Strong hands-on experience with:
  o Linux and Windows administration
  o Ansible / Ansible Tower
  o Terraform
- Proficiency in one or more programming or scripting languages:
  o Python
  o .NET
  o Java
- Experience supporting and troubleshooting enterprise logging and monitoring platforms, such as:
  o Tivoli ITM
  o SiteScope
  o Splunk
- Experience with Dynatrace Application Performance Monitoring (APM).
- Experience with ITSM tools, including ServiceNow and/or BMC Remedy.
- Strong troubleshooting, debugging, and root-cause analysis skills in complex enterprise environments.
- Proven ability to collaborate across teams and communicate effectively with both technical and non-technical stakeholders.

Preferred Qualifications

- Experience supporting large-scale financial services or regulated enterprise environments.
- Familiarity with cloud-native architectures and DevOps/SRE best practices.
- Experience developing dashboards and operational insights using Tableau.
- Exposure to SRE concepts including SLIs, SLOs, error budgets, and reliability metrics.
- Experience improving operational maturity through automation, standardization, and observability.
- Bachelor's degree in Computer Science, Engineering, or a related discipline (or equivalent experience).

Experience Level: Expert Level

Benefits & conditions

This is a contract position based out of Chandler, AZ.

Pay and Benefits

The pay range for this position is $70.24 - $70.24/hr. Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

- Medical, dental & vision
- Critical Illness, Accident, and Hospital
- 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
- Life Insurance (Voluntary Life & AD&D for the employee and dependents)
- Short and long-term disability
- Health Spending Account (HSA)
- Transportation benefits
- Employee Assistance Program
- Time Off/Leave (PTO, Vacation or Sick Leave)

About the company

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

About TEKsystems and TEKsystems Global Services

We're a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We're a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We're strategic thinkers and hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We're building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.