Site Reliability Engineer

TEKSYSTEMS INC.
Chandler, United States of America
6 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$146K

Job location

Chandler, United States of America

Tech stack

Java
.NET
Application Performance Management
Cloud Computing
Cloud Engineering
Configuration Management
Computer Programming
Continuous Integration
Software Debugging
Linux
DevOps
HP SiteScope
Tivoli Management Framework
Python
System Center Configuration Manager
Performance Tuning
Reliability Engineering
Cloud Services
Ansible
Runbook
Tableau
Data Logging
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
BladeLogic
Software Troubleshooting
Infrastructure Automation Frameworks
Information Technology
Terraform
Splunk
Ansible Tower
Dynatrace
ServiceNow
Artifactory

Job description

This role is responsible for ensuring the availability, performance, resiliency, and scalability of critical platforms supporting enterprise security, automation, orchestration, CI/CD pipelines, and cloud-based services. The Sr SRE will partner closely with Product Managers, Engineering Leads, platform teams, and other SREs to design, build, operate, and continuously improve highly reliable enterprise solutions.

Key Responsibilities

Provide 24x7 production support (including on-call rotation) for multiple enterprise platforms and automation solutions.
Ensure stability, availability, performance, and resiliency of enterprise security, automation, orchestration, and CI/CD platforms.
Act as a senior escalation point for incident management, root cause analysis, and problem resolution.
Lead and contribute to post-incident reviews, documenting root causes and driving corrective and preventive actions.
Collaborate with Product, Engineering, and Architecture teams to influence design decisions that improve reliability, scalability, and operability.
Automate repetitive operational tasks and implement self-healing and auto-remediation solutions using infrastructure-as-code and workflow automation.
Support and operate enterprise tools including, but not limited to:
  o Security & Endpoint Platforms: Tanium, CrowdStrike
  o Automation & Configuration Management: Ansible, Ansible Tower, Terraform, BMC BladeLogic
  o Orchestration & Workflow: BMC TrueSight Orchestrator
  o CI/CD & Artifact Management: JFrog Artifactory and Xray
  o Endpoint Management: Microsoft SCCM
Monitor system health and performance using enterprise monitoring, logging, and APM tools.
Partner with service management teams to manage incidents, changes, and problem records using ServiceNow and BMC Remedy.
Create and maintain operational documentation, runbooks, dashboards, and standard operating procedures.
Contribute to capacity planning, performance tuning, and platform modernization efforts.
Support cloud-based platforms and hybrid environments with a reliability-first mindset.

Requirements

Our client is seeking an experienced Senior Site Reliability Engineer (Sr SRE) to provide production support and reliability engineering for multiple enterprise-wide solutions within the Infrastructure Automation Solutions (IAS) organization.

5+ years of experience supporting enterprise-scale production systems with a focus on reliability, operations, and automation.
Senior-level experience (5-10 years) supporting one or more of the following:
  o Enterprise security platforms
  o Automation and orchestration solutions
  o Workflow automation
  o CI/CD pipelines
  o Cloud platforms (public or private)
Strong hands-on experience with:
  o Linux and Windows administration
  o Ansible / Ansible Tower
  o Terraform
Proficiency in one or more programming or scripting languages:
  o Python
  o .NET
  o Java
Experience supporting and troubleshooting enterprise logging and monitoring platforms, such as:
  o Tivoli ITM
  o SiteScope
  o Splunk
Experience with Dynatrace Application Performance Monitoring (APM).
Experience with ITSM tools, including ServiceNow and/or BMC Remedy.
Strong troubleshooting, debugging, and root-cause analysis skills in complex enterprise environments.
Proven ability to collaborate across teams and communicate effectively with both technical and non-technical stakeholders.

Preferred Qualifications

Experience supporting large-scale financial services or regulated enterprise environments.
Familiarity with cloud-native architectures and DevOps/SRE best practices.
Experience developing dashboards and operational insights using Tableau.
Exposure to SRE concepts including SLIs, SLOs, error budgets, and reliability metrics.
Experience improving operational maturity through automation, standardization, and observability.
Bachelor's degree in Computer Science, Engineering, or a related discipline (or equivalent experience).

Experience Level

Expert Level

Benefits & conditions

This is a contract position based in Chandler, AZ.

Pay and Benefits

The pay range for this position is $70.24 - $70.24/hr. Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

Medical, dental & vision
Critical Illness, Accident, and Hospital
401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
Life Insurance (Voluntary Life & AD&D for the employee and dependents)
Short and long-term disability
Health Spending Account (HSA)
Transportation benefits
Employee Assistance Program
Time Off/Leave (PTO, Vacation or Sick Leave)

About the company

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

About TEKsystems and TEKsystems Global Services

We're a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We're a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We're strategic thinkers and hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We're building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.
