Site Reliability Engineer
EVER FORTH LLC
Charlotte, United States of America
2 days ago
Role details
Contract type
Temporary to permanent
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$154K
Job location
Charlotte, United States of America
Tech stack
Artificial Intelligence
Computing Platforms
Bash
Continuous Integration
Middleware
Python
Automation of Marketing
OpenShift
PowerShell
Reliability Engineering
Site Reliability Engineering Practices
Ansible
Datadog
MTTR
Containerization
Infrastructure Automation Frameworks
Job description
- Provide senior-level L2/L3 application and middleware production support, leading disciplined troubleshooting, recovery, and stabilization for complex, high-availability services.
- Embed SRE practices into daily operations, including defining reliability signals, improving alert quality, and driving blameless post-incident learning to reduce toil.
- Implement and improve observability across applications and middleware (logs, metrics, traces, dashboards) to improve detection, diagnosis, and Mean Time to Resolution (MTTR).
- Design, develop, and maintain infrastructure-as-code and configuration-as-code capabilities for VM-based and container-adjacent workloads, including OpenShift (OCP).
- Build and support automation for operational actions across middleware components (e.g., standardized status checks, start/stop/restart) to enable safer self-service.
- Integrate operational automation with CI/CD pipelines to enable repeatable and auditable deployments.
- Monitor for configuration drift, support automated compliance checks, and implement remediation patterns aligned with enterprise controls.
- Develop and maintain runbooks and operational documentation for automation and platform procedures.
- Participate in on-call rotations and provide operational support coverage as required, including outside of normal working hours.
Requirements
- 5+ years of combined Software, Systems, or Infrastructure Engineering experience supporting enterprise production environments.
- Senior-level L2/L3 application and middleware production support experience.
- A strong Site Reliability Engineering (SRE) mindset, with an emphasis on proactive reliability, MTTR reduction, and toil elimination.
- Demonstrated experience designing and operating observability solutions, including logs, metrics, traces, dashboards, and actionable alerting.
- Hands-on automation and scripting skills using tools such as Python, Bash, or PowerShell.
- Experience working in VM-based and container-adjacent environments, including OpenShift (OCP).
- Ability to understand application and platform architecture, identify bottlenecks, and engineer solutions.
- Experience with Infrastructure-as-Code (IaC) and configuration-as-code, preferably with Ansible or similar automation platforms.
- Experience with CI/CD-integrated operational automation.
- Familiarity with cloud or container platforms (any provider).
- Exposure to AI-assisted operations with appropriate guardrails.
- Knowledge of Elastic or legacy observability tooling.
- Strong cross-functional communication skills and experience operating in regulated environments.
Benefits & conditions
The anticipated pay range for this position is $69.00/hr to $74.00/hr.
About the company
Everforth Apex is a world-class IT services company that serves thousands of clients across the globe. When you join Everforth Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico. Everforth Apex uses a virtual recruiter as part of the application process.