Enterprise Site Reliability Engineer

Everforth Apex
Saint Davids, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Saint Davids, United States of America

Tech stack

Amazon Web Services (AWS)
Applications Architecture
Cloud Computing
Cloud Engineering
Computer Networks
Distributed Systems
Python
PowerShell
Reliability Engineering
Google Cloud Platform
Enterprise Software Applications
Software Troubleshooting
Multi-Cloud
Reliability of Systems

Job description

  • Lead and conduct enterprise resiliency reviews across applications, platforms, vendors, and infrastructure.
  • Assess enterprise applications and systems, identify gaps, document remediation actions, and drive teams to closure on resiliency improvements.
  • Drive automation initiatives to reduce manual review processes and build scalable, repeatable resiliency frameworks.
  • Support critical incidents by providing senior-level troubleshooting and guidance for complex issues.
  • Identify recurring reliability patterns and recommend proactive, long-term solutions, including process improvements and new tools.
  • Influence technology teams across the organization to adopt best practices and resiliency standards.
  • Contribute to the long-term strategy for enterprise reliability and resiliency, with a focus on moving towards a continuous recertification model.
  • Partner with engineering and platform teams to improve system reliability and operational practices.

Requirements

  • Extensive experience in Site Reliability Engineering, Infrastructure Engineering, or related disciplines with a strong background in system reliability, resiliency, and production support.
  • Experience conducting architectural, operational, or resiliency assessments in large, complex enterprise environments.
  • Deep troubleshooting and debugging skills across complex distributed systems.
  • Experience with automation and scripting (e.g., Python, PowerShell, or similar).
  • Strong understanding of cloud platforms (AWS required), infrastructure, networking concepts, application architectures, and system dependencies.
  • Proven ability to influence and drive change across cross-functional teams, navigate resistance, and manage stakeholder conversations.
  • Ability to operate at both a hands-on and strategic level.
  • Experience in multi-cloud environments, particularly with Google Cloud (Google Cloud Platform).
  • Strong background in infrastructure, networking, or hardware systems.
  • Experience working with both cloud-native and legacy/monolithic applications, including on-prem systems.
  • Familiarity with observability, telemetry, and monitoring platforms.
  • Experience identifying and implementing enterprise-scale reliability improvements.

About the company

Everforth Apex is a world-class IT services company that serves thousands of clients across the globe. When you join Everforth Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico.

Everforth Apex uses a virtual recruiter as part of the application process. By applying for this job, you agree to receive calls, AI-generated calls, text messages, or emails from Everforth Apex and its affiliates and contracted partners. Frequency varies for text messages. Message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You can reply STOP to cancel and HELP for help. You can access our privacy policy at
