Site Reliability Engineer
Job description
We are seeking a Site Reliability Engineer to join a large-scale enterprise infrastructure organization within the financial services industry. This team is responsible for ensuring the reliability, scalability, and resilience of thousands of production systems supporting critical business functions.
This role blends systems engineering, software development, and operations excellence, with a strong emphasis on automation, infrastructure as code, observability, and incident response. You will work in a highly collaborative SRE environment that builds reliability into platforms rather than reacting to failures, using infrastructure as code, observability, and chaos testing to drive automation and resiliency at scale, while guiding developers and improving production insights. The Production Services team supports a broad application portfolio with a follow-the-sun on-call rotation and delivers platform, application, batch, cloud, UI, middle-tier, database, mainframe, release, and performance engineering services.
Due to client requirements, applicants must be willing and able to work on a W2 basis. For our W2 consultants, we offer a great benefits package that includes medical, dental, and vision benefits, 401(k) with company matching, and life insurance.
Rate: $60.00 to $65.00/hr. W2
Responsibilities:
- Design, build, and operate highly available, resilient infrastructure at enterprise scale
- Drive automation-first solutions across operations, incident management, and environment management
- Support production systems through a roster-based on-call rotation (follow-the-sun model)
- Lead and participate in incident response, triage, and root cause analysis under pressure
- Implement and enhance observability solutions including monitoring, logging, metrics, and alerting
- Build and maintain infrastructure as code for cloud and platform services
- Partner closely with application and platform teams to provide production insights and developer guidance
- Continuously improve system reliability using resiliency engineering, chaos testing, and performance engineering practices
Requirements:
- 5+ years of experience supporting or building large-scale, multi-tiered distributed systems
- 1-2 years of cloud development or cloud migration experience
- 2-4 years of software development experience with a focus on automation and SDLC practices
- Prior on-call experience supporting production systems and running incidents
- Strong SRE, systems engineering, or software engineering background
- Hands-on experience with AWS cloud environments (EKS preferred); Azure (AKS) experience is a plus
- Solid Kubernetes experience in production environments
- Experience with infrastructure as code tools such as Terraform, Ansible, Chef, IAM, or ARM
- Proficiency in automation and scripting (Python, Shell scripting, Node.js, JavaScript, or Java)
- CI/CD pipeline experience using tools such as Jenkins and Groovy
- Proven experience supporting distributed, highly concurrent, service-based architectures
- Hands-on experience with observability platforms such as Datadog, Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, or Splunk
- Experience managing and analyzing large datasets, metrics, and logs to improve system reliability
- Demonstrated ability to maintain scalability and resiliency in complex environments
What Sets You Apart
- Strong systems-thinking mindset with a passion for reliability and automation
- Comfortable operating in high-pressure production environments
- Excellent communication skills with the ability to engage both technical and non-technical partners
- Proven ability to learn new tools and practices and introduce them effectively to engineering teams
- Collaborative approach to working with diverse teams across locations and time zones
Education Requirements:
Bachelor's degree in a technology-related field or equivalent experience. Master's degree is a plus.
Benefits & conditions
Skills, experience, and other compensable factors will be considered when determining pay rate. The pay range provided in this posting reflects a W2 hourly rate; other employment options may be available that may result in pay outside of the provided range.
W2 employees of Eliassen Group who are regularly scheduled to work 30 or more hours per week are eligible for the following benefits: medical (choice of 3 plans), dental, vision, pre-tax accounts, other voluntary benefits including life and disability insurance, 401(k) with match, and sick time if required by law in the worked-in state/locality.
If anyone reaches out to you about an open position connected with Eliassen Group, please ensure that you are working directly with us by confirming the following:
- When you work with Eliassen Group, all email communication will come from an Eliassen.com address, never Gmail, Yahoo, etc.