Site Reliability / DevOps Engineer

eClerx LLC
Raleigh, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior
Compensation
$138K

Job location

Raleigh, United States of America

Tech stack

Amazon Web Services (AWS)
Application Performance Management
Automation of Tests
Azure
Bash
Cloud Engineering
Continuous Integration
Linux
DevOps
Distributed Systems
Elasticsearch
GitHub
Monitoring of Systems
Python
Enterprise Messaging Systems
PowerShell
Reliability Engineering
Data Streaming
Datadog
Scripting (Bash/Python/Go/Ruby)
Enterprise Software Applications
Cloud Platform System
Delivery Pipeline
Grafana
MTTR
CloudFormation
GitLab CI
Kubernetes
Bicep
Kafka
Terraform
Docker
Jenkins

Job description

eClerx is seeking a motivated SRE/DevOps Engineer with strong observability experience to join our growing Platform Engineering team. This team is responsible for managing cloud infrastructure, advancing DevOps practices, improving platform reliability, and supporting highly available enterprise applications.

The ideal candidate will have a deep understanding of cloud-native architectures, distributed systems, CI/CD automation, observability frameworks, and site reliability engineering principles. This individual will play a key role in improving platform resilience, operational efficiency, and system performance across a modern cloud-based technology ecosystem.

Responsibilities

  • Design, implement, and enhance system observability and monitoring solutions.
  • Monitor system performance, create incident response plans, and implement observability practices to gain deeper insights into system behavior.
  • Define, implement, and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Improve platform reliability, scalability, and resiliency.
  • Conduct post-incident reviews and implement corrective actions to prevent recurring issues.
  • Partner with engineering teams to implement observability tooling and leverage telemetry data to troubleshoot and resolve incidents.
  • Utilize observability and event management capabilities to improve key operational metrics, including Mean Time to Detect (MTTD) and Mean Time to Restore (MTTR).
  • Continuously optimize infrastructure, architecture, automation, CI/CD processes, and operational workflows.
  • Collaborate closely with software engineers to ensure applications are designed and deployed following DevOps and reliability best practices.
  • Participate in a rotating on-call schedule, including support for production releases and critical incidents outside normal business hours when required.

Requirements

  • 5+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role.
  • 5+ years of work experience with public cloud platforms (Azure preferred, or AWS).
  • 3+ years of hands-on experience with observability platforms such as Datadog, Elasticsearch, Grafana, or similar solutions.
  • 5+ years of experience with scripting languages such as Python, Bash, or PowerShell.
  • 3+ years of experience with containerization and orchestration technologies, including Docker and Kubernetes.
  • 2+ years of experience developing and managing CI/CD pipelines using tools such as Azure DevOps, GitLab CI/CD, GitHub Actions, Jenkins, or similar.
  • 2+ years of experience with Infrastructure-as-Code (IaC) tools such as Terraform, Azure Bicep, AWS CloudFormation, or equivalent technologies.
  • 1+ years of experience using chaos engineering and resilience testing tools such as Gremlin, Chaos Mesh, or similar platforms.
  • Proven experience leveraging observability best practices, end-user monitoring, application performance monitoring, and infrastructure monitoring solutions.
  • Experience with event streaming and messaging platforms such as Kafka or Azure Event Hubs.
  • Strong understanding of Linux operating systems and administration.

Preferred Qualifications

  • Kubernetes certification.
  • Cloud platform certifications (Azure, AWS, or GCP).
  • Experience working in Azure environments and/or Azure DevOps.
  • Experience implementing and managing Datadog or other modern observability platforms.
  • Experience supporting enterprise-scale applications within financial services, capital markets, fintech, or other highly regulated industries.

In the US, the target base salary for this role is $120,000-$137,500. Compensation is based on a range of factors that include relevant experience, knowledge, skills, other job-related qualifications, and geography. We expect the majority of candidates who are offered roles at our company to fall anywhere within the range based on these factors.

Benefits & conditions

3.6 out of 5 stars | Raleigh, NC | $120,000 - $137,500 a year | Full-time

About the company

eClerx is a leading provider of productized services, bringing together people, technology, and domain expertise to amplify business results. The firm provides business process management, automation, and analytics services to a number of Fortune 2000 enterprises, including some of the world's leading financial services, communications, retail, fashion, media & entertainment, manufacturing, travel & leisure, and technology companies. Incorporated in 2000, eClerx is listed on both the Bombay Stock Exchange and the National Stock Exchange of India. The firm employs more than 19,000 people across Australia, Canada, France, Germany, Switzerland, Egypt, India, Italy, the Netherlands, Peru, the Philippines, Singapore, Thailand, the UK, and the USA.
