DevOps Manager
Job description
We are seeking an experienced and strategic Senior Manager - DevOps & SRE to lead and evolve our reliability and platform engineering capabilities across our global eCommerce ecosystem.
This role goes beyond traditional DevOps management. You will be responsible for defining and driving our reliability strategy, embedding SRE principles (SLIs, SLOs, error budgets), and ensuring our platforms operate at scale with high availability, performance, and resilience.
You will lead a distributed team of DevOps and SRE engineers, working closely with Engineering, Product, Security, and Architecture to enable reliable, scalable, and automated cloud-native systems across Azure, AWS, and GCP.
Leadership & Organizational Impact
Lead, mentor, and grow a high-performing DevOps & SRE function. Define clear ownership models, reliability standards, and ways of working. Elevate engineering maturity through automation, observability, and operational excellence. Drive accountability and promote a culture of reliability, learning, and continuous improvement. Partner with senior stakeholders to align platform reliability with business objectives.
Reliability Strategy & SRE Practices
Define and implement SRE best practices (SLIs, SLOs, error budgets). Own incident management strategy, postmortems, and systemic improvements. Improve resilience through proactive risk identification and mitigation. Establish measurable reliability KPIs aligned with customer experience.
Cloud Infrastructure & Platform Engineering
Oversee cloud operations primarily in Microsoft Azure, with exposure to AWS and GCP. Ensure infrastructure is scalable, secure, and cost-efficient. Drive Infrastructure as Code adoption (Terraform, Bicep/ARM). Define platform standards for Kubernetes and containerized environments.
Automation, CI/CD & Developer Enablement
Champion CI/CD best practices and release reliability. Improve deployment strategies (blue/green, canary releases). Reduce operational toil through automation and self-healing systems. Support high-traffic eCommerce events and critical production workloads.
Observability & Operational Excellence
Define and evolve our observability strategy (Azure Monitor, Grafana, Datadog, Prometheus, etc.). Improve signal-to-noise ratio in monitoring and alerting. Drive root cause analysis discipline and continuous improvement loops. Explore AI-assisted operations for incident detection, alert optimization, and operational efficiency.
Security & Compliance
Ensure secure cloud practices (IAM, least privilege, data protection). Partner with Security to enforce compliance and governance standards. Embed security and reliability into the full SDLC.
Key Experience & Skills
Requirements
10+ years of experience in DevOps, SRE, or Platform Engineering roles. 3+ years leading and scaling technical teams. Strong hands-on background in Microsoft Azure (required), AWS (required), and GCP (nice to have). Deep understanding of cloud-native architectures and Kubernetes. Proven experience implementing SRE frameworks (SLAs, SLOs, incident management). Strong experience with Infrastructure as Code (Terraform, ARM/Bicep). Observability expertise (Grafana, Datadog, Prometheus, Azure Monitor). Experience managing production systems at scale (high-traffic environments preferred). Strong stakeholder management and communication skills. Strategic mindset with the ability to balance technical depth and business impact.
Nice to Have
Experience in global eCommerce platforms. Experience leading cloud transformation initiatives. Exposure to AI-driven operational tooling. Relevant certifications (Azure, Kubernetes, Cloud Architecture).
Benefits & conditions
Competitive salary and benefits: Your financial well-being is important to us. Join ESW and experience the satisfaction of being rewarded for your hard work, dedication, and commitment.