DevOps Manager
Job description
We are looking for an experienced and passionate DevOps & SRE Manager to lead multiple DevOps and Site Reliability Engineering teams. The ideal candidate will be responsible for building and maintaining scalable, reliable, and high-performing infrastructure and operational processes. As a DevOps & SRE Manager, you will play a key role in ensuring our development, deployment, and operational practices align with industry standards while fostering a culture of automation and continuous improvement.
Leadership & Team Management:
- Lead, mentor, and develop a team of DevOps engineers and SREs to drive innovation and operational excellence.
- Build a collaborative and inclusive team culture focused on delivering high-quality services.
- Establish and track goals for your team to align with business objectives.
Infrastructure Automation & Scalability:
- Design, implement, and manage highly available and scalable cloud infrastructure in AWS and Azure.
- Oversee the implementation of Infrastructure as Code (IaC) tools (e.g., Terraform, Bicep, Ansible) to automate provisioning and configuration.
- Identify and address bottlenecks in deployment pipelines and infrastructure performance.
Site Reliability Engineering:
- Lead efforts to define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
- Drive incident management processes to quickly detect, mitigate, and resolve issues while ensuring post-mortem analyses for continuous improvement.
- Optimize and enhance monitoring, logging, and alerting systems (e.g., New Relic, Datadog, Splunk, Prometheus, Grafana, ELK Stack).
Continuous Integration and Continuous Deployment (CI/CD):
- Establish and refine CI/CD pipelines to ensure smooth software releases with minimal or zero downtime.
- Collaborate with development teams to implement DevOps best practices and ensure code quality, security, and performance.
Security & Compliance:
- Implement and oversee security best practices in DevOps and operational workflows, including secrets management, vulnerability scans, and automated patching.
- Ensure compliance with relevant regulations and standards (e.g., SOC2, ISO 27001).
Collaboration & Communication:
- Work cross-functionally with product, engineering, and operations teams to ensure alignment on goals and priorities.
- Provide regular updates to stakeholders on system health, incidents, and improvement initiatives.
Cost Optimization:
- Analyze cloud and infrastructure costs, identify opportunities for savings, and implement cost optimization strategies.
- Manage budgets and vendor relationships for tools and services used by the team.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field. A Master's degree is a plus.
- Proven experience managing DevOps or SRE teams in fast-paced environments.
- Hands-on expertise in cloud platforms (AWS, Azure) and containerization technologies (Docker, Kubernetes).
- Deep understanding of the software development lifecycle (SDLC) and Agile practices.
- Track record of driving operational efficiency, incident resolution, and automation.
Technical Skills:
- Expertise in CI/CD tools (e.g., Jenkins, CircleCI, GitHub Actions, Azure DevOps).
- Experience operating Kubernetes platforms such as AKS, EKS, or similar.
- Experience using managed languages such as Python, Go, C#, Java, or similar.
- Experience designing tooling to simplify the operational management of SaaS/PaaS systems.
- Experience with monitoring and observability tools (e.g., Prometheus, Splunk, New Relic, Datadog, ELK Stack).
- Strong knowledge of infrastructure-as-code tools (e.g., Terraform, Bicep, CloudFormation).
- Strong understanding of cloud best practices for networking, security, and identity management in AWS and Azure.
Soft Skills:
- Excellent leadership and people management abilities.
- Strong problem-solving skills and attention to detail.
- Exceptional communication skills to collaborate across teams and with stakeholders.
- Proven ability to manage and prioritize multiple product lines and initiatives simultaneously.