Sr DevOps Engineer
Job description
The Lead Engineer, DevOps plays an integral role in implementing and executing cloud practices for build management, product release, and operations processes. The role is responsible for managing and automating the build and deployment process and regression testing, and for building the tools and monitoring used in product implementations. The Lead Engineer, DevOps will help define and maintain the procedures and tools used to deliver releases in a repeatable and scalable manner. This role involves significant collaboration with stakeholders across IT and with partners.
Roles & Responsibilities
Design, implement, and maintain automated deployment and configuration management systems.
Develop and maintain Infrastructure as Code (IaC) scripts for provisioning infrastructure.
Continuously improve deployment processes to enhance efficiency, reliability, and scalability.
Implement and manage CI/CD pipelines to automate the software delivery lifecycle.
Ensure integration of automated testing into all pipeline stages.
Maintain and optimize build agents/runners to support development workflows.
Implement and maintain monitoring, alerting, and logging systems.
Collaborate with teams to define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Contribute to system reliability through proactive monitoring and alerting.
Conduct blameless post-incident reviews.
Contribute to defining and managing error budgets.
Develop and maintain runbooks and operational procedures.
Collaborate with development teams to design systems for high availability, scalability, and performance.
Perform capacity planning and system performance analysis.
Identify and address system bottlenecks.
Collaborate with security teams to implement best practices.
Ensure systems comply with industry standards and security requirements.
Create and maintain documentation for infrastructure, deployment processes, and workflows.
Contribute to internal knowledge bases and operational documentation.
Requirements
4+ years of experience in DevOps, Site Reliability Engineering (SRE), or Platform Engineering roles.
Strong experience with cloud platforms (Azure or AWS).
Experience with Linux environments.
Hands-on experience with:
  Infrastructure as Code (Terraform or equivalent)
  CI/CD tools and pipeline development
Experience designing and supporting highly available, fault-tolerant systems.
Proficiency in Python, Bash, or similar scripting languages.
Working knowledge of:
  System architecture patterns
  Observability tools
  Cloud security best practices
Experience with containerization (Docker) and orchestration (Kubernetes or similar).
Demonstrated ability to learn and adopt new tools and technologies.
Key Performance Indicators (KPIs)
CI/CD pipeline success rate and execution time
Deployment frequency and failure rate
System uptime and reliability metrics (SLO adherence)
Mean time to resolution (MTTR)
Build agent availability and performance
Automation coverage across deployment workflows

Reliable, scalable, and efficient deployment pipelines
Reduced manual intervention in release processes
High system availability and performance
Strong alignment between engineering and operations
Well-documented and repeatable operational processes
ABOUT THE COMPANY