Site Reliability Engineer III
Job description
We are seeking a highly skilled Site Reliability Engineer III to join a team dedicated to improving system reliability, resilience, and automation. In this role, you will apply your software and systems engineering expertise to support large-scale, distributed, fault-tolerant systems and keep critical platforms available and performant. You will play a key role in cloud transformation initiatives, using monitoring tools to identify issues proactively, optimize performance, and implement continuous improvements. This is an opportunity for experienced engineers who are passionate about stability, scalability, and innovation in a fast-paced environment.
Requirements
- 5+ years of related experience in site reliability engineering, systems engineering, or DevOps roles
- Strong understanding of Kubernetes, containers, microservices architecture, and cloud platforms, especially Google Cloud Platform (GCP)
- Hands-on experience with infrastructure automation tools such as Terraform; monitoring tools such as Dynatrace, Prometheus, and Grafana; and CI/CD pipelines in Azure DevOps (ADO)
- Proficiency in Linux and Windows systems, network troubleshooting, and security best practices
- Demonstrated ability to implement fault tolerance and to carry out performance tuning, capacity planning, and incident response in large-scale enterprise environments