Site Reliability Engineer (Auth0)
Job description
Auth0 provides an unparalleled authentication experience for hundreds of millions of users worldwide. Our commitment to reliability is a key foundation of our product and our dedication to exceeding customer availability expectations is a core engineering focus. As a mid-level Site Reliability Engineer, you'll join our SRE team based in Europe to ensure production systems are not only operational but also resilient, scalable, and ready for exponential growth. This isn't just about keeping the lights on; it's about directly contributing to the platform's core resiliency and robustness. You'll be a hands-on builder, crafting solutions that make our system more reliable by design.
What You'll Do
- Design and build custom software in Go to enhance the platform's reliability, resiliency, and redundancy.
- Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services.
- Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions.
- Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate, or accurately escalate production issues.
- Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency.
- Define, document, and champion reliability best practices across the organisation.
What You'll Need To Be Successful
- A proactive and systematic approach to problem-solving, with a high degree of ownership.
Requirements
- Proven experience in a production environment supporting large-scale, mission-critical applications with a high degree of autonomy.
- Proficiency in at least one programming language, with a strong preference for Go. You should be comfortable writing custom applications, not just scripts.
- Experience with infrastructure as code (Terraform), containers and container orchestration (Docker, Kubernetes), and GitOps (Argo CD).
- Demonstrable expertise in a major cloud provider (Azure, AWS, or GCP).
- A strong grasp of microservices architecture, databases (SQL, NoSQL), and networking fundamentals, so you can understand how custom code can solve platform-level issues.
- An understanding of core SRE principles, including SLIs, SLOs, and error budgets.
- Experience in an on-call rotation for a 24/7 cloud-based environment.
- Exceptional communication and collaboration skills, with a proven ability to work effectively in a remote, distributed team, where tasks may be self-driven.
Benefits & conditions
What you can look forward to as a Full-Time Okta employee
- Amazing Benefits
- Making Social Impact
- Developing Talent and Fostering Connection + Community at Okta
Okta cultivates a dynamic work environment, providing the best tools, technology and benefits to empower our employees to work productively in a setting that best and uniquely suits their needs. Each organization offers its own degree of flexibility and mobility, so that all employees can be their most creative and successful selves, regardless of where they live. Find your place at Okta today! https://www.okta.com/company/careers/