Site Reliability Engineer
Job description
As a Site Reliability Engineer, you will work with Agile engineering teams to provide production insight into running and operating software at scale in a globally distributed, highly available, cloud-based system.
You will guide the team to consider the resiliency, scalability and operability implications of the choices they make during the development cycle, helping foster a production-ownership mentality ("You build it, you run it").
You will also own and develop our platform by implementing and championing GitOps principles. You will play a hands-on role helping the team meet technical, operational, schedule, and business requirements.
The ideal candidate is a systems problem solver with a passion for crafting products that deliver incredible customer experiences, deep experience with infrastructure, operational automation, data-driven metrics collection and modern platform management, and a true desire to automate tasks rather than perform them repeatedly.
Responsibilities
- Design, build, and maintain highly scalable and reliable infrastructure using Infrastructure-as-Code (IaC) principles and Terraform.
- Manage and optimize Kubernetes clusters, leveraging Helm for efficient application deployment and lifecycle management.
- Administer and troubleshoot Linux-based systems, ensuring their performance, security, and availability.
- Work extensively with GCP and AWS services, architecting and managing cloud-native solutions.
- Implement robust observability practices, including monitoring, logging and alerting to proactively identify and resolve issues.
- Develop and maintain automation tools using languages such as Python and Go to streamline operations and improve efficiency.
- Provide production support, responding to incidents and outages, and participating in a 24/7 on-call rotation.
- Troubleshoot complex networking issues and optimize network performance.
- Communicate effectively with teams across all areas of the organization.
Requirements
- Strong experience with IaC (Terraform preferred) and configuration management tools.
- Deep understanding of Kubernetes and Helm.
- Extensive experience with Linux administration and troubleshooting.
- Hands-on experience with GCP and AWS services and cloud-native architectures.
- Experience in implementing observability solutions.
- Solid understanding of networking concepts and protocols.
- Ability to work independently and collaboratively in a fast-paced environment.
- Ability to lead projects as well as contribute to them.
- Ability to multitask and adapt quickly to changing priorities.
- Excellent communication and problem-solving skills.
- Willingness to participate in a 24/7 on-call rotation.
Benefits & conditions
- Competitive salary
- Bonus Plan
- Benefits and perks that vary by location, including:
  - Region-specific competitive benefits
  - Build Your Own Benefits (BYOB) perk
  - Many other fun and exciting benefits and activities!