Site Reliability Engineer
Job description
The Site Reliability Engineering (SRE) team is responsible for providing best-in-class observability, alerting, and incident management tools and processes to service teams. As an enabling team, we help BlaBlaCar engineers improve the reliability of their services efficiently. Empowering developers and sharing our reliability expertise with them are at the core of our daily work.

- Core Infrastructure: Kubernetes, Google Cloud Platform
- GitOps/Delivery: GitHub, Terraform, Flux, Helm, Jenkins
- Observability/Incident Management: Datadog, OpenTelemetry, Grafana IRM
- In-house Synthetic Tests platform: Playwright, Qualcium, SauceLabs
- Languages: Go / Python for tooling, TypeScript/JS for the testing platform

- Support software engineers by creating, maintaining, and improving observability and alerting tools and frameworks. You embrace the use of AI, leveraging agentic tools to eliminate toil and streamline your daily tasks.
- Own the Service Level Objectives (SLOs) framework and assist in the design and maintenance of Service Level Indicators (SLIs) and objectives to ensure service reliability.
- Own the incident management process by defining best practices and standards, and by ensuring continuous improvement through post-mortems and chaos engineering. While developers handle incidents within their scope, you may step in as Incident Commander during high-severity incidents to lead coordination efforts.
- Develop and maintain tools, such as Terraform modules or Go apps, to help automate and enhance reliability across services.
- Build and promote reporting on operational metrics and incidents to drive distributed and continuous improvement.

- Hybrid status for this role: 2-3 days at the office
- 4 additional weeks on top of legal maternity/paternity leave
- 50% healthcare coverage (Alan)
- Financial support for home office equipment
- Minimum 25 days holiday per year
- Local meal plan policy (Swile card)
- 50% transportation paid (Forfait Mobilité Durable)
- Free unlimited carpooling & bus rides
- Personal growth via training, mentorship, and internal mobility opportunities
- Employee stock ownership plan
- Regular team building events
- 1 day off per year to test our product

- a 45-min video-call with Maxime, Talent Acquisition Manager, to get to know you, understand your career expectations and answer your questions
- a 60-min video-call with Damien Bertau, Hiring Manager, to discuss your experience and share more details about the team
- a 90-min system design interview with 2 team members to discuss your technical expertise
- a 45-min video-call with Maxime Fouilleul, Head of Foundations, to get a wider vision of the department and its strategy
Our hiring process lasts 25-30 days on average, and offers usually come within 48 hours.
Please note that one of these interviews will be onsite.

BlaBlaCar is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Requirements
Do you have experience with Terraform?

- 1 to 5 years of experience in SRE, DevOps, or Software Engineering roles
- Working in a multidisciplinary environment requires strong communication skills: you'll need to adapt your communication to other teams' level of expertise and be able to understand their needs
- Strong knowledge of observability tools (e.g., Datadog) and understanding of metrics, logging, and tracing.
- Troubleshooting and on-call experience in production environments, diagnosing and resolving technical issues effectively (experience with Kubernetes is a plus).
- Full working proficiency in English
- Fit with our BlaBlaPrinciples
- Thriving in a collaborative, fast-growing and innovative environment
- Ability to take ownership in line with business priorities and to navigate different contexts

Nice to have:
- Familiarity with incident management platforms (e.g., Grafana IRM)
- Experience working with Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Exposure to programming in Go or a strong interest in learning it.
- Experience integrating OpenTelemetry
- Backend services are built using multiple programming languages: while development skills aren't required, familiarity with object-oriented programming and scripting languages is an advantage.
- Familiarity with web/mobile testing tools or a strong curiosity to understand how software is tested at scale.