Site Reliability Engineer

MongoDB
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote

Tech stack

Amazon Web Services (AWS)
Azure
Computer Security
Distributed Systems
DNS
Fault Tolerance
MongoDB
Network Protocols
Data Logging
Scripting (Bash/Python/Go/Ruby)
Kubernetes
Information Technology
Splunk
Programming Languages

Job description

The SRE Observability team is part of the larger Platform Engineering organization and is dedicated to building and maintaining the observability stack (metrics, logging, tracing) that all engineering teams rely on to keep their services running smoothly. We also own related services, including our telemetry pipeline and our monitoring and alerting infrastructure. Our stack includes VictoriaMetrics, Splunk, Quickwit, Jaeger, Fluent Bit, and Vector. As an engineer on the team, in addition to owning our observability infrastructure, you'll work closely with other software engineering (SWE) and SRE teams to promote and implement best practices for instrumenting and monitoring their services. This is a highly collaborative role, and you will own some of the most relied-upon internal infrastructure at MongoDB.

This role will be based remotely in Spain.

Responsibilities

Define standards and vision for the mission-critical observability platform used across the engineering organization
Design, architect, build, and deliver core pieces of our observability services in collaboration with stakeholders
Design, implement, and troubleshoot monitoring that seamlessly spans services across the globe, including several cloud providers
Build for reliability, making services and infrastructure available, resilient, fault tolerant, and self-healing
Identify and configure key metrics to detect incidents and quantify service health, availability, and performance
Participate in a week-long on-call rotation and the blameless post-mortem process
Improve our observability capabilities, optimizing for cost, ease of use, and maintainability

Requirements

Experience running mission-critical services at scale
Experience with observability of large-scale distributed systems
An understanding of information security issues
A firm grasp of at least one modern programming language, beyond basic scripting
Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)
Bachelor's degree in Computer Science or equivalent experience

Nice to haves

Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud, Microsoft Azure)
Experience working in a Kubernetes-based environment

Benefits & conditions

Generous compensation package
Opportunities to learn on the job (time to upskill in new technologies)
High level of independence in your day-to-day work

To drive the personal growth and business impact of our employees, we're committed to developing a supportive and enriching culture for everyone. From employee affinity groups to fertility assistance and a generous parental leave policy, we value our employees' wellbeing and want to support them along every step of their professional and personal journeys.

Learn more about what it's like to work at MongoDB, and help us make an impact on the world!

MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter. MongoDB is an equal opportunities employer.
