Site Reliability Engineer

MongoDB

6 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Tech stack

Amazon Web Services (AWS)

Azure

Computer Security

Distributed Systems

DNS

Fault Tolerance

MongoDB

Network Protocols

Data Logging

Scripting (Bash/Python/Go/Ruby)

Kubernetes

Information Technology

Splunk

Programming Languages

Job description

The SRE Observability team is part of the larger Platform Engineering organization and is dedicated to building and maintaining the observability stack (metrics, logging, tracing) that all engineering teams use to keep their services running smoothly. We also own related services, including our telemetry pipeline and our monitoring and alerting infrastructure. Our stack includes VictoriaMetrics, Splunk, Quickwit, Jaeger, Fluent Bit, and Vector. In addition to owning our observability infrastructure, as an engineer on the team you'll work closely with other SWE and SRE teams to promote and implement best practices in instrumenting and monitoring their services. This is a highly collaborative role, and you will get to own some of the most relied-upon internal infrastructure at MongoDB.

This role will be based remotely in Spain.

Responsibilities

Define standards and vision for the mission-critical observability platform leveraged by all parts of the engineering organization

Design, architect, build and deliver core pieces of our observability services in collaboration with other vested parties

Design, implement, and troubleshoot the monitoring of services that seamlessly spans the globe, including several cloud providers

Build for reliability, making services and infrastructure available, resilient, fault tolerant and self-healing

Identify and configure key metrics to detect incidents and quantify service health, availability and performance

Participate in a week-long on-call rotation and blameless post-mortem process

Improve our observability capabilities, optimizing for cost, ease of use, and maintainability

Requirements

Experience running mission-critical services at scale

Experience with observability of large-scale distributed systems

An understanding of information security issues

Firm grasp of at least one modern programming language, beyond basic scripting

Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)

Bachelor's degree in Computer Science or equivalent experience

Nice to haves

Experience with at least one of the major cloud providers (Amazon Web Services, Google Compute, Microsoft Azure)

Experience working in a Kubernetes-based environment

What's in it for you

Benefits & conditions

Generous compensation package

Opportunities to learn on the job (time to upskill in new technologies)

High level of independence in your day-to-day work

To drive the personal growth and business impact of our employees, we're committed to developing a supportive and enriching culture for everyone. From employee affinity groups to fertility assistance and a generous parental leave policy, we value our employees' wellbeing and want to support them along every step of their professional and personal journeys. Learn more about what it's like to work at MongoDB, and help us make an impact on the world!

MongoDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.

MongoDB is an equal opportunities employer.
