Site Reliability Engineer
Job description
As a Site Reliability Engineer on the team, you will help build and run our mission-critical systems. By implementing monitoring and automation, you will continuously ensure the health, reliability, scalability, and performance of our platforms.
The Site Reliability team works with engineering teams across ingest/data processing, mapping, labeling, triage, machine learning (detection, prediction, tracking), motion planning/control, offline simulation, and release/deployment to provide uniform service observability and incident response.
What you'll do:
- Build monitoring to ensure our platform is healthy and its reliability is measurable
- Build alerting and a set of runbooks to enable faster detection and remediation of platform issues
- Debug complex issues that span multiple components of the stack, and ensure proper fixes are implemented to prevent recurrence
- Participate in an on-call rotation and contribute to a culture of continuous improvement through blameless postmortems
- Design and implement platform components that enable features making our customers' work possible, simpler, and more efficient
- Build Kubernetes controllers to automate operations
Requirements
- Bachelor's degree in Computer Engineering, Computer Science, Electrical Engineering, Robotics or a related field and 4+ years of relevant experience (or Master's degree and 2+ years of relevant experience, or PhD)
- Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems
- Hands-on development experience in Go or Python, creating robust software that runs reliably in production
- Strong experience scaling and securing services in the cloud (AWS, Google Cloud Platform) or in cloud-native environments
- Experience using infrastructure-as-code principles to automate the creation of infrastructure resources (e.g. Terraform, CloudFormation)
- Experience authoring and maintaining Kubernetes controllers in Go
- Experience running Kubernetes and related core components in a large-scale, production environment
- Experience with metrics (e.g. Prometheus), logging (e.g. Elasticsearch, Loki) and tracing (e.g. Jaeger, Tempo) systems
- Understanding of engineering design limitations and the ability to guide teams in scaling their services to achieve the desired performance within budget
- A focus on increasing service reliability through defining and adhering to SLOs
- Strong communication skills and the ability to work effectively in a diverse and distributed team
Benefits & conditions
- Competitive compensation packages
- High-quality individual and family medical, dental, and vision insurance
- Health savings account with available employer match
- Employer-matched 401(k) retirement plan with immediate vesting
- Employer-paid group term life insurance and the option to elect voluntary life insurance
- Paid parental leave
- Paid medical leave
- Unlimited vacation
- 15 paid holidays
- Daily lunches, snacks, and beverages available in all office locations
- Pre-tax spending accounts for healthcare and dependent care expenses
- Pre-tax commuter benefits
- Monthly wellness stipend
- Adoption/Surrogacy support program
- Backup child and elder care program
- Professional development reimbursement
- Employee assistance program
- Discounted programs that include legal services, identity theft protection, pet insurance, and more
- Company and team bonding outlets: employee resource groups, quarterly team activity stipend, and wellness initiatives
Learn more about Latitude's team, mission and career opportunities at lat.ai!
The expected base salary range for this full-time position in California is $179,200 - $268,800 USD. Actual starting pay will be based on job-related factors, including exact work location, experience, relevant training and education, and skill level. Latitude employees are also eligible to participate in Latitude's annual bonus programs, equity compensation, and generous Company benefits program, subject to eligibility requirements.