Senior Cloud Systems Engineer

Lunar Outpost
Golden, United States of America
10 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$145K

Job location

Golden, United States of America

Tech stack

Amazon Web Services (AWS)
DevOps
DNS
GitHub
Identity and Access Management
Key Management
Uptime
Site Reliability Engineering Practices
Cloud Services
Load Balancing
Cloud Platform System
Autoscaling
Kubernetes Helm Charts
GitLab CI
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
Performance Monitor
Terraform
Jenkins

Job description

  • Own and manage Stargate production releases and deployment pipelines using GitOps practices

  • Drive operational excellence initiatives, including metrics collection, log aggregation, uptime monitoring, KPI tracking, and SIM integration

  • Achieve and maintain uptime SLAs of 99.99% (four nines) to 99.999% (five nines)

  • Design, develop, and maintain Helm charts for Stargate and related infrastructure components

  • Implement and manage progressive deployment strategies including canary deployments and blue-green deployments

  • Oversee critical Kubernetes infrastructure including volume management, DNS configuration, load balancer provisioning, and secret monitoring/management

  • Manage and optimize Kubernetes deployments and related AWS services

  • Implement and maintain an observability stack using OpenTelemetry for comprehensive monitoring and alerting

  • Collaborate with engineering teams to establish and enforce operational best practices and reliability standards

Requirements

  • 5+ years of production DevOps/SRE experience with a demonstrable track record of maintaining high-availability systems

  • Kubernetes administration experience with elevated cluster access in production environments

  • Strong proficiency writing and maintaining Helm charts for complex, multi-component applications

  • Hands-on experience implementing canary deployments, blue-green deployments, and other progressive delivery patterns

  • Deep knowledge of Kubernetes infrastructure management: persistent volumes, DNS/networking, load balancers, and secrets management

  • Production experience with GitOps workflows and Flux CD

  • Proven track record maintaining 99.99%+ uptime in production environments

  • Excellent judgment and decision-making skills when working with production systems

Preferred qualifications

  • Experience with AWS cloud services, particularly EKS (Elastic Kubernetes Service), Secrets Manager, VPC networking, IAM, and AWS Load Balancers

  • Experience with Karpenter for Kubernetes node autoscaling and cluster optimization

  • Experience with OpenTelemetry instrumentation and observability platforms

  • Kubernetes certifications (CKA, CKAD, or CKS)

  • Experience building and maintaining CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, etc.)

  • Knowledge of infrastructure-as-code tools (Terraform, CDK)

  • Experience implementing SRE practices including SLIs, SLOs, and error budgets

Benefits & conditions


  • Compensation: Compensation level and base salary are competitively structured and determined based on factors such as relevant skills, experience, education, and the scope of the role.
  • Comprehensive health coverage: Medical, dental, and vision benefits, with 70% of premiums covered by the employer
  • Paid time off: Three (3) weeks of vacation per year
  • Retirement plan: Up to 4% employer match on 401(k) contributions
  • Paid holidays: 11 company-recognized holidays
  • Parental leave
  • Educational and tuition reimbursement to support company objectives, continued learning, and career development

About the company

Are you passionate about shaping the future of humanity's presence in space? Lunar Outpost, an industry leader in space robotics and planetary vehicles, invites you to join our team! Lunar Outpost is dedicated to creating a permanent presence in space while also driving positive impacts here on Earth.

We are currently seeking a Senior Cloud Systems Engineer to contribute to our mission in a dynamic startup environment. The main responsibilities of this role are managing Stargate deployments in production, ensuring high availability and uptime, executing reliable releases, and driving operational excellence through comprehensive monitoring, metrics, and infrastructure management.

Stargate is a next-generation Command and Control (C2) platform: the ground software that enables and empowers all Lunar Outpost missions, including the Lunar Terrain Vehicle (LTV) program. Because Stargate is mission-agnostic software used by all operators in mission control, its reliability and uptime are critical to mission success.
