Site Reliability Engineer, GNC

Space Exploration Technologies Corp.
Hawthorne, United States of America
27 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Compensation: $145K

Job location

Hawthorne, United States of America

Tech stack

Algorithm Design
Business Analytics Applications
Data analysis
Big Data
Information Systems
Databases
Continuous Integration
Linux
DevOps
Microprocessors
Gradle
Python
Monte Carlo Methods
Package Management Systems
Reliability Engineering
Ansible
Simulation Software
Software Construction
Software Engineering
TCP/IP
Vagrant
Virtualization Technology
Web Applications
Computer Networking Systems
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Deployment Automation
Data Analytics
Build Tools
Hardware Infrastructure
Puppet
Terraform
Docker

Job description

GNC (guidance, navigation, and control) teams at SpaceX are responsible for vehicle design, trajectory design and optimization, high-fidelity vehicle simulation, and software and control algorithm development, while also supporting both launch and on-orbit operations across multiple vehicle programs. In this role, you will work closely with GNC teams across SpaceX to maintain and improve a suite of critical GNC-focused tools and infrastructure that must scale reliably to enable a multiplanetary future. These systems include on-prem services, large-scale Monte Carlo simulations on our high-performance computing (HPC) cluster, automated data analysis pipelines, continuous integration systems for rocket and simulation software, GNC analysis infrastructure, and vehicle configuration verification tools.

  • Deploy, upgrade, operate, and scale a suite of mission-critical GNC products and services
  • Provision and maintain virtual and physical servers
  • Work with the SpaceX HPC team to monitor and maintain an HPC cluster consisting of tens of thousands of CPUs
  • Closely collaborate with GNC software engineers to create highly operable and maintainable products
  • Monitor web applications and services and respond to incidents
  • Manage the underlying computational infrastructure of GNC in collaboration with IT stakeholders
  • Engage in and improve the whole lifecycle of services, from whiteboard design to production operation
  • Make data-driven recommendations for future hardware purchases
  • Practice sustainable incident response and postmortems
  • Provide end-user support to GNC engineering by becoming an expert on analysis applications, helping users troubleshoot issues and pointing them to relevant features
  • Configure automated deployment pipelines for web apps
  • Develop or improve GNC web apps and tools for better usability, maintainability, and robustness
  • Demo and document new software changes such as operating system upgrades, shared filesystem changes, or major tool rollouts
  • Identify performance bottlenecks and apply performance improvement techniques

Requirements

The ideal candidate is flexible, possesses broad skills spanning product operations and software development, and thrives in a fast-paced, high-impact environment.

  • Bachelor's degree in computer science, information systems/IT, engineering, math, or a scientific discipline and 2+ years of software development experience, OR 4+ years of professional experience building software with a site reliability or DevOps focus in lieu of a degree
  • Experience with Linux operating systems
  • Experience with Python and Python-based development frameworks

PREFERRED SKILLS AND EXPERIENCE:

  • 2+ years of systems administration, site reliability engineering, or DevOps experience
  • 2+ years of experience with Python and Python-based development frameworks
  • 2+ years of Linux experience
  • Expertise with Docker, Vagrant, and Kubernetes or similar technologies
  • Extensive experience with configuration management tools such as Ansible, Puppet, or Terraform
  • Experience with build systems (Make, Bazel / Pants / Buck, Gradle) and package management tools (pip, npm)
  • Strong understanding of virtualization and hypervisor technologies
  • Understanding of databases and data modeling
  • Experience with automatically managing dozens or hundreds of servers
  • Strong networking knowledge, including TCP/IP
  • Experience scaling web applications and optimizing applications for performance
  • Experience with managing on-prem infrastructure, including direct experience managing GPU fleets
  • Experience with high-performance computing systems or large-scale data analysis systems
  • Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities
  • Ability and willingness to obtain a Top Secret clearance
  • To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (aka green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State. See the International Traffic in Arms Regulations (ITAR) for details.

Benefits & conditions

Salary: $125,000.00 - $145,000.00/yr
Benefits: life insurance, parental leave, paid holidays, sick time, 401(k), retirement plan, stock options
Location: Hawthorne, California, United States
Apr 29, 2026

  • An active clearance may provide the opportunity for you to work on sensitive SpaceX missions; if so, you will be subject to pre-employment drug testing and random drug and alcohol testing
  • Willing to work extended hours and weekends when needed to meet critical deadlines

COMPENSATION AND BENEFITS:

Pay Range:
Site Reliability Engineer/Level I: $125,000.00 - $145,000.00 per year
Site Reliability Engineer/Level II: $145,000.00 - $175,000.00 per year

Your actual level and base salary will be determined on a case-by-case basis and may vary based on the following considerations: job-related knowledge and skills, education, and experience.

Base salary is just one part of your total rewards package at SpaceX. You may also be eligible for long-term incentives, in the form of company stock, stock options, or long-term cash awards, as well as potential discretionary bonuses and the ability to purchase additional stock at a discount through an Employee Stock Purchase Plan. You will also receive access to comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, paid parental leave, and various other discounts and perks. You may also accrue 3 weeks of paid vacation and will be eligible for 10 or more paid holidays per year. Employees accrue paid sick leave pursuant to Company policy, which satisfies or exceeds the accrual, carryover, and use requirements of the law.

About the company

SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.
