Infrastructure Reliability Engineer

Anduril Industries
Costa Mesa, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$194K

Job location

Costa Mesa, United States of America

Tech stack

Amazon Web Services (AWS)
Azure
Bash
Cloud Computing
DevOps
Programming Tools
Monitoring of Systems
Python
Reliability Engineering
Prometheus
Software Engineering
Datadog
CircleCI
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Cloud Platform System
Grafana
Kubernetes
Infrastructure Automation Frameworks
Github Enterprise
Hardware Infrastructure
Terraform
Docker
Artifactory
Go

Job description

This is a small but growing team responsible for the infrastructure and operations behind core developer tools used across the entire engineering organization. You'll own the full lifecycle - patching, upgrades, backups, scaling, and incident response - for services that every engineer depends on daily. The role blends DevOps, SRE, and software engineering, and is ideal for engineers who want high ownership and company-wide impact. You should have a mindset of continuous improvement - if something is manual and repetitive, your instinct should be to automate it away. As the company's on-prem infrastructure footprint grows, this team will expand its scope to provide SRE capabilities for on-prem systems - making this an opportunity to help shape that practice from the ground up.

WHAT YOU'LL DO

  • Own the lifecycle of core self-hosted developer tools (e.g., GitHub Enterprise Server, CircleCI, JFrog Artifactory/Xray)
  • Design and implement automated systems for patching, backups (with validation), and upgrades
  • Scale infrastructure to support a fast-growing engineering org
  • Use Infrastructure-as-Code (Terraform) to manage environments
  • Operate and troubleshoot systems using Docker, Kubernetes, and cloud platforms (AWS, GCP, Azure)
  • Define and maintain SLOs for service availability, reliability, and performance
  • Build and maintain monitoring, alerting, and observability for developer tool services
  • Lead and participate in incident response and root cause analysis
  • Work cross-functionally with platform, security, infrastructure (on-prem and cloud), and software teams

Requirements

  • Experience operating production systems using Docker and Kubernetes
  • Proficiency with at least one cloud platform (AWS, GCP, or Azure)
  • Experience managing infrastructure with Infrastructure-as-Code tools (e.g., Terraform)
  • Strong problem-solving skills with a focus on automation
  • Scripting or software development experience (e.g., Python, Go, Bash)
  • Familiarity with CI/CD pipelines and developer tooling
  • Ability to own systems end-to-end, from design to incident resolution
  • Eligible to obtain and maintain an active U.S. Secret security clearance

PREFERRED QUALIFICATIONS

  • Prior experience with GitHub Enterprise Server, JFrog Artifactory/Xray, or CircleCI
  • Experience maintaining highly available, scalable internal tools
  • Exposure to security best practices, compliance requirements, or auditing
  • Experience supporting large, rapidly scaling engineering organizations
  • Experience with monitoring and observability platforms (e.g., Datadog, Prometheus, Grafana)
  • Background in SRE or hybrid SWE/DevOps roles
  • Experience with on-prem infrastructure operations, reliability, or capacity planning

Benefits & conditions

The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. The actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations. Highly competitive equity grants are included in the majority of full-time offers and are considered part of Anduril's total compensation package. Anduril also offers top-tier benefits for full-time employees: a comprehensive, competitive benefits package (available at little to no cost to employees) that supports you in health, recovery, and whatever comes next.

About the company

Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century's most innovative companies to the defense industry, Anduril is changing how military systems are designed, built, and sold. Anduril's family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a real-time, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.
