Sr. Site Reliability Engineer

Intuition Machines
Jackson Township, United States of America
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
Jackson Township, United States of America

Tech stack

Microsoft Windows
Microsoft Active Directory
Amazon Web Services (AWS)
Bash
Cloud Computing
Dynamic Host Configuration Protocol
Linux
DNS
Hyper-V
Python
Network Security
Windows Server
Virtual Desktops
Network Architecture
Citrix Systems
Performance Tuning
PowerShell
Red Hat Enterprise Linux - RHEL
Reliability Engineering
Site Reliability Engineering Practices
Virtualization Technology
Scripting (Bash/Python/Go/Ruby)
High Performance Computing
Reliability of Systems
Hybrid Cloud
Firewalls (Computer Science)
Patch Management
Data Management
Hardware Infrastructure
Cisco networks
VMware

Job description

About the Position: We're looking for a Senior Site Reliability Engineer with deep enterprise infrastructure experience to help ensure the reliability, availability, and performance of systems supporting spacecraft design, manufacturing, and mission operations. In this role, you will bridge traditional infrastructure operations with modern SRE practices, focusing on proactive reliability, scalability, and performance.

This is a remote position with quarterly travel to Bay Area facilities and occasional onsite support for critical incidents.

  • Define and maintain Service Level Objectives (SLOs) and error budgets for infrastructure services
  • Lead incident response efforts, perform root cause analysis, and implement preventive solutions
  • Design, implement, and maintain hybrid and on-prem infrastructure with a focus on reliability and performance
  • Ensure availability and performance of virtualization platforms (VMware, Hyper-V, Citrix environments)
  • Manage enterprise patching across Windows and Linux systems
  • Maintain and optimize storage platforms across SAN/NAS environments
  • Design and validate network architecture, including firewalls and switching infrastructure
  • Establish and maintain infrastructure baselines aligned with security and compliance frameworks (CIS, DISA STIG)
  • Support compute infrastructure including enterprise server platforms
  • Collaborate on hybrid cloud initiatives and AWS-based infrastructure
  • Administer core services including Active Directory, DNS, DHCP, and virtual desktop environments
  • Develop automation scripts to reduce operational overhead and improve efficiency
  • Build and maintain monitoring, alerting, and documentation for infrastructure systems
  • Participate in an on-call rotation supporting 24/7 mission-critical operations
  • Mentor and guide engineers on reliability best practices and troubleshooting
  • Collaborate cross-functionally with engineering, manufacturing, and security teams
  • Travel quarterly for onsite planning, coordination, and hands-on support as needed

Requirements

  • 15+ years of experience in enterprise infrastructure, systems administration, or Site Reliability Engineering roles
  • Strong experience with virtualization platforms (VMware, Hyper-V, Citrix, VDI environments)
  • Deep understanding of networking and security architecture (firewalls, switching, secure design)
  • Experience managing enterprise storage systems and SAN/NAS environments
  • Proficiency in Linux (RHEL) and Windows Server administration
  • Experience with Active Directory, DNS, DHCP, and patch management systems
  • Familiarity with AWS and hybrid cloud infrastructure environments
  • Scripting experience (PowerShell, Bash, Python) for automation and operations
  • Strong troubleshooting skills and ability to resolve complex infrastructure issues
  • Experience with monitoring, performance tuning, and system reliability practices
  • Ability to work cross-functionally and communicate with technical and non-technical stakeholders
  • Willingness to travel quarterly and provide onsite support during critical events
  • Must be a U.S. Citizen or Permanent Resident due to ITAR requirements

Preferred Requirements

  • Experience in aerospace, defense, or regulated manufacturing environments
  • Relevant certifications (VMware, Cisco, Citrix, AWS)
  • Familiarity with cloud reliability engineering practices
  • Experience supporting manufacturing execution systems (MES) or mission operations environments
  • Background in high-performance computing (HPC) infrastructure
  • Strong experience working in remote environments with distributed teams

About the company

We provide machine learning products and services at scale to some of the largest companies in the world. We focus on meta-learning and visual-domain ML.
Intuition Machines has decades of software and ML expertise. We build and operate massively scalable systems to tackle some of today's hardest problems.
