Sr. Site Reliability Engineer

Intuition Machines

Jackson Township, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Jackson Township, United States of America

Tech stack

Microsoft Windows

Microsoft Active Directory

Amazon Web Services (AWS)

Bash

Cloud Computing

Dynamic Host Configuration Protocol

Linux

DNS

Hyper-V

Python

Network Security

Windows Server

Virtual Desktops

Network Architecture

Citrix Systems

Performance Tuning

PowerShell

Red Hat Enterprise Linux - RHEL

Reliability Engineering

Site Reliability Engineering Practices

Virtualization Technology

Scripting (Bash/Python/Go/Ruby)

High Performance Computing

Reliability of Systems

Hybrid Cloud

Firewalls (Computer Science)

Patch Management

Data Management

Hardware Infrastructure

Cisco networks

VMware

Job description

About the Position: We're looking for a Senior Site Reliability Engineer with deep enterprise infrastructure experience to help ensure the reliability, availability, and performance of systems supporting spacecraft design, manufacturing, and mission operations. In this role, you will bridge traditional infrastructure operations with modern SRE practices, focusing on proactive reliability, scalability, and performance.

This is a remote position with quarterly travel to Bay Area facilities and occasional onsite support for critical incidents.

Define and maintain Service Level Objectives (SLOs) and error budgets for infrastructure services
Lead incident response efforts, perform root cause analysis, and implement preventive solutions
Design, implement, and maintain hybrid and on-prem infrastructure with a focus on reliability and performance
Ensure availability and performance of virtualization platforms (VMware, Hyper-V, Citrix environments)
Manage enterprise patching across Windows and Linux systems
Maintain and optimize storage platforms across SAN/NAS environments
Design and validate network architecture, including firewalls and switching infrastructure
Establish and maintain infrastructure baselines aligned with security and compliance frameworks (CIS, DISA STIG)
Support compute infrastructure including enterprise server platforms
Collaborate on hybrid cloud initiatives and AWS-based infrastructure
Administer core services including Active Directory, DNS, DHCP, and virtual desktop environments
Develop automation scripts to reduce operational overhead and improve efficiency
Build and maintain monitoring, alerting, and documentation for infrastructure systems
Participate in an on-call rotation supporting 24/7 mission-critical operations
Mentor and guide engineers on reliability best practices and troubleshooting
Collaborate cross-functionally with engineering, manufacturing, and security teams
Travel quarterly for onsite planning, coordination, and hands-on support as needed

Requirements

15+ years of experience in enterprise infrastructure, systems administration, or Site Reliability Engineering roles
Strong experience with virtualization platforms (VMware, Hyper-V, Citrix, VDI environments)
Deep understanding of networking and security architecture (firewalls, switching, secure design)
Experience managing enterprise storage systems and SAN/NAS environments
Proficiency in Linux (RHEL) and Windows Server administration
Experience with Active Directory, DNS, DHCP, and patch management systems
Familiarity with AWS and hybrid cloud infrastructure environments
Scripting experience (PowerShell, Bash, Python) for automation and operations
Strong troubleshooting skills and ability to resolve complex infrastructure issues
Experience with monitoring, performance tuning, and system reliability practices
Ability to work cross-functionally and communicate with technical and non-technical stakeholders
Willingness to travel quarterly and provide onsite support during critical events
Must be a U.S. Citizen or Permanent Resident due to ITAR requirements

Preferred Requirements:

Experience in aerospace, defense, or regulated manufacturing environments
Relevant certifications (VMware, Cisco, Citrix, AWS)
Familiarity with cloud reliability engineering practices
Experience supporting manufacturing execution systems (MES) or mission operations environments
Background in high-performance computing (HPC) infrastructure
Strong experience working in remote environments with distributed teams

About the company

We provide machine learning products and services at scale to some of the largest companies in the world. Our focus is on meta-learning and visual-domain ML.
Intuition Machines has decades of software and ML expertise. We build and operate massively scalable systems to tackle some of today's hardest problems.
