Site Reliability Engineer, Robotics

Hadrian Automation
Los Angeles, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$ 270K

Job location

Los Angeles, United States of America

Tech stack

C++
Linux
Middleware
EtherCAT
Python
Message Queuing Telemetry Transport (MQTT)
RabbitMQ
Reliability Engineering
Prometheus
OPC Unified Architecture
TypeScript
Datadog
Diagnostic Tools
Computer Network Operations
Infrastructure as Code (IaC)
Kubernetes
Bare Metal
Kafka
Hardware Infrastructure
Go

Job description

What You'll Do

  • Own the reliability of our robotics systems, from PLCs through ROS2/middleware to Kubernetes.
  • Build interfaces to our observability system to ingest telemetry from our controls and robotics systems. Leverage solutions such as Prometheus, Telegraf, OpenTelemetry, and Datadog.
  • Write code frameworks and tools to support our controls and robotics systems, including diagnostic tools, shared libraries for telemetry data, and automated remediation.
  • Partner with controls, robotics, and platform engineering teams to bake reliability in early. Review designs, develop SLOs and SLIs, introduce reliability release gates, and push for telemetry contracts to develop production-grade services.

Requirements

  • Ownership. Someone who has owned the reliability of a production system where downtime had physical or operational consequences (manufacturing line, autonomous vehicle, lab automation, network operations).
  • Systems Thinker. Focused on understanding the relationship among various systems to design sustainable solutions, not one-time fixes.
  • Problem Solver. Solving complex puzzles excites and motivates you to find an efficient solution.
  • T-Shaped Skill Set. Comfortable with bare metal Kubernetes, networking, GitOps workflows, and Infrastructure as Code (IaC). Also skilled at programming in TypeScript, Python, Golang, or C++.
  • Strong Communication. You can run a war room, write a post-mortem, and explain a reliability tradeoff to a stakeholder.
  • Background in edge/on-prem infrastructure. You've run Kubernetes at the edge (k3s, k0s, k0smotron) or managed on-prem clusters, time-series data at the edge, or air-gapped deployments. A deep understanding of Linux operating system fundamentals, such as cgroups, sockets, and system tuning, is a big plus.
  • Deep understanding of shipping and storing telemetry data at scale. Experience with Kafka/MQTT/RabbitMQ is a plus.
  • Direct robotics experience. ROS/ROS2, OPC UA, EtherCAT, motion controllers, or fleet management for autonomous systems.
  • An individual who is self-directed and can deliver with high velocity.

Benefits & conditions

  • 401(k)
  • Health insurance
  • Vision insurance
  • Dental insurance
  • Life insurance

For this role, the target salary range is $164,000 - $270,000 (actual range may vary based on experience).

About the company

Hadrian is building autonomous factories that help aerospace and defense companies manufacture rockets, satellites, jets, and ships up to 10x faster and up to 2x cheaper. By combining advanced software, robotics, and full-stack manufacturing, we are reinventing how America produces its most critical parts. We're accelerating our mission with the launch of Factory 3 in Mesa, Arizona, a 290,000-square-foot facility creating 350 new jobs. We are expanding rapidly to support thousands of future hires, launching Hadrian Maritime to expand into naval production, and introducing a Factory-as-a-Service model that delivers complete systems instead of individual parts. Hadrian is backed by leading investors including T. Rowe Price, Lux Capital, Founders Fund, and Andreessen Horowitz. Our fast-growing team is united around reindustrializing American manufacturing for the 21st century and beyond.
