DevOps Engineer

Remedy Robotics, Inc.

San Francisco, United States of America

yesterday

Role details

Contract type

Temporary to permanent

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

San Francisco, United States of America

Tech stack

Amazon Web Services (AWS)

C++

Ubuntu (Operating System)

Cloud Computing

Continuous Integration

Cursor (Coding Agent)

Linux

DevOps

GitHub

Python

Linux System Administration

Machine Learning

Package Management Systems

Regression Testing

Scientific Computing

Software Engineering

TypeScript

Kubernetes

Machine Learning Operations

Docker

Job description

You'll own the developer platform for a small, multi-disciplinary engineering team building an autonomous surgical robot. The stack is primarily Python (ML, orchestration, much of the application code) with some C++ for performance-critical robot control and TypeScript for surgical UIs, running across on-prem lab compute, GPU workstations, and the cloud. You'll work directly with our software, ML, hardware, and data teams to make the development cycle fast and the deployments boring.

This is a team-of-one role. You'll set the platform direction, build it, and operate it.

* Build and operate CI/CD covering our Python codebase (the bulk of the work), C++ robot control code, ML training pipelines, and TypeScript UIs - each with different testing and deploy patterns
* Own our lab compute infrastructure: the server-room PCs running Ubuntu, GPU workstations, and the supporting network
* Improve developer experience across the org: local dev environments, package management, build times, test reliability
* Integrate hardware-in-the-loop testing into the CI flow where it makes sense (the robot lives in the lab and needs to participate in regression testing)
* Standardize and harden security across on-prem and cloud
* Work with the ML team on GPU pipelines, experiment tracking, and model deployment
* Manage cloud infrastructure for training, data, and remote services
* Collaborate with the engineers to unlock the best tools and processes for the team

Requirements

* 5+ years of DevOps, platform, or infrastructure engineering on non-trivial systems
* Operated CI/CD for a polyglot codebase - you've debugged GitHub Actions runners, written non-trivial workflows, and understand the tradeoffs of self-hosted vs hosted runners
* Strong Linux administration skills and comfort with infrastructure-as-code
* Strong Python fluency - most of our code is Python and you'll be living in it daily; can read and contribute to C++ and TypeScript when needed
* Cloud experience (AWS or equivalent)
* Advanced fluency with coding agents (Claude Code, Cursor, or equivalents) - you use them as a daily force multiplier
* Clear communication and a service mindset - your job is to make other engineers faster

* Robotics, embedded, or scientific computing background - you've dealt with hardware that needs to be on for tests to pass
* ML pipeline tooling experience (SkyPilot, MetaFlow, Ray, or similar)
* Self-hosted GitHub Actions runners at scale
* Python monorepo tooling (uv, Poetry, Bazel) and C++ packaging (Conan, vcpkg)
* Real-time Linux experience
* Audit-friendly build infrastructure - signed builds, traceable artifacts, reproducible builds (relevant for IEC 62304 down the road)
* Prior medical device or regulated industry experience
* Docker and Kubernetes familiarity (we don't run K8s today but may grow into it)

About the company

Cardiovascular disease is the #1 cause of morbidity and mortality in the world. Much of this could be prevented with better access to specialist care. Take stroke as an example: any delay in treatment can lead to permanent disability or death. However, due to a lack of specialist surgeons, the most effective intervention can only be performed in 2% of US hospitals. For patients who present to one of the 98% of hospitals that do not offer the surgery, treatment is either significantly delayed or not offered at all because timely transfer is not feasible. Our mission is to bring state-of-the-art vascular intervention to anyone, anytime, regardless of their location. Our team of medical clinicians, roboticists, and machine learning experts is working to bridge this gap by building the world's first remotely operated, semi-autonomous endovascular surgical robot. We've already done what nobody else could: using our system, doctors from around the world were able to remotely perform this procedure from as far as 8,000 miles away. We now need your help to bring this technology out of the laboratory and into hospitals everywhere.
