DevOps Engineer

Remedy Robotics, Inc.
San Francisco, United States of America
yesterday

Role details

Contract type
Temporary to permanent
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

San Francisco, United States of America

Tech stack

Amazon Web Services (AWS)
C++
Ubuntu (Operating System)
Cloud Computing
Continuous Integration
Cursor
Linux
DevOps
GitHub
Python
Linux System Administration
Machine Learning
Package Management Systems
Regression Testing
Scientific Computing
Software Engineering
TypeScript
Kubernetes
Machine Learning Operations
Docker

Job description

You'll own the developer platform for a small, multi-disciplinary engineering team building an autonomous surgical robot. The stack is primarily Python (ML, orchestration, much of the application code) with some C++ for performance-critical robot control and TypeScript for surgical UIs, running across on-prem lab compute, GPU workstations, and the cloud. You'll work directly with our software, ML, hardware, and data teams to make the development cycle fast and the deployments boring.

This is a team-of-one role. You'll set the platform direction, build it, and operate it.

  • Build and operate CI/CD covering our Python codebase (the bulk of the work), C++ robot control code, ML training pipelines, and TypeScript UIs - each with different testing and deploy patterns
  • Own our lab compute infrastructure: the server-room PCs running Ubuntu, GPU workstations, and the supporting network
  • Improve developer experience across the org: local dev environments, package management, build times, test reliability
  • Integrate hardware-in-the-loop testing into the CI flow where it makes sense (the robot lives in the lab and needs to participate in regression testing)
  • Standardize and harden security across on-prem and cloud
  • Work with the ML team on GPU pipelines, experiment tracking, and model deployment
  • Manage cloud infrastructure for training, data, and remote services
  • Collaborate with the engineers to unlock the best tools and processes for the team

Requirements

  • 5+ years of DevOps, platform, or infrastructure engineering on non-trivial systems
  • Operated CI/CD for a polyglot codebase - you've debugged GitHub Actions runners, written nontrivial workflows, and understand the tradeoffs of self-hosted vs hosted runners
  • Strong Linux administration skills and comfort with infrastructure-as-code
  • Strong Python fluency - most of our code is Python and you'll be living in it daily; can read and contribute to C++ and TypeScript when needed
  • Cloud experience (AWS or equivalent)
  • Advanced fluency with coding agents (Claude Code, Cursor, or equivalents) - you use them as a daily force multiplier
  • Clear communication and a service mindset - your job is to make other engineers faster

  • Robotics, embedded, or scientific computing background - you've dealt with hardware that needs to be on for tests to pass
  • ML pipeline tooling experience (SkyPilot, MetaFlow, Ray, or similar)
  • Self-hosted GitHub Actions runners at scale
  • Python monorepo tooling (uv, Poetry, Bazel) and C++ packaging (Conan, vcpkg)
  • Real-time Linux experience
  • Audit-friendly build infrastructure - signed builds, traceable artifacts, reproducible builds (relevant for IEC 62304 down the road)
  • Prior medical device or regulated industry experience
  • Docker and Kubernetes familiarity (we don't run K8s today but may grow into it)

About the company

Cardiovascular disease is the #1 cause of morbidity and mortality in the world. Much of this could be prevented with better access to specialist care. Take stroke as an example: any delay in treatment can lead to permanent disability or death. However, due to a lack of specialist surgeons, the most effective intervention can only be performed in 2% of US hospitals. For patients who present to one of the 98% of hospitals that do not offer the surgery, treatment is either significantly delayed or not offered at all because timely transfer is not feasible. Our mission is to bring state-of-the-art vascular intervention to anyone, anytime, regardless of their location. Our team of medical clinicians, roboticists, and machine learning experts is working to bridge this gap by building the world's first remotely operated, semi-autonomous endovascular surgical robot. We've already done what nobody else could: using our system, doctors from around the world were able to remotely perform this procedure from as far as 8,000 miles away. We now need your help to bring this technology out of the laboratory and into hospitals everywhere.
