DevOps Engineer
Job description
You'll own the developer platform for a small, multi-disciplinary engineering team building an autonomous surgical robot. The stack is primarily Python (ML, orchestration, much of the application code) with some C++ for performance-critical robot control and TypeScript for surgical UIs, running across on-prem lab compute, GPU workstations, and the cloud. You'll work directly with our software, ML, hardware, and data teams to make the development cycle fast and the deployments boring.
This is a team-of-one role. You'll set the platform direction, build it, and operate it.
- Build and operate CI/CD covering our Python codebase (the bulk of the work), C++ robot control code, ML training pipelines, and TypeScript UIs - each with different testing and deploy patterns
- Own our lab compute infrastructure: the server-room PCs running Ubuntu, GPU workstations, and the supporting network
- Improve developer experience across the org: local dev environments, package management, build times, test reliability
- Integrate hardware-in-the-loop testing into the CI flow where it makes sense (the robot lives in the lab and needs to participate in regression testing)
- Standardize and harden security across on-prem and cloud
- Work with the ML team on GPU pipelines, experiment tracking, and model deployment
- Manage cloud infrastructure for training, data, and remote services
- Collaborate with the engineers to unlock the best tools and processes for the team
Requirements
- 5+ years of DevOps, platform, or infrastructure engineering on non-trivial systems
- Operated CI/CD for a polyglot codebase - you've debugged GitHub Actions runners, written non-trivial workflows, and understand the tradeoffs of self-hosted vs hosted runners
- Strong Linux administration skills and comfort with infrastructure-as-code
- Strong Python fluency - most of our code is Python and you'll be living in it daily; you can also read and contribute to C++ and TypeScript when needed
- Cloud experience (AWS or equivalent)
- Advanced fluency with coding agents (Claude Code, Cursor, or equivalents) - you use them as a daily force multiplier
- Clear communication and a service mindset - your job is to make other engineers faster

- Robotics, embedded, or scientific computing background - you've dealt with hardware that needs to be on for tests to pass
- ML pipeline tooling experience (SkyPilot, Metaflow, Ray, or similar)
- Self-hosted GitHub Actions runners at scale
- Python monorepo tooling (uv, Poetry, Bazel) and C++ packaging (Conan, vcpkg)
- Real-time Linux experience
- Audit-friendly build infrastructure - signed builds, traceable artifacts, reproducible builds (relevant for IEC 62304 down the road)
- Prior medical device or regulated industry experience
- Docker and Kubernetes familiarity (we don't run K8s today but may grow into it)