Staff ML Engineer - ML Infrastructure
Job description
- Partner with product and applied ML teams to ship ML-powered features (CV models, EcoDriving insights, LLM-based reporting) that improve safety, reliability, and cost efficiency.
- Lead throughput and cost modeling for new ML features, from exploration to production-scale capacity planning, to inform roadmap and go/no-go decisions.
- Drive experiment design and evaluation, defining success metrics, structuring A/B or offline tests, and turning results into product and technical decisions.
Inference & Edge Deployment
- Design and operate scalable online and batch inference systems (Ray, Spark), including deployment patterns, observability, SLOs, and unified training-to-production workflows.
- Partner with firmware and edge teams to package, validate, and deploy models to Samsara devices, and build feedback loops from edge to cloud for continuous improvement.
Reliability, Security & Operations
- Own reliability, observability, and security for ML systems across cloud and edge, including on-call practices, incident response, and infrastructure hardening.
- Own or co-own end-to-end technical delivery for high-priority or high-risk initiatives, from modeling and system design through production rollout.
Leadership & Culture
- Provide Staff+/Senior-Staff technical leadership on ML infrastructure architecture and strategy, influencing cross-team decisions and mentoring engineers and applied scientists.
- Drive strong developer experience through documentation, office hours, and best practices, while contributing to and representing Samsara in open source communities (Ray, Spark, RayDP).
- Champion and role model Samsara's cultural principles: Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team.
We use Tofu, a fraud detection tool, to validate the authenticity of applications and protect against identity fraud. This ensures we are connecting with real people and allows us to prioritize genuine candidates. Please see Samsara's Candidate Privacy Notice for more information.
Fraudulent Employment Offers
Samsara is aware of scams involving fake job interviews and offers. Please know we do not charge fees to applicants at any stage of the hiring process. Official communication about your application will only come from emails ending in @samsara.com, @us-greenhouse-mail.io or @mail3.guide.co. For more information regarding fraudulent employment offers, please visit our blog post.
At Samsara, we welcome everyone regardless of their background. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, gender, gender identity, sexual orientation, protected veteran status, disability, age, and other characteristics protected by law. We depend on the unique approaches of our team members to help us solve complex problems. We are committed to increasing diversity across our team and ensuring that Samsara is a place where people from all backgrounds can make an impact.
Requirements
- 10+ years of overall experience in machine learning engineering or related fields, with a strong track record of building and operating large-scale ML systems.
- Strong experience with distributed computing frameworks such as Ray and/or Spark.
- Hands-on experience with cloud infrastructure (AWS), containers/Kubernetes, and production observability tooling.
- Proven experience building or supporting ML platforms (training, experimentation, or inference) used by multiple teams.
- Solid understanding of ML fundamentals including evaluation, experiment design, and model iteration in production environments.
An ideal candidate also has:
- Experience shipping ML-powered features end-to-end, from design through production and iteration, with measurable impact on product or business metrics.
- Background in computer vision and/or LLM-based systems in production environments.
- Experience with edge or on-device ML and collaboration with firmware or embedded teams.
- Familiarity with model lifecycle systems (model registry, deployment, monitoring, rollback, drift detection).
- Experience working in environments with strong security and compliance requirements.
- Demonstrated ability to lead across teams and influence technical direction at Staff+ scope.
- A strong sense of ownership and a desire for end-to-end autonomy, from platform design to real-world impact.
Benefits & conditions
The range of annual base salary for full-time employees for this position is below. Please note that base pay offered may vary depending on factors including your city of residence, job-related knowledge, skills, and experience. This role is also eligible for an initial RSU grant with no vesting cliff, and ongoing refresh opportunities tied to performance, subject to plan terms and conditions. Learn more about our total rewards and benefits below.
Annual Base Salary: $200,200-$357,500 USD