Senior Software Engineer II, DevEx, OPX
Job description
This role is a Senior Software Engineer II on the Operational Excellence (OPX) team within the Developer Experience organization. The OPX team ensures production health and system reliability across Samsara, focusing on automated safeguards, incident tooling, observability, and AI-driven operational solutions. This is a remote position for candidates in the Eastern Time Zone.
Responsibilities
- Design and build automated reliability and self-healing systems that protect production at scale, including rollbacks, deploy safeguards, and fault mitigation, then offer them as platform tooling for use across the company.
- Own and improve incident management tooling and on-call health: reduce alert noise, surface actionable signals, and empower engineering teams to operate services with minimal burden.
- Develop and evolve observability infrastructure, including monitoring, alerting, SLOs, and performance regression detection, to give teams real-time, actionable visibility into system health and latency.
- Contribute to AI-driven operational tooling that moves beyond triage toward autonomous remediation, where the system detects issues, takes corrective action, and self-recovers with minimal human involvement.
- Drive incident prevention by identifying systemic patterns and eliminating operational toil, with deep empathy for on-call engineers.
- Partner directly with product engineering teams to diagnose reliability gaps, reduce operational burden, and help them adopt best practices for running services.
- Define and champion operational excellence best practices across engineering through guardrails, scorecards, and standards that help teams run services reliably by default.
- Champion, role model, and embed Samsara's cultural principles (Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team) as the company scales globally.
Requirements
- 8+ years of experience designing and building products in a software engineering team.
- Bachelor's Degree in Computer Science/Engineering or equivalent experience.
- 3+ years of experience in infrastructure and/or platform engineering-focused teams.
- Expertise in observability and reliability, operational metrics, and data analysis.
- Track record in architecting monitoring frameworks, SLO platforms, and automated response workflows; experience with Datadog (or equivalent tools such as New Relic, Grafana).
- Experience with large-scale enterprise software applications.
- Experience in Developer Experience (DevEx) & Internal Portals: designing and implementing tools that centralize and simplify engineering operations.
- Familiarity with cloud platforms (AWS, GCP, or the like).
- Experience implementing AI-driven automation across the SDLC to reduce developer friction and accelerate delivery.
- Strong coding skills in Go, Python, or equivalent for infrastructure, deployment, and operations challenges.
- Experience mentoring and supporting engineers and role-modeling engineering practices in a technical lead capacity.
- Proactive growth mindset with a continual improvement focus.
An ideal candidate also has
- Strong communication skills and a desire to collaborate across teams.
- Experience with incident management tooling (Incident.io, PagerDuty, or equivalent).
- Experience with Infrastructure as Code (IaC) using Terraform.
Benefits & conditions
Annual base salary: $154,700 - $208,000 USD, with potential RSU grants and performance bonuses. Additional benefits include a flexible remote model, professional development stipend, comprehensive health care, and parental leave.
Working arrangements
Remote position for candidates residing in the Eastern Time Zone of the US or Canada. No relocation assistance provided.
EEO & Accommodations