Staff Platform Monitoring Engineer
Job description
We are seeking an exceptional Senior Platform Monitoring Engineer to join our Platform Monitoring Team. This is a high-impact technical role for someone who thrives at the intersection of platform reliability, incident response, and customer obsession. You will serve as a critical first responder for the Databricks Platform, leading complex investigations, designing observability solutions, and driving systemic improvements that enhance customer experience and platform stability.
- Lead platform incident investigation, coordinating cross-functional teams through rapid detection, mitigation, and resolution to minimize customer impact.
- Conduct thorough post-incident root cause analysis across infrastructure, services, and cloud providers to identify systemic patterns and prevent future occurrences.
- Design and implement customer-focused alerting pipelines and end-to-end observability workflows to enhance detection coverage and reduce mean time to detection.
- Build automation tools, establish reusable monitoring patterns, and resolve reliability gaps that directly impact customer experience.
Requirements
Do you have experience in Spark? Do you have a Master's degree?
- Minimum of 5 years of experience as an SRE, DevOps Engineer, Production Engineer, or similar role.
- Production-level experience with at least one major cloud provider (AWS, Azure, GCP) and proficiency in container and orchestration technologies (Docker, Kubernetes).
- Hands-on experience with monitoring, logging, and alerting tools such as ELK, Prometheus, Grafana, and PagerDuty. Ability to architect monitoring solutions that correlate metrics, logs, and traces.
- Strong proficiency in Python or similar languages, with the ability to build production-quality automation tools.
- Experience owning critical phases of the incident lifecycle, from detection through resolution and post-mortem analysis, in demanding production environments.
- BS, Master's, or PhD in Computer Science, Computer Engineering, or a related engineering field.
Benefits & conditions
At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees.