Nele Uhlemann
Handling incidents collaboratively is like solving a Rubik's Cube
#1 about 4 minutes
The Rubik's Cube metaphor for engineering teams
Engineering teams such as backend developers and SREs work on different sides of the system, so incidents require them to collaborate.
#2 about 3 minutes
The first phase of resolving incidents collaboratively
The initial step in incident response is to establish a common understanding and transparency across teams before applying quick fixes.
#3 about 2 minutes
Preventing future incidents with best practices
After resolving an incident, teams must collaborate on prevention by documenting best practices for patterns like service retries.
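As an illustration of the kind of pattern such a best-practice document might cover, here is a minimal sketch of a service retry with exponential backoff and jitter. It is not taken from the talk: the helper name `call_with_retries`, its parameters, and the injectable `sleep` argument are all assumptions made for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception (hypothetical helper).

    Waits a random time between 0 and base_delay * 2**(attempt - 1) seconds
    before each retry, so that many clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential backoff with full jitter.
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)
    raise AssertionError("unreachable")
```

Capping the number of attempts and adding jitter are the two choices a shared guideline would typically pin down, since unbounded or synchronized retries can amplify an ongoing incident.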
#4 about 2 minutes
Discovering incidents through system observability
The discovery phase relies on making systems observable by collecting telemetry data like logs, metrics, and traces.
#5 about 2 minutes
Standardizing telemetry collection with OpenTelemetry
OpenTelemetry provides a vendor-neutral standard for instrumenting applications, preventing vendor lock-in for observability backends.
#6 about 2 minutes
Simplifying metrics with the Autometrics library
The open-source Autometrics library uses decorators to automatically generate key metrics like latency, errors, and request rate from functions.
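To make the decorator idea concrete, here is a minimal standard-library sketch of a decorator that records call counts, errors, and cumulative latency per function. It does not use or reproduce the Autometrics API: `track_metrics`, the in-memory `METRICS` store, and the sample function `divide` are hypothetical names chosen for the example.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable

# Hypothetical in-memory store keyed by function name. A real setup would
# export these values to a metrics backend instead of keeping them here.
METRICS: dict = defaultdict(
    lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0}
)


def track_metrics(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record call count, error count, and total latency for `func`."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        stats = METRICS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            # Runs for both successful and failed calls.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start

    return wrapper


@track_metrics
def divide(a: float, b: float) -> float:
    """Sample function used only to exercise the decorator."""
    return a / b
```

From these three counters, request rate (calls per time window), error ratio (errors divided by calls), and average latency (total seconds divided by calls) can all be derived without touching the body of the decorated function.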
#7 about 5 minutes
Demo of generating metrics and SLOs from code
A live demo shows how Autometrics provides live metrics in the IDE and helps define SLOs that can be visualized in Grafana.
#8 about 1 minute
Summary of collaborative incident management phases
A recap of the three key phases for collaborative incident handling: resolving, preventing, and discovering issues together.
#9 about 2 minutes
Q&A on tooling and open source contribution
The speaker answers audience questions about managing tool complexity and the role of community contributions in open-source projects.


