Nele Uhlemann
Handling incidents collaboratively is like solving a rubix cube
#1about 4 minutes
The Rubik's Cube metaphor for engineering teams
Different engineering teams like backend and SREs operate on different sides of the system, requiring collaboration during incidents.
#2about 3 minutes
The first phase of resolving incidents collaboratively
The initial step in incident response is to establish a common understanding and transparency across teams before applying quick fixes.
#3about 2 minutes
Preventing future incidents with best practices
After resolving an incident, teams must collaborate on prevention by documenting best practices for patterns like service retries.
#4about 2 minutes
Discovering incidents through system observability
The discovery phase relies on making systems observable by collecting telemetry data like logs, metrics, and traces.
#5about 2 minutes
Standardizing telemetry collection with OpenTelemetry
OpenTelemetry provides a vendor-neutral standard for instrumenting applications, preventing vendor lock-in for observability backends.
#6about 2 minutes
Simplifying metrics with the Autometrics library
The open-source Autometrics library uses decorators to automatically generate key metrics like latency, errors, and request rate from functions.
#7about 5 minutes
Demo of generating metrics and SLOs from code
A live demo shows how Autometrics provides live metrics in the IDE and helps define SLOs that can be visualized in Grafana.
#8about 1 minute
Summary of collaborative incident management phases
A recap of the three key phases for collaborative incident handling: resolving, preventing, and discovering issues together.
#9about 2 minutes
Q&A on tooling and open source contribution
The speaker answers audience questions about managing tool complexity and the role of community contributions in open-source projects.
Related jobs
Jobs that call for the skills explored in this talk.
Featured Partners
Related Videos
Empathy: The secret sauce of Resilience
Malin Litwinski
Unveiling the Dark Side: Navigating the Pitfalls of Digital Ambitions
Johannes Hansen
One size fits all! Not at all!
Ixchel Ruiz
The Software Bug All Stars - and what we can learn from them
Christian Seifert
I broke the production
Arto Liukkonen
Applying Agile Principles to Incident Management
Tobias Dunn-Krahn
Mastering AI-Driven Problem Solving in Engineering with Observability
Jemiah Sius
The AI-Ready Stack: Rethinking the Engineering Org of the Future
Jan Oberhauser, Mirko Novakovic, Alex Laubscher & Keno Dreßel
From learning to earning
Jobs that call for the skills explored in this talk.


DevOps Engineer – Kubernetes & Cloud (m/w/d)
epostbox epb GmbH
Berlin, Germany
Intermediate
Senior
DevOps
Kubernetes
Cloud (AWS/Google/Azure)


Site Reliability Engineer SRE Golang Rust Linux
Digistrat consulting
Paris, France
Remote
Go
GIT
Ruby
Linux
+3
Site Reliability Engineer (SRE)
Traveltechessentialist
Barcelona, Spain
Bash
Python
Docker
Node.js
Grafana
+4


