Nele Uhlemann

Handling incidents collaboratively is like solving a Rubik's Cube

What if developers could instrument their code and define SLOs with a simple decorator? Learn a new approach to making observability a shared responsibility.

#1 about 4 minutes

The Rubik's Cube metaphor for engineering teams

Different engineering teams like backend and SREs operate on different sides of the system, requiring collaboration during incidents.

#2 about 3 minutes

The first phase of resolving incidents collaboratively

The initial step in incident response is to establish a common understanding and transparency across teams before applying quick fixes.

#3 about 2 minutes

Preventing future incidents with best practices

After resolving an incident, teams must collaborate on prevention by documenting best practices for patterns like service retries.
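
As an illustration of the kind of retry pattern a team might write down as a best practice, here is a minimal sketch of retries with exponential backoff and jitter. The helper name `call_with_retries` and its parameters are hypothetical and are not taken from the talk; the `sleep` argument is injectable only so the behaviour can be checked without waiting.

```python
import random
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the last error
            # The base delay doubles on each attempt; random jitter keeps
            # many clients from retrying at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Capping the number of attempts matters as much as the backoff itself: unbounded retries against a struggling service can prolong the very incident they are meant to smooth over.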

#4 about 2 minutes

Discovering incidents through system observability

The discovery phase relies on making systems observable by collecting telemetry data like logs, metrics, and traces.

#5 about 2 minutes

Standardizing telemetry collection with OpenTelemetry

OpenTelemetry provides a vendor-neutral standard for instrumenting applications, preventing vendor lock-in for observability backends.

#6 about 2 minutes

Simplifying metrics with the Autometrics library

The open-source Autometrics library uses decorators to automatically generate key metrics like latency, errors, and request rate from functions.
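
To make the decorator idea concrete, here is a hand-rolled sketch using only the Python standard library. It is not the Autometrics API: the names `track_metrics` and `METRICS` are hypothetical, and a real library would export to a metrics backend rather than an in-memory dictionary. It only shows how a single decorator can capture request count, error count, and latency for a function.

```python
import functools
import time
from collections import defaultdict

# In-memory store keyed by function name (an assumption for this sketch;
# a real setup would expose these values to a metrics backend).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def track_metrics(func):
    """Record call count, error count, and cumulative latency for func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            stats["errors"] += 1
            raise  # metrics must never swallow the caller's error
        finally:
            # Runs on success and on failure, so every call is counted.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start

    return wrapper


@track_metrics
def divide(a, b):
    return a / b
```

After calling `divide` a few times, `METRICS["divide"]` holds the counts and total time, from which request rate, error ratio, and average latency can be derived.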

#7 about 5 minutes

Demo of generating metrics and SLOs from code

A live demo shows how Autometrics provides live metrics in the IDE and helps define SLOs that can be visualized in Grafana.
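
The arithmetic behind a success-rate SLO can be sketched in a few lines. This is an illustration under assumed numbers, not the code from the demo: the function name `slo_report` and the 99% default target are hypothetical.

```python
def slo_report(total_calls, failed_calls, target=0.99):
    """Compare an observed success rate with a target and report the error budget.

    Hypothetical helper; the 0.99 default target is an assumption.
    """
    if total_calls == 0:
        # No traffic means nothing has violated the objective yet.
        return {"success_rate": None, "met": True, "budget_remaining": None}
    success_rate = (total_calls - failed_calls) / total_calls
    # The error budget is the number of failures the target tolerates.
    allowed_failures = total_calls * (1 - target)
    return {
        "success_rate": success_rate,
        "met": success_rate >= target,
        "budget_remaining": allowed_failures - failed_calls,
    }
```

For example, with 1,000 calls and 5 failures against a 99% target, the success rate is 99.5%, the objective is met, and roughly 5 of the 10 tolerated failures remain in the budget.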

#8 about 1 minute

Summary of collaborative incident management phases

A recap of the three key phases for collaborative incident handling: resolving, preventing, and discovering issues together.

#9 about 2 minutes

Q&A on tooling and open source contribution

The speaker answers audience questions about managing tool complexity and the role of community contributions in open-source projects.


