Handling incidents collaboratively is like solving a Rubik's Cube

What if developers could instrument their code and define SLOs with a simple decorator? Learn a new approach to making observability a shared responsibility.

#1 about 4 minutes

The Rubik's Cube metaphor for engineering teams

Engineering teams such as backend developers and SREs work on different sides of the same system, so resolving an incident requires them to collaborate.

#2 about 3 minutes

The first phase of resolving incidents collaboratively

The initial step in incident response is to establish a common understanding and transparency across teams before applying quick fixes.

#3 about 2 minutes

Preventing future incidents with best practices

After resolving an incident, teams must collaborate on prevention by documenting best practices for patterns like service retries.
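As an illustration of the kind of pattern such documentation might cover, here is a minimal sketch of a service retry with exponential backoff. The function name `call_with_retries`, its parameters, and the choice of `ConnectionError` as the retryable failure are assumptions made for this example; they do not come from the talk.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionError with exponential backoff.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            # Give up after the last attempt and surface the original error.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. A shared best-practice document would typically also state which errors are safe to retry and an upper bound on attempts.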

#4 about 2 minutes

Discovering incidents through system observability

The discovery phase relies on making systems observable by collecting telemetry data like logs, metrics, and traces.

#5 about 2 minutes

Standardizing telemetry collection with OpenTelemetry

OpenTelemetry provides a vendor-neutral standard for instrumenting applications, preventing vendor lock-in for observability backends.

#6 about 2 minutes

Simplifying metrics with the Autometrics library

The open-source Autometrics library uses decorators to automatically generate key metrics like latency, errors, and request rate from functions.
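To make the decorator idea concrete, the following sketch shows how a decorator can record call counts, errors, and durations for a function using only the Python standard library. It is not the Autometrics API: the decorator `track_metrics`, the in-memory stores, and the example function `get_user` are all hypothetical names chosen for this illustration.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory stores; a real library would export these
# to a metrics backend instead of keeping them in process memory.
CALLS = defaultdict(int)
ERRORS = defaultdict(int)
DURATIONS = defaultdict(list)


def track_metrics(func):
    """Record request count, error count, and latency per function name."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        start = time.perf_counter()
        CALLS[name] += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            ERRORS[name] += 1
            raise
        finally:
            # Runs on both success and failure, so every call has a duration.
            DURATIONS[name].append(time.perf_counter() - start)

    return wrapper


@track_metrics
def get_user(user_id: int) -> dict:
    """Example function to instrument (illustrative only)."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return {"id": user_id}
```

The appeal of this style is that the developer who writes `get_user` adds one line and gets rate, error, and latency data without touching the function body.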

#7 about 5 minutes

Demo of generating metrics and SLOs from code

A live demo shows how Autometrics surfaces metrics directly in the IDE and helps define SLOs that can be visualized in Grafana.
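For readers unfamiliar with SLOs, the arithmetic behind a success-rate objective can be sketched as follows. The class `SuccessRateSlo` and its method names are assumptions made for this example and are unrelated to how Autometrics or Grafana represent objectives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessRateSlo:
    """Hypothetical success-rate objective, e.g. 0.99 for 99% of calls."""

    name: str
    objective: float

    def success_rate(self, total_calls: int, failed_calls: int) -> float:
        # With no traffic there is nothing to violate the objective.
        if total_calls == 0:
            return 1.0
        return (total_calls - failed_calls) / total_calls

    def is_met(self, total_calls: int, failed_calls: int) -> bool:
        return self.success_rate(total_calls, failed_calls) >= self.objective

    def error_budget_remaining(self, total_calls: int, failed_calls: int) -> float:
        """Fraction of the allowed failures still unused (negative if overspent)."""
        allowed_failures = (1.0 - self.objective) * total_calls
        if allowed_failures == 0:
            return 0.0 if failed_calls else 1.0
        return 1.0 - failed_calls / allowed_failures
```

With a 99% objective and 1,000 calls, about 10 failures are allowed, so 4 failures leave roughly 60% of the error budget, while 20 failures overspend it.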

#8 about 1 minute

Summary of collaborative incident management phases

A recap of the three key phases for collaborative incident handling: resolving, preventing, and discovering issues together.

#9 about 2 minutes

Q&A on tooling and open source contribution

The speaker answers audience questions about managing tool complexity and the role of community contributions in open-source projects.

Nele Uhlemann
