Handling incidents collaboratively is like solving a Rubik's Cube
What if developers could instrument their code and define SLOs with a simple decorator? Learn a new approach to making observability a shared responsibility.
#1 about 4 minutes
The Rubik's Cube metaphor for engineering teams
Engineering teams such as backend developers and SREs work on different sides of the system, so they have to collaborate during incidents.
#2 about 3 minutes
The first phase of resolving incidents collaboratively
The initial step in incident response is to establish a common understanding and transparency across teams before applying quick fixes.
#3 about 2 minutes
Preventing future incidents with best practices
After resolving an incident, teams must collaborate on prevention by documenting best practices for patterns like service retries.
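As an illustration of the kind of pattern a team might document, here is a minimal sketch of a service retry with exponential backoff. It is not taken from the talk: the helper name `call_with_retries` and its parameters are assumptions made for this example, and a production version would usually retry only specific error types and add jitter.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration. The `sleep` parameter is injectable
    so the backoff schedule can be checked without actually waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts:
                raise
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Writing down the attempt limit and the backoff schedule in one shared helper gives backend developers and SREs a single place to agree on retry behavior.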
#4 about 2 minutes
Discovering incidents through system observability
The discovery phase relies on making systems observable by collecting telemetry data like logs, metrics, and traces.
#5 about 2 minutes
Standardizing telemetry collection with OpenTelemetry
OpenTelemetry provides a vendor-neutral standard for instrumenting applications, preventing vendor lock-in for observability backends.
#6 about 2 minutes
Simplifying metrics with the Autometrics library
The open-source Autometrics library uses decorators to automatically generate key metrics like latency, errors, and request rate from functions.
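To make the decorator idea concrete, the sketch below records call count, error count, and latency for any decorated function using only the Python standard library. It is deliberately not the Autometrics API: `track_metrics`, `METRICS`, and `divide` are hypothetical names for this example, and a real library would export to a metrics backend rather than keep values in memory.

```python
import functools
import time
from collections import defaultdict

# In-memory store keyed by function name (an assumption for this sketch;
# a real library would expose these values to a metrics backend).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "durations": []})


def track_metrics(func):
    """Hypothetical decorator: count calls and errors, record latency."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = METRICS[func.__name__]
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            record["errors"] += 1
            raise
        finally:
            # Runs on success and on failure, so every call is counted.
            record["calls"] += 1
            record["durations"].append(time.perf_counter() - start)

    return wrapper


@track_metrics
def divide(a, b):
    return a / b
```

Request rate and error rate can then be derived from `calls` and `errors` over a time window, and latency percentiles from `durations`.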
#7 about 5 minutes
Demo of generating metrics and SLOs from code
The demo shows how Autometrics surfaces live metrics in the IDE and helps define SLOs that can be visualized in Grafana.
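The arithmetic behind a success-rate SLO can be sketched in a few lines. This is a generic illustration rather than the demo's code: the function names and the 99% target are assumptions chosen for the example.

```python
def success_rate(total_calls: int, failed_calls: int) -> float:
    """Fraction of calls that succeeded; 1.0 when there was no traffic."""
    if total_calls == 0:
        return 1.0
    return (total_calls - failed_calls) / total_calls


def meets_objective(total_calls: int, failed_calls: int, target: float) -> bool:
    """True when the observed success rate is at or above the SLO target."""
    return success_rate(total_calls, failed_calls) >= target


def error_budget_remaining(total_calls: int, failed_calls: int, target: float) -> float:
    """Share of the error budget still unspent (negative once overspent)."""
    allowed_failures = (1 - target) * total_calls
    if allowed_failures == 0:
        # No failures are allowed: the budget is either intact or gone.
        return 1.0 if failed_calls == 0 else 0.0
    return 1 - failed_calls / allowed_failures


# Example: 1000 calls with 5 failures against a hypothetical 99% target.
print(meets_objective(1000, 5, 0.99))
print(error_budget_remaining(1000, 5, 0.99))
```

With a 99% target, 1000 calls allow roughly 10 failures, so 5 failures leave about half of the error budget.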
#8 about 1 minute
Summary of collaborative incident management phases
A recap of the three key phases for collaborative incident handling: resolving, preventing, and discovering issues together.
#9 about 2 minutes
Q&A on tooling and open source contribution
The speaker answers audience questions about managing tool complexity and the role of community contributions in open-source projects.