About This Session
Site Reliability Engineering (SRE) has long been the discipline responsible for keeping complex systems healthy, resilient, and predictable under pressure. But the real power of SRE lies not in its tools, dashboards, or operational frameworks. It lies in its philosophy: focusing on what matters most, measuring the right things, and making intentional trade-offs. As engineering leaders, we can apply these principles far beyond production environments. This talk explores how core SRE concepts can become high-leverage leadership tools for shaping team culture, guiding prioritization, and driving meaningful business outcomes.
We begin with service criticality, widening the traditional technical lens to take in the entire end-to-end customer journey. Instead of assessing components in isolation, we’ll explore how to map dependencies across teams and systems to surface the true bottlenecks and organizational weak points that affect users.
Next, we’ll look at Service-Level Indicators (SLIs) and reinterpret them at the business level. What does “reliability” mean when it is framed through customer expectations rather than CPU metrics? We’ll see how engineering leaders can define measurable signals that reflect whether the product is delivering its intended value.
Then we’ll dig into Service-Level Objectives (SLOs), treating them not as uptime percentages but as promises to customers. We’ll discuss how leaders can craft SLOs that articulate what “good enough” looks like for the business, and how these objectives guide healthier conversations about trade-offs, investment, and risk.
Finally, we’ll explore error budgets as a strategic leadership mechanism. Error budgets offer a structured way to balance innovation and stability, to negotiate between delivery teams and product, and to make aligned decisions about when to push forward and when to fix foundational issues.
Topics
- Metrics
- Site Reliability Engineering (SRE)
- Software Architecture