Martin Beránek
SRE Methods In an Agency Environment
#1 about 1 minute
Defining the core concepts of SLI, SLO, and SLA
Understand the foundational SRE terms: Service Level Indicators (SLI) as measurements, Service Level Objectives (SLO) as targets, and Service Level Agreements (SLA) as contracts with penalties.
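To make the distinction concrete, here is a minimal sketch of how an SLI (a measurement) can be compared against an SLO (a target). It assumes a simple availability indicator defined as good requests divided by total requests; the names `AvailabilitySLO`, `availability_sli`, and `meets_slo` are hypothetical and not taken from the talk. An SLA would sit on top of this as a contract and is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """Hypothetical SLO: the target fraction of successful requests."""
    target: float  # e.g. 0.999 means 99.9 % of requests should succeed


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the measured fraction of successful requests in a window."""
    if total_requests == 0:
        # No traffic in the window: treat as fully available (an assumption).
        return 1.0
    return good_requests / total_requests


def meets_slo(sli: float, slo: AvailabilitySLO) -> bool:
    """The SLO is met when the measured SLI is at or above the target."""
    return sli >= slo.target


if __name__ == "__main__":
    slo = AvailabilitySLO(target=0.999)
    sli = availability_sli(good_requests=9_995, total_requests=10_000)
    print(f"SLI={sli:.4f}, SLO met: {meets_slo(sli, slo)}")
```

Keeping the measurement and the target as separate pieces mirrors the definitions above: the indicator can be recomputed for any window, while the objective only changes when the agency and the customer renegotiate it.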
#2 about 6 minutes
Mapping SRE responsibilities in an agency-customer model
Clarify the roles and responsibilities between the agency, the customer, and the end-user clients using a simple relationship model.
#3 about 4 minutes
Collaboratively creating SLO documentation with customers
Navigate the process of defining realistic SLOs with customers, from initial guessing games and benchmarking to periodic evaluation after launch.
#4 about 2 minutes
Navigating the two primary application handover scenarios
Prepare for two distinct handover situations: when the customer has the capacity to take over operations versus when the agency retains responsibility for reliability.
#5 about 3 minutes
The three essential SRE documents for agencies
Implement a "holy trinity" of documentation—the SLO document, support playbooks, and postmortems—to ensure clarity and operational readiness.
#6 about 4 minutes
How to write effective and blameless postmortems
Structure postmortems to be detailed and blameless by including key sections like impact, root cause, resolution, action items, and a minute-by-minute timeline.
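The sections listed above can be captured in a small template so that every postmortem has the same shape. The following is only an illustrative sketch: the class names, field names, and plain-text layout are assumptions, not the format used in the talk. The timeline is sorted by timestamp at render time so entries can be added in any order.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TimelineEntry:
    at: datetime  # when the event happened
    event: str    # what happened, described without naming or blaming people


@dataclass
class Postmortem:
    title: str
    impact: str
    root_cause: str
    resolution: str
    action_items: List[str] = field(default_factory=list)
    timeline: List[TimelineEntry] = field(default_factory=list)

    def render(self) -> str:
        """Render the postmortem as plain text, one section per heading."""
        lines = [
            self.title,
            "",
            "Impact",
            self.impact,
            "",
            "Root cause",
            self.root_cause,
            "",
            "Resolution",
            self.resolution,
            "",
            "Action items",
        ]
        lines += [f"- {item}" for item in self.action_items]
        lines += ["", "Timeline"]
        # Sort chronologically so the timeline reads minute by minute.
        for entry in sorted(self.timeline, key=lambda e: e.at):
            lines.append(f"{entry.at:%H:%M} {entry.event}")
        return "\n".join(lines)


if __name__ == "__main__":
    pm = Postmortem(
        title="Checkout outage (example data)",
        impact="Customers could not complete orders.",
        root_cause="A configuration change removed a required setting.",
        resolution="The change was rolled back.",
        action_items=["Add a validation step for configuration changes"],
        timeline=[
            TimelineEntry(datetime(2024, 1, 1, 9, 5), "Rollback started"),
            TimelineEntry(datetime(2024, 1, 1, 9, 0), "Alert fired"),
        ],
    )
    print(pm.render())
```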
#7 about 4 minutes
Defining key roles for effective incident management
Establish clear responsibilities during an incident by assigning an Incident Commander, Communications Lead, and Operations Lead to streamline resolution.
#8 about 5 minutes
Managing unexpected costs from environment and security issues
Account for unexpected work from cloud provider changes and security vulnerabilities by using an error budget policy to assess impact and prioritize fixes.
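As a rough illustration of how an error budget policy can drive prioritization, the sketch below computes the fraction of the budget still unspent and maps it to a decision. The function names, the 25 % caution threshold, and the three decision labels are hypothetical choices for this example; a real policy would be agreed between the agency and the customer.

```python
def error_budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    The budget is the number of bad events the SLO tolerates:
    (1 - slo_target) * total_events.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        # No events observed yet, so nothing has been spent (an assumption).
        return 1.0
    allowed_bad = (1.0 - slo_target) * total_events
    return 1.0 - bad_events / allowed_bad


def policy_decision(remaining: float, caution_threshold: float = 0.25) -> str:
    """Map the remaining budget to a hypothetical policy outcome."""
    if remaining <= 0.0:
        return "freeze"            # budget exhausted: reliability work only
    if remaining < caution_threshold:
        return "prioritize_fixes"  # budget nearly spent: fixes go first
    return "ship"                  # enough budget left for planned work


if __name__ == "__main__":
    remaining = error_budget_remaining(slo_target=0.99, total_events=1_000, bad_events=5)
    print(f"remaining budget: {remaining:.0%}, decision: {policy_decision(remaining)}")
```

Expressing the policy as a function of the remaining budget gives both parties a shared, measurable basis for deciding whether a provider change or a vulnerability fix should displace planned work.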
#9 about 4 minutes
Securely handing over credentials and application secrets
Execute a secure handover by properly managing user credentials in cloud environments like GCP and AWS and using secret managers for application secrets.
#10 about 3 minutes
Finalizing the handover with documentation and tooling
Complete the project handover by sharing the essential SRE documents, explaining relevant tooling, and conducting an adoption period with the customer's team.
#11 about 9 minutes
Key takeaways for applying SRE in an agency
Recognize that SRE is often underestimated, requires extensive explanation, and should ultimately focus on improving the user experience rather than just methodology.