What We Learned from Reading 100+ Kubernetes Post-Mortems
What's the #1 cause of Kubernetes outages? After analyzing over 100 post-mortems, the answer is surprisingly simple—and completely preventable.
#1 about 6 minutes
Understanding the developer versus DevOps cultural divide
A story from a DevOps meetup illustrates the different goals and perspectives that create friction between developers and operations teams.
#2 about 2 minutes
Bridge the gap with champions and failure stories
Delegate knowledge to developer champions and learn best practices by studying the post-mortem stories of other companies.
#3 about 5 minutes
Common Kubernetes misconfigurations from real outages
Examples from Target and Zalando show how simple errors like incorrect CronJob concurrency policies or missing memory limits can cause major production failures.
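The two misconfigurations named above can be caught mechanically before a manifest reaches a cluster. The following is a minimal sketch, not the method used in the talk: it assumes manifests have already been parsed into plain Python dictionaries, and the function names `pod_spec_of` and `check_manifest` are hypothetical. It flags a CronJob whose `concurrencyPolicy` is left at the Kubernetes default of `Allow` (which permits overlapping runs) and any container without a memory limit.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def pod_spec_of(manifest: Manifest) -> Dict[str, Any]:
    """Return the pod spec nested inside a Pod, CronJob, or workload manifest."""
    spec = manifest.get("spec") or {}
    kind = manifest.get("kind")
    if kind == "Pod":
        return spec
    if kind == "CronJob":
        job_spec = (spec.get("jobTemplate") or {}).get("spec") or {}
        return (job_spec.get("template") or {}).get("spec") or {}
    # Deployment, StatefulSet, DaemonSet, and Job keep it under spec.template.spec
    return (spec.get("template") or {}).get("spec") or {}


def check_manifest(manifest: Manifest) -> List[str]:
    """Collect human-readable findings for the two checks described above."""
    findings: List[str] = []
    name = (manifest.get("metadata") or {}).get("name", "<unnamed>")

    if manifest.get("kind") == "CronJob":
        # Kubernetes treats a missing concurrencyPolicy as "Allow".
        policy = (manifest.get("spec") or {}).get("concurrencyPolicy", "Allow")
        if policy == "Allow":
            findings.append(
                f"{name}: concurrencyPolicy is Allow, so runs may overlap; "
                "consider Forbid or Replace"
            )

    for container in pod_spec_of(manifest).get("containers") or []:
        limits = (container.get("resources") or {}).get("limits") or {}
        if "memory" not in limits:
            findings.append(
                f"{name}/{container.get('name', '<unnamed>')}: no memory limit set"
            )
    return findings


# Illustrative manifest (hypothetical names) that trips both checks.
risky_cronjob = {
    "kind": "CronJob",
    "metadata": {"name": "nightly-report"},
    "spec": {
        "schedule": "*/5 * * * *",
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {"containers": [{"name": "report", "image": "report:1.0"}]}
                }
            }
        },
    },
}

for finding in check_manifest(risky_cronjob):
    print(finding)
```

A check like this only reads the manifest, so it can run on a developer laptop or in CI without cluster credentials.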
#4 about 3 minutes
How to introduce policy enforcement gradually
Avoid organizational friction by implementing new policies slowly, starting with a single pilot team to gain agreement and understanding before a wider rollout.
#5 about 3 minutes
Categorizing the three types of Kubernetes failures
Kubernetes failures typically fall into three categories: simple syntax errors, gaps in knowledge of best practices, and misalignment with internal company policies.
#6 about 2 minutes
Validating Kubernetes YAML for syntax and schema errors
Use tools like yq for YAML format validation and kubeconform for schema validation without requiring direct cluster access for developers.
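As a rough illustration of how these two checks might be chained in a pre-commit hook or CI step, the sketch below builds the command lines and runs them only if the tools are installed. It assumes the Go implementation of yq (v4, where `yq eval . FILE` parses and re-emits the file) and kubeconform's `-summary` and `-strict` flags; the helper names are hypothetical and not part of either tool.

```python
import shutil
import subprocess
from typing import List, Optional


def yq_command(path: str) -> List[str]:
    """Command that fails if the file is not well-formed YAML (assumes Go yq v4)."""
    return ["yq", "eval", ".", path]


def kubeconform_command(paths: List[str], strict: bool = True) -> List[str]:
    """Command that validates manifests against Kubernetes schemas, offline from the cluster."""
    cmd = ["kubeconform", "-summary"]
    if strict:
        # -strict rejects properties that are not in the schema
        cmd.append("-strict")
    return cmd + list(paths)


def run_if_installed(cmd: List[str]) -> Optional[int]:
    """Run cmd and return its exit code, or None when the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


def validate(path: str) -> bool:
    """Syntax check first, then schema check; a missing tool is treated as a skip."""
    for cmd in (yq_command(path), kubeconform_command([path])):
        code = run_if_installed(cmd)
        if code is not None and code != 0:
            return False
    return True
```

Running the syntax check before the schema check keeps error messages simple: a malformed file is reported as a YAML problem and never reaches schema validation.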
#7 about 4 minutes
The challenges of managing policies as code in Git
Managing policies in Git creates versioning nightmares and lacks features for permissions, dynamic adjustments, and providing clear remediation guidelines.
#8 about 4 minutes
Using Datree for centralized policy management
Datree is an open-source tool that provides a centralized location for managing policies, which are then enforced locally and in CI for developers.
#9 about 1 minute
The real meaning of shifting responsibility left
True shift-left culture is not just about tools but about delegating responsibility and empowering developers to own their configurations.