DevOps

Using LLM Agents to Automate Operational Tasks in Kubernetes

with Shramish Kafle

Friday 10 July 13:00 – 13:30 Stage 9

About This Session

Kubernetes platforms generate a continuous stream of operational tasks: log triage, event correlation, configuration drift detection, rollout validation, and failure remediation. These activities consume significant SRE and platform engineering time and require deep domain knowledge. This session shows how LLM-driven agents can automate a substantial part of this workload by combining GitOps, policy evaluation, and cluster telemetry with language model reasoning.

The talk presents a practical, production-aligned architecture for integrating LLM agents directly into the Kubernetes control loop. It explains how agents interpret cluster state, evaluate anomalies, propose corrective actions, and submit changes to GitOps pipelines with auditability and guardrails. Real examples include detecting misconfigurations, explaining failing rollouts, guiding developers with contextual feedback, and performing controlled rollbacks or generating fixes.

A live demo walks through a real failure scenario to illustrate how an agent analyzes signals, identifies the root cause, and produces a precise remediation plan. Emphasis is placed on reliability, validation stages, failure handling, observability hooks, and the boundaries of what should and should not be automated. Attendees leave with a repeatable blueprint for safely applying LLM automation to reduce toil while maintaining predictability and full change transparency.
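To make the guardrail idea concrete, here is a minimal sketch of a validation stage that sits between an agent's proposal and a GitOps pipeline. It is not taken from the session: the `Proposal` and `Decision` types, the `evaluate` function, the allow-list, and the protected namespace set are all hypothetical names chosen for illustration, and a real setup would likely delegate these checks to a policy engine.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical allow-list: action types an agent may propose without escalation.
ALLOWED_ACTIONS = {"rollback", "patch_config"}
# Hypothetical set of namespaces that always require a human reviewer.
PROTECTED_NAMESPACES = {"kube-system"}


@dataclass
class Proposal:
    """A change an agent wants to submit (illustrative shape only)."""
    action: str
    namespace: str
    resource: str
    rationale: str


@dataclass
class Decision:
    """Outcome of the guardrail check, with reasons kept for the audit trail."""
    approved: bool
    reasons: List[str] = field(default_factory=list)


def evaluate(proposal: Proposal) -> Decision:
    """Collect every rule violation; approve only if there are none."""
    reasons = []
    if proposal.action not in ALLOWED_ACTIONS:
        reasons.append(f"action '{proposal.action}' is not on the allow-list")
    if proposal.namespace in PROTECTED_NAMESPACES:
        reasons.append(f"namespace '{proposal.namespace}' requires human review")
    if not proposal.rationale.strip():
        reasons.append("missing rationale; the change would not be auditable")
    return Decision(approved=not reasons, reasons=reasons)
```

In this sketch, an approved proposal would go on to become a pull request against the GitOps repository, while a rejected one would be routed to a human along with its list of reasons. Collecting all violations instead of stopping at the first keeps the rejection message useful for review.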

Topics

Automation
CI/CD
DevOps
GitOps
Multi-Cloud

Speaker

Shramish Kafle

Senior Solutions Architect · KFW Bank

Shramish is a Senior Solutions Architect with deep experience in cloud-native platforms, Kubernetes operations, observability, and modern delivery practices. His work centers on designing scalable architecture patterns, improving deployment reliability, and guiding engineering teams toward maintainable, production-grade solutions. He writes regularly on Medium and hosts a podcast focused on DevOps, SRE, machine learning, and the practical use of LLMs in engineering workflows.

He brings a strong foundation from earlier roles, including DevOps engineer at KSB and cloud consultant at MHP: A Porsche Company. At MHP, he worked closely with teams at Porsche and Volkswagen on large-scale DevOps initiatives, CI/CD pipeline design, cloud transformation programs, and migration strategies for complex enterprise systems. These experiences shaped his practical, implementation-driven approach to platform and architecture work.

He has delivered technical sessions and hands-on workshops with Amazon Web Services and Technische Universität Berlin, covering topics such as Kubernetes operations, cloud architecture choices, and platform reliability. His speaking style focuses on real lessons, clear explanations, and practical insights that engineering teams can apply immediately. His work across industry verticals has given him a grounded perspective on what makes platforms reliable, scalable, and efficient to operate. Attendees can expect a session rooted in real experience, thoughtful engineering practice, and an honest look at what actually works in modern cloud environments.