DevOps

Using LLM Agents to Automate Operational Tasks in Kubernetes

with Shramish Kafle

Friday 10 July 16:20 – 16:50 Stage 9

About This Session

Kubernetes platforms generate a continuous stream of operational tasks: log triage, event correlation, configuration drift detection, rollout validation, and failure remediation. These activities consume significant SRE and platform engineering time and require deep domain knowledge. This session shows how LLM-driven agents can automate a substantial part of this workload by combining GitOps, policy evaluation, and cluster telemetry with language model reasoning.

The talk presents a practical, production-aligned architecture for integrating LLM agents directly into the Kubernetes control loop. It explains how agents interpret cluster state, evaluate anomalies, propose corrective actions, and submit changes into GitOps pipelines with auditability and guardrails. Real examples include detecting misconfigurations, explaining failing rollouts, guiding developers with contextual feedback, and performing controlled rollbacks or generating fixes.

A live demo walks through a real failure scenario to illustrate how an agent analyzes signals, identifies the root cause, and produces a precise remediation plan. The session emphasizes reliability, validation stages, failure handling, observability hooks, and the boundaries of what should and should not be automated. Attendees leave with a repeatable blueprint for safely applying LLM automation to reduce toil while maintaining predictability and full change transparency.
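As a rough illustration of the guardrail idea described above, the following minimal sketch shows one way an agent's proposed change could be checked against a policy before it is submitted to a GitOps pipeline. It is not the speaker's implementation: the names `ProposedChange`, `ALLOWED_KINDS`, `PROTECTED_NAMESPACES`, `evaluate`, and `route` are all hypothetical, and a real setup would likely delegate policy evaluation to a dedicated policy engine and open an actual pull request rather than return a dictionary.

```python
from dataclasses import dataclass

# Hypothetical policy values chosen for illustration only.
ALLOWED_KINDS = {"rollback", "patch"}
PROTECTED_NAMESPACES = {"kube-system"}


@dataclass(frozen=True)
class ProposedChange:
    """A remediation proposed by an agent (hypothetical shape)."""
    kind: str        # e.g. "rollback" or "patch"
    namespace: str   # namespace the change would touch
    target: str      # e.g. "deployment/checkout"
    rationale: str   # the agent's explanation, kept for the audit trail


def evaluate(change: ProposedChange) -> list[str]:
    """Return a list of policy violations; an empty list means the change passes."""
    violations = []
    if change.kind not in ALLOWED_KINDS:
        violations.append(f"action kind '{change.kind}' is not allowed")
    if change.namespace in PROTECTED_NAMESPACES:
        violations.append(f"namespace '{change.namespace}' is protected")
    if not change.rationale.strip():
        violations.append("missing rationale for audit trail")
    return violations


def route(change: ProposedChange) -> dict:
    """Decide whether a change may go to the GitOps pipeline or needs a human."""
    violations = evaluate(change)
    decision = "needs_human_review" if violations else "open_pull_request"
    return {"decision": decision, "violations": violations}


if __name__ == "__main__":
    ok = ProposedChange("rollback", "shop", "deployment/checkout",
                        "New image fails readiness probes")
    risky = ProposedChange("delete", "kube-system", "daemonset/proxy", "")
    print(route(ok))
    print(route(risky))
```

The design choice here is that the agent never applies changes itself: it only produces a proposal, and anything that fails a check is routed to a person, which keeps every change reviewable.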

Topics

  • Automation
  • CI/CD
  • DevOps
  • GitOps
  • Multi-Cloud