
DevOps

Beyond Chat: AI Workflows That Actually Investigate Alerts (So You Don't Have To Know Everything)

with Nune Isabekyan & Aram Hakobyan

Friday 10 July 16:20 – 16:50 Stage 12

About This Session

"You build it, you run it" sounds great until you're on call for a Kafka consumer lag spike at 3 AM, and you've spent the last six months building the React frontend, not the event pipeline. Modern teams own their services end-to-end, but no one can be an expert in everything. And the AI chatbots we've been promised? They just add another window to alt-tab through while the pager screams.

This talk argues that chat-based AI is fundamentally wrong for incident investigation. I'll break down why the chat paradigm fails: it expects you to provide context you don't have at 3 AM, assumes you know where to look, burns time with back-and-forth, and interrogates instead of investigating.

Then I'll show what actually works: AI workflows that investigate like a teammate who knows the system, automatically discovering affected resources, correlating metrics with deployments, querying the right logs without being asked, and delivering hypotheses with evidence.

We'll cover real engineering challenges: orchestrating tools across Kubernetes, logs, and metrics; solving the "where do I even start?" problem; building outputs that explain unfamiliar systems; and what breaks when you let AI loose on production.

Live Demo: A simulated 3 AM alert comparing the chatbot experience ("Can you tell me more about your cluster configuration?") with a workflow that delivers: "Your deployment rolled out 47 minutes ago with a memory limit reduction. Three pods are OOMKilling. Here's the diff and the kubectl command to roll back."
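The "correlating metrics with deployments" step and the demo's expected output can be illustrated with a minimal Python sketch. It is not the speakers' implementation, and everything in it is an assumption. The `Deployment` and `PodStatus` records, the `investigate` and `parse_memory` functions, the two-hour look-back window, and the sample data are all hypothetical. A real workflow would fetch these inputs from the Kubernetes API and the deploy history rather than take them as arguments. Only two pieces come from Kubernetes itself: `OOMKilled` is the termination reason Kubernetes reports when a container is killed for exceeding its memory limit, and `kubectl rollout undo deployment/<name>` is the standard rollback command.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical records; a real workflow would populate these from the
# Kubernetes API and deploy history instead of hard-coding them.
@dataclass
class Deployment:
    name: str
    rolled_out_at: datetime
    old_memory_limit: str  # e.g. "512Mi"
    new_memory_limit: str  # e.g. "256Mi"


@dataclass
class PodStatus:
    name: str
    deployment: str
    last_termination_reason: Optional[str]  # e.g. "OOMKilled"


# Only the binary suffixes Ki/Mi/Gi (plus plain bytes) are handled in this sketch.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def investigate(
    alert_at: datetime,
    deployments: List[Deployment],
    pods: List[PodStatus],
    window: timedelta = timedelta(hours=2),  # hypothetical look-back window
) -> Optional[str]:
    """Link an alert to a recent rollout that cut memory limits and caused OOM kills."""
    recent = [d for d in deployments if alert_at - window <= d.rolled_out_at <= alert_at]
    # Newest rollout first: it is usually the likeliest suspect.
    for dep in sorted(recent, key=lambda d: d.rolled_out_at, reverse=True):
        oom_pods = [
            p for p in pods
            if p.deployment == dep.name and p.last_termination_reason == "OOMKilled"
        ]
        limit_reduced = parse_memory(dep.new_memory_limit) < parse_memory(dep.old_memory_limit)
        if oom_pods and limit_reduced:
            minutes = int((alert_at - dep.rolled_out_at).total_seconds() // 60)
            return (
                f"{dep.name} rolled out {minutes} minutes ago with a memory limit reduction "
                f"({dep.old_memory_limit} -> {dep.new_memory_limit}). "
                f"{len(oom_pods)} pod(s) last terminated with OOMKilled. "
                f"Suggested rollback: kubectl rollout undo deployment/{dep.name}"
            )
    return None  # no evidence linking the alert to a recent rollout


if __name__ == "__main__":
    # Sample data mirroring the demo scenario; the "checkout" name is made up.
    alert = datetime(2024, 1, 1, 3, 0)
    deps = [Deployment("checkout", alert - timedelta(minutes=47), "512Mi", "256Mi")]
    pods = [PodStatus(f"checkout-{i}", "checkout", "OOMKilled") for i in range(3)]
    print(investigate(alert, deps, pods))
```

The sketch returns a hypothesis only when two independent pieces of evidence agree: a recent memory-limit cut and OOM-killed pods from the same deployment. This reflects the "hypotheses with evidence" idea. In an AI workflow, deterministic checks like this could gather the evidence, and a model would explain it.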

Topics

  • Automation
  • DevOps
  • Generative AI (GenAI)
  • LLMOps
  • Site Reliability Engineering (SRE)
  • Workflow Automation