Designing UX for SRE Agents in High-Stakes Incidents

About This Session

Incident analysis isn't a straight-line calculation; it's a maze. Every alert opens a fork: deploy regression, dependency flap, or the first step of something larger. Older LLMs stumbled at the first fork. Newer models navigate these branches and backtrack when a hypothesis doesn't hold. At Hyground, we don't hand our SRE agents exhaustive runbooks. We give them a foundation in operations work and pointers to metrics, logs, and wikis, then let them run. That forces us to rethink what UX means. When the trajectory isn't one you can wireframe in advance, what are you actually designing? Engineers want control, most of all at 3am with a pager going off. How do you surface an agent's reasoning without burying the operator in text? How do you give the human steering authority over a process whose next step doesn't exist yet? We will walk through the interface patterns we landed on at Hyground, the ones we discarded, and close with a question: when the thing on the other side of the screen is an intelligence of its own, is "user interface" still the right word?