How do you provide powerful monitoring for thousands of services without the toil? Learn how Google built a zero-configuration dashboard system that just works.
#1 about 3 minutes
The challenge of creating monitoring dashboards from scratch
Monitoring is often an afterthought, so incident response becomes painful when the dashboards needed for troubleshooting do not exist yet.
Google's massive scale, global distribution, and monorepo architecture created a unique need for a scalable, reusable monitoring solution.
#3 about 5 minutes
Building reusable dashboards with templated dimensions
Replace hardcoded values in queries with template variables, called dimensions, to create a single dashboard that can be reused for any service.
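As a minimal sketch of the idea, assuming a made-up query syntax and hypothetical dimension names (`service`, `region`), a single templated query can be rendered for any service using Python's standard `string.Template`:

```python
from string import Template

# Hypothetical query text; ${service} and ${region} are the dimensions.
# The query language shown here is invented for illustration only.
QUERY_TEMPLATE = Template(
    'fetch rpc_latency | filter service == "${service}" && region == "${region}"'
)


def render_query(dimensions: dict[str, str]) -> str:
    """Fill in the template; raises KeyError if a dimension is missing."""
    return QUERY_TEMPLATE.substitute(dimensions)


# The same dashboard definition serves two different services.
checkout_query = render_query({"service": "checkout", "region": "us-east"})
search_query = render_query({"service": "search", "region": "eu-west"})
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing dimension fails loudly instead of producing a query with an unfilled placeholder.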
#4 about 6 minutes
Solving dashboard discovery with scopes and traits
Address the problem of too many dashboards by having users select a "scope" (e.g., a service); the system then uses the "traits" it discovers for that scope to show only relevant dashboards.
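One way to picture trait-based discovery is as a subset check: a dashboard is shown when every trait it requires is present on the selected scope. The sketch below assumes hypothetical trait names, dashboard names, and a hard-coded `discover_traits` lookup standing in for real runtime discovery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dashboard:
    name: str
    required_traits: frozenset[str]


# Hypothetical dashboard catalogue.
DASHBOARDS = [
    Dashboard("RPC server overview", frozenset({"rpc_server"})),
    Dashboard("SQL database health", frozenset({"sql_database"})),
    Dashboard("RPC cache hit rate", frozenset({"rpc_server", "cache"})),
]


def discover_traits(scope: str) -> set[str]:
    """Stand-in for runtime discovery; a real system would inspect live data."""
    known = {
        "checkout": {"rpc_server", "cache"},
        "ledger-db": {"sql_database"},
    }
    return known.get(scope, set())


def relevant_dashboards(scope: str) -> list[str]:
    """Return dashboards whose required traits are all present on the scope."""
    traits = discover_traits(scope)
    return [d.name for d in DASHBOARDS if d.required_traits <= traits]
```

With these assumptions, selecting the `checkout` scope surfaces the two RPC dashboards and hides the SQL one, without anyone configuring that mapping by hand.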
#5 about 2 minutes
Modeling different entities with scope types
Introduce "scope types" to create namespaces for different kinds of monitorable entities, such as servers, databases, or machine learning models.
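A rough illustration of scope types as namespaces, assuming a hypothetical `ScopeType` enum and trait names: the same entity name can exist under two types without colliding, because the type is part of the key.

```python
from dataclasses import dataclass
from enum import Enum


class ScopeType(Enum):
    # Hypothetical kinds of monitorable entities.
    SERVER = "server"
    DATABASE = "database"
    ML_MODEL = "ml_model"


@dataclass(frozen=True)
class Scope:
    type: ScopeType
    name: str


# The scope type namespaces the name, so "recommendations" the server
# and "recommendations" the model are distinct keys with distinct traits.
registry: dict[Scope, set[str]] = {
    Scope(ScopeType.SERVER, "recommendations"): {"rpc_server"},
    Scope(ScopeType.ML_MODEL, "recommendations"): {"model_serving"},
}

server_traits = registry[Scope(ScopeType.SERVER, "recommendations")]
model_traits = registry[Scope(ScopeType.ML_MODEL, "recommendations")]
```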
#6 about 4 minutes
Why infrastructure as code is not the right solution
Static provisioning with infrastructure-as-code or dashboards-as-code is insufficient because it lacks dynamic runtime information and creates a stale second source of truth.
#7 about 3 minutes
Improving performance at scale with query variants
Use pre-aggregated metrics and define multiple query "variants" within a graph, allowing the system to automatically select the most performant query based on the user's drill-down level.
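The selection step could look roughly like the following sketch. It assumes each variant declares which dimensions its (pre-aggregated) metric retains plus a relative cost; the metric names, the `cost` field, and the cheapest-match rule are illustrative assumptions, not the system described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    query: str
    available_dimensions: frozenset[str]
    cost: int  # relative cost; lower is cheaper (assumed metric)


# Hypothetical variants of one graph, from most to least aggregated.
VARIANTS = [
    Variant("fetch latency_by_service", frozenset({"service"}), cost=1),
    Variant(
        "fetch latency_by_service_region",
        frozenset({"service", "region"}),
        cost=10,
    ),
    Variant(
        "fetch latency_raw",
        frozenset({"service", "region", "task"}),
        cost=100,
    ),
]


def pick_variant(selected_dimensions: set[str]) -> Variant:
    """Pick the cheapest variant that still has every dimension in use."""
    candidates = [
        v for v in VARIANTS if selected_dimensions <= v.available_dimensions
    ]
    if not candidates:
        raise ValueError(f"no variant covers {sorted(selected_dimensions)}")
    return min(candidates, key=lambda v: v.cost)
```

At the top level, where only `service` is selected, the cheapest pre-aggregated variant is chosen; drilling down to a single `task` forces the raw variant because no aggregate keeps that dimension.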
#8 about 1 minute
Visualizing dependencies with a service graph
Leverage the scope and dependency information to build a service graph that helps engineers quickly navigate between related systems during an incident.
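As a small sketch of how dependency information can drive navigation, assume a hypothetical adjacency map from each scope to the scopes it depends on. A breadth-first walk then lists everything downstream of the service under investigation, nearest dependencies first.

```python
from collections import deque

# Hypothetical dependency edges: scope -> scopes it calls.
DEPENDENCIES: dict[str, list[str]] = {
    "frontend": ["checkout", "search"],
    "checkout": ["ledger-db"],
    "search": [],
    "ledger-db": [],
}


def downstream(scope: str) -> list[str]:
    """Breadth-first list of direct and transitive dependencies of a scope."""
    seen = {scope}
    order: list[str] = []
    queue = deque([scope])
    while queue:
        current = queue.popleft()
        for dep in DEPENDENCIES.get(current, []):
            if dep not in seen:  # guard against cycles and duplicates
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order
```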
#9 about 1 minute
Key takeaways for building planet-scale dashboards
A summary of the core principles: use dimensions for reusability, traits for discovery, scope types for genericity, and variants for performance.