
Alexander Schwartz
Aug 20, 2025
Keycloak case study: Making users happy with service level indicators and observability

#1 about 2 minutes
The critical role of single sign-on systems
Single sign-on systems are a critical dependency for all enterprise applications and users, requiring high availability and performance, especially during usage spikes.
#2 about 3 minutes
An overview of Keycloak for identity management
Keycloak is a mature, fully open-source CNCF incubating project for identity and access management that handles authentication, user registration, and more.
#3 about 5 minutes
Measuring user happiness with service level objectives
User happiness can be quantified by measuring key metrics like availability, error rate, and latency against defined service level objectives (SLOs).
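
As a minimal sketch of what measuring against an SLO can look like, the Python below compares counts of good and total events with a target. The `Slo` class, the function names, and the 99.9% target in the usage note are illustrative assumptions, not values taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO definition: a name and a target fraction of good events."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def compliance(good: int, total: int) -> float:
    """Fraction of good events; an empty window counts as fully compliant."""
    return 1.0 if total == 0 else good / total


def is_met(slo: Slo, good: int, total: int) -> bool:
    """True when the observed fraction of good events reaches the target."""
    return compliance(good, total) >= slo.target


def error_budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Share of the error budget still unspent (negative when overspent)."""
    allowed_bad = (1.0 - slo.target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        # No budget at all: either nothing failed, or the budget is blown.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad
```

For example, with an assumed target of 99.9%, `is_met(Slo("availability", 0.999), 9995, 10000)` holds because 99.95% of events were good, and roughly half of the error budget is still left.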
#4 about 2 minutes
Measuring availability and error rates with Prometheus
Prometheus can measure service availability using its built-in 'up' metric and calculate error rates by analyzing HTTP request status codes with PromQL.
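
The arithmetic behind both measurements can be sketched in plain Python. This is not PromQL and not the queries shown in the talk: the snapshot dictionaries keyed by HTTP status code and both function names are assumptions made for illustration. Availability is taken as the share of scrapes where `up` was 1, and the error rate as the share of requests with a 5xx status between two counter snapshots.

```python
def availability(up_samples: list[int]) -> float:
    """Share of scrapes in which the target reported up == 1."""
    if not up_samples:
        return 0.0  # no data means availability cannot be shown
    return sum(1 for sample in up_samples if sample == 1) / len(up_samples)


def error_ratio(before: dict[str, float], after: dict[str, float]) -> float:
    """Share of 5xx responses between two snapshots of request counters.

    Both dictionaries map an HTTP status code (as a string) to the
    cumulative number of requests seen with that status.
    """
    total = 0.0
    errors = 0.0
    for status, value in after.items():
        increase = value - before.get(status, 0.0)
        if increase < 0:
            # The counter was reset, so the new value is the whole increase.
            increase = value
        total += increase
        if status.startswith("5"):
            errors += increase
    return 0.0 if total == 0 else errors / total
```

With snapshots `{"200": 1000, "500": 10}` and `{"200": 1990, "500": 20}`, 10 of the 1000 new requests failed, so the error ratio is 1%.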
#5 about 2 minutes
Using Prometheus histograms to measure request latency
Enabling histograms for metrics allows you to categorize requests into performance buckets, making it possible to measure latency against specific SLOs.
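
Prometheus histogram buckets are cumulative: each bucket counts every request that finished at or below its upper bound. Assuming one bucket boundary matches the latency threshold of the SLO, the share of fast requests is a simple division, sketched below. The bucket boundaries, the counts in the usage note, and the function name are made up for illustration.

```python
import math


def fraction_within(buckets: dict[float, int], threshold: float) -> float:
    """Share of requests that finished within `threshold` seconds.

    `buckets` maps each upper bound in seconds to a cumulative count and
    must contain math.inf as the bucket that holds all requests.
    """
    if math.inf not in buckets:
        raise ValueError("buckets must include the +Inf bucket")
    if threshold not in buckets:
        # Without a matching boundary the share could only be estimated.
        raise ValueError(f"no bucket boundary at {threshold} seconds")
    total = buckets[math.inf]
    return 1.0 if total == 0 else buckets[threshold] / total
```

For buckets `{0.1: 800, 0.25: 950, 0.5: 990, math.inf: 1000}`, the call `fraction_within(buckets, 0.25)` yields 0.95, meaning 95% of requests completed within 250 ms. This is also why bucket boundaries should be chosen to line up with the SLO thresholds.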
#6 about 1 minute
Visualizing key metrics with the Keycloak dashboard
Keycloak provides a pre-built Grafana dashboard to visualize availability, error rates, and response times for at-a-glance monitoring of service health.
#7 about 4 minutes
Finding root causes with distributed tracing
Distributed tracing with OpenTelemetry provides detailed, request-level insights into performance bottlenecks and errors that high-level metrics cannot reveal.
#8 about 2 minutes
How Keycloak adds business context to traces
Keycloak enhances traces by adding business-specific information like client ID, realm name, and user session to simplify searching and debugging.
#9 about 2 minutes
Connecting metrics to traces using exemplars
Exemplars link specific traces to your metrics, allowing you to jump directly from a slow request in a histogram to its detailed trace for analysis.
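
The idea can be sketched as a toy data structure: each histogram bucket remembers the trace ID of the most recent traced request that landed in it. This is a simplified model for illustration, not how Keycloak or Prometheus implement exemplars. The class names are hypothetical, and the counts here are stored per bucket rather than cumulatively.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Exemplar:
    """One sample observation together with the trace that produced it."""
    trace_id: str
    value: float


@dataclass
class ExemplarHistogram:
    """Toy histogram that keeps the latest exemplar per bucket."""
    bounds: List[float]  # sorted upper bounds in seconds, without +Inf
    counts: List[int] = field(init=False)
    exemplars: List[Optional[Exemplar]] = field(init=False)

    def __post_init__(self) -> None:
        # One extra slot at the end acts as the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars = [None] * (len(self.bounds) + 1)

    def observe(self, value: float, trace_id: Optional[str] = None) -> None:
        """Record a latency and, if the request was traced, its trace ID."""
        # bisect_left picks the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        if trace_id is not None:
            self.exemplars[index] = Exemplar(trace_id, value)

    def slowest_exemplar(self) -> Optional[Exemplar]:
        """Exemplar from the highest bucket that has one, if any."""
        for exemplar in reversed(self.exemplars):
            if exemplar is not None:
                return exemplar
        return None
```

After observing a fast request and a slow one with their trace IDs, `slowest_exemplar()` returns the trace ID of the slow request, which is the handle needed to open its trace.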
#10 about 2 minutes
Using exemplars in Grafana for targeted analysis
Heatmaps in Grafana with exemplars enabled allow you to click on an outlier data point and immediately investigate the corresponding trace for that request.
#11 about 3 minutes
Using observability for better business outcomes
A robust observability strategy helps track user-centric metrics, chase tail latencies, and make data-driven decisions about infrastructure and feature development.