Keycloak case study: Making users happy with service level indicators and observability

What if you could click a single point on a dashboard and instantly see the trace for that slow request? Learn how to connect metrics to traces with exemplars.

#1 about 2 minutes

The critical role of single sign-on systems

Single sign-on systems are a critical dependency for all enterprise applications and users, requiring high availability and performance, especially during usage spikes.

#2 about 3 minutes

An overview of Keycloak for identity management

Keycloak is a mature, fully open-source CNCF incubating project for identity and access management that handles authentication, user registration, and more.

#3 about 5 minutes

Measuring user happiness with service level objectives

User happiness can be quantified by measuring service level indicators (SLIs) such as availability, error rate, and latency against defined service level objectives (SLOs).
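
To make the relationship between a measurement and its objective concrete, here is a minimal sketch of an SLO check. The class name `SloReport`, its fields, and the example numbers are assumptions made for illustration; they do not come from the talk or from Keycloak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    """Hypothetical helper: compares measured requests against an SLO target."""

    target: float  # e.g. 0.99 means 99% of requests must be "good"
    total: int     # all requests in the evaluation window
    good: int      # requests that met the objective (fast enough, no error)

    @property
    def sli(self) -> float:
        """Measured share of good requests (the service level indicator)."""
        return self.good / self.total if self.total else 1.0

    @property
    def error_budget_remaining(self) -> float:
        """Fraction of the allowed bad requests that has not been used yet."""
        allowed_bad = (1 - self.target) * self.total
        actual_bad = self.total - self.good
        if allowed_bad == 0:
            return 0.0 if actual_bad else 1.0
        return 1 - actual_bad / allowed_bad


# Example window: 1000 requests, 995 of them good, objective of 99%.
report = SloReport(target=0.99, total=1000, good=995)
slo_met = report.sli >= report.target
```

In this example the measured indicator is above the target, and about half of the error budget for the window is still available.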

#4 about 2 minutes

Measuring availability and error rates with Prometheus

Prometheus can measure service availability using its built-in 'up' metric and calculate error rates by analyzing HTTP request status codes with PromQL.
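
The arithmetic behind these two measurements can be sketched in plain Python. This is not how Prometheus is implemented; it only mirrors the idea of averaging `up` samples and of dividing the increase of error counters by the increase of all request counters between two scrapes. The function names and the sample data are assumptions for illustration.

```python
from typing import Dict, List


def availability(up_samples: List[int]) -> float:
    """Share of scrapes in which the target was reachable (up == 1).

    Conceptually similar to averaging the 'up' metric over a time window.
    """
    return sum(up_samples) / len(up_samples) if up_samples else 0.0


def error_ratio(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Share of requests answered with a 5xx status between two scrapes.

    Both arguments map an HTTP status code to a monotonically increasing
    request counter, as a counter metric with a status label would.
    """
    total = 0.0
    errors = 0.0
    for status, count in after.items():
        delta = count - before.get(status, 0.0)
        if delta < 0:  # counter reset: treat the new value as the increase
            delta = count
        total += delta
        if status.startswith("5"):
            errors += delta
    return errors / total if total else 0.0


# Hypothetical counter values taken from two consecutive scrapes.
first_scrape = {"200": 100.0, "500": 2.0}
second_scrape = {"200": 190.0, "404": 0.0, "500": 12.0}

ratio = error_ratio(first_scrape, second_scrape)
uptime = availability([1, 1, 1, 0])
```

In PromQL the same ratio would typically be written as one `rate()` expression filtered to error status codes divided by a `rate()` over all requests; the exact metric and label names depend on the service being scraped.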

#5 about 2 minutes

Using Prometheus histograms to measure request latency

Enabling histograms for request metrics sorts request durations into latency buckets, making it possible to measure the share of requests that meet a specific latency SLO.
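
As a rough illustration of why bucket boundaries matter, the sketch below reads a set of cumulative histogram buckets and computes the share of requests that finished within a threshold. The bucket boundaries and counts are invented for the example, and `fraction_within` is a hypothetical helper, not a Prometheus or Keycloak API.

```python
from typing import Dict

INF = float("inf")


def fraction_within(buckets: Dict[float, int], threshold: float) -> float:
    """Share of requests that completed within `threshold` seconds.

    `buckets` maps each upper bound ("le") to a cumulative count, in the
    style of a Prometheus histogram. The +Inf bucket holds the total.
    The threshold must match an existing bucket boundary, which is why
    boundaries should be chosen to line up with the SLO.
    """
    if threshold not in buckets:
        raise ValueError(f"no bucket with upper bound {threshold}")
    total = buckets[INF]
    if total == 0:
        return 1.0
    return buckets[threshold] / total


# Invented cumulative counts: 95 of 100 requests finished within 250 ms.
latency_buckets = {0.1: 80, 0.25: 95, 0.5: 99, INF: 100}

within_250ms = fraction_within(latency_buckets, 0.25)
meets_slo = within_250ms >= 0.95  # assumed SLO: 95% of requests under 250 ms
```

If the SLO threshold does not coincide with a bucket boundary, the share can only be estimated by interpolation, so it is usually simpler to configure a bucket at exactly the SLO value.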

#6 about 1 minute

Visualizing key metrics with the Keycloak dashboard

Keycloak provides a pre-built Grafana dashboard to visualize availability, error rates, and response times for at-a-glance monitoring of service health.

#7 about 4 minutes

Finding root causes with distributed tracing

Distributed tracing with OpenTelemetry provides detailed, request-level insights into performance bottlenecks and errors that high-level metrics cannot reveal.

#8 about 2 minutes

How Keycloak adds business context to traces

Keycloak enhances traces by adding business-specific information like client ID, realm name, and user session to simplify searching and debugging.
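
The following sketch shows why such attributes simplify searching: once spans carry business context, a slow request can be narrowed down by realm and client instead of by URL alone. The span structure and the attribute keys `realm` and `client_id` are hypothetical placeholders chosen for this example; the actual attribute names that Keycloak emits may differ.

```python
from typing import Dict, List


def find_spans(spans: List[dict], wanted: Dict[str, str]) -> List[dict]:
    """Return the spans whose attributes contain every wanted key/value pair."""
    matches = []
    for span in spans:
        attributes = span.get("attributes", {})
        if all(attributes.get(key) == value for key, value in wanted.items()):
            matches.append(span)
    return matches


# Invented spans; the attribute keys are placeholders, not Keycloak's names.
spans = [
    {"trace_id": "a1", "duration_ms": 40,
     "attributes": {"realm": "customers", "client_id": "shop"}},
    {"trace_id": "b2", "duration_ms": 950,
     "attributes": {"realm": "customers", "client_id": "billing"}},
    {"trace_id": "c3", "duration_ms": 35,
     "attributes": {"realm": "staff", "client_id": "shop"}},
]

# Narrow the search to one realm and client, then keep only slow requests.
candidates = find_spans(spans, {"realm": "customers", "client_id": "billing"})
slow_billing = [span for span in candidates if span["duration_ms"] > 500]
```

A tracing backend performs this kind of filtering through its own query interface; the point here is only that the filter needs the attributes to be present on the span.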

#9 about 2 minutes

Connecting metrics to traces using exemplars

Exemplars link specific traces to your metrics, allowing you to jump directly from a slow request in a histogram to its detailed trace for analysis.
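
Conceptually, an exemplar is a sample observation, stored next to a histogram bucket, that remembers which trace produced it. The sketch below models that idea with a small in-memory histogram. It is a simplified illustration under assumed names (`ExemplarHistogram`, `Exemplar`), not the data model of Prometheus or of any metrics library, and it keeps per-bucket rather than cumulative counts for readability.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Exemplar:
    trace_id: str   # identifies the trace that produced this observation
    value: float    # the observed duration in seconds


@dataclass
class ExemplarHistogram:
    bounds: List[float]  # sorted upper bounds, without the implicit +Inf bucket
    counts: List[int] = field(init=False)
    exemplars: List[Optional[Exemplar]] = field(init=False)

    def __post_init__(self) -> None:
        # One extra slot at the end represents the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars = [None] * (len(self.bounds) + 1)

    def observe(self, seconds: float, trace_id: str) -> None:
        """Count the observation and keep it as the bucket's latest exemplar."""
        index = bisect.bisect_left(self.bounds, seconds)  # bounds are inclusive
        self.counts[index] += 1
        self.exemplars[index] = Exemplar(trace_id, seconds)

    def slowest_exemplar(self) -> Optional[Exemplar]:
        """Exemplar from the highest non-empty bucket, if any."""
        for exemplar in reversed(self.exemplars):
            if exemplar is not None:
                return exemplar
        return None


histogram = ExemplarHistogram(bounds=[0.1, 0.5, 1.0])
histogram.observe(0.05, trace_id="trace-fast")
histogram.observe(2.3, trace_id="trace-slow")
histogram.observe(0.4, trace_id="trace-medium")

outlier = histogram.slowest_exemplar()
```

Looking up `outlier.trace_id` in the tracing backend is the step that a dashboard performs when an exemplar point is clicked.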

#10 about 2 minutes

Using exemplars in Grafana for targeted analysis

Grafana heatmaps with exemplars enabled let you click an outlier data point and immediately investigate the trace of that request.

#11 about 3 minutes

Using observability for better business outcomes

A robust observability strategy helps track user-centric metrics, chase tail latencies, and make data-driven decisions about infrastructure and feature development.

Alexander Schwartz