
Cloud & AI Infrastructure

The Hidden Costs of CPU Limits in Kubernetes

with Pavel Malyarevsky

Thursday 9 July 14:10 – 14:40 Stage 8 - powered by Red Hat

About This Session

CPU limits in Kubernetes are widely recommended as a best practice, but in real production systems they often introduce performance problems that are hard to observe, explain, or debug. In this talk, we explore what actually happens inside the Linux kernel when CPU limits are enforced through cgroup CPU quotas, and how this affects modern application runtimes. We’ll examine how CPU throttling interacts with the Linux scheduler, increases context switching, and restricts runnable threads in ways that are invisible at the Kubernetes abstraction layer.

A key focus of the session is how different runtimes manage their own concurrency. We’ll show that the number of runnable threads is rarely fixed: it can often be influenced, directly or indirectly, through runtime configuration and scheduling behavior. Using examples from Go, the JVM, and native C/C++ applications, we’ll demonstrate how aligning runtime-level concurrency with actual CPU availability can significantly reduce throttling and improve stability.

We’ll connect these mechanisms to real-world symptoms such as increased tail latency, unstable throughput, misleading autoscaling signals, and confusing observability data. The session concludes with practical guidance for DevOps and platform engineers on when CPU limits help, when they hurt, and how to design Kubernetes workloads that balance isolation, performance, and predictability.

Topics

  • DevOps
  • Google Cloud (GCP)
  • Infrastructure
  • Observability
  • Performance