How Kubernetes QoS Classes Cause OOM Kills in Production
DevOps Daily with Fexingo: CI/CD, Kubernetes, and Modern Software Operations · 2026-06-16 · 8 min
Episode notes
Episode 55 of DevOps Daily dives into the three Kubernetes Quality of Service (QoS) classes (Guaranteed, Burstable, and BestEffort) and how they determine which pods get killed when memory runs out. Lucas explains why a Burstable pod with a 1 GB limit but a 500 MB request is more likely to be OOM-killed than a BestEffort pod using 100 MB, drawing on a real-world case from a fintech startup that lost payment-processing pods during a memory spike. Luna shares a counterintuitive example from her own Kubernetes cluster, where setting requests equal to limits actually reduced total utilization. The hosts discuss the trade-offs between resource efficiency and reliability, and why many teams set requests too low without measuring actual usage. The episode also covers practical strategies: using Vertical Pod Autoscaler to tune requests, setting memory limits with 20% headroom, and monitoring the oom_kill counter in Prometheus. A must-listen for any engineer running stateful workloads on Kubernetes.
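For listeners who want to see how the three classes fall out of a pod spec, here is a minimal Python sketch of the assignment rules as the Kubernetes documentation describes them. It is an illustration, not the kubelet's code: the function names `qos_class` and `limit_with_headroom`, the dict layout, and the units (MiB for memory, millicores for CPU) are assumptions made for this example. The headroom helper also assumes the 20% is applied on top of an observed peak, which the notes above do not specify.

```python
from typing import Dict, List

# Hypothetical representation: resource name -> quantity
# (memory in MiB, cpu in millicores). Not a Kubernetes API type.
Resources = Dict[str, int]


def qos_class(containers: List[Dict[str, Resources]]) -> str:
    """Return the QoS class for a pod, given its containers' requests and limits."""
    any_set = False
    all_guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # When a limit is set without a request, the request defaults to the limit.
        requests = {**limits, **container.get("requests", {})}
        if requests or limits:
            any_set = True
        # Guaranteed needs both cpu and memory limits, each equal to its request.
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


def limit_with_headroom(observed_peak_mib: int, headroom_percent: int = 20) -> int:
    """Memory limit in MiB: observed peak plus headroom, rounded up (assumed policy)."""
    return (observed_peak_mib * (100 + headroom_percent) + 99) // 100


if __name__ == "__main__":
    # The episode's example: 500 MB request with a 1 GB limit.
    print(qos_class([{"requests": {"memory": 500}, "limits": {"memory": 1000}}]))
    # No requests or limits at all.
    print(qos_class([{}]))
    # Requests equal to limits for both cpu and memory.
    same = {"cpu": 500, "memory": 1000}
    print(qos_class([{"requests": same, "limits": same}]))
    print(limit_with_headroom(500))
```

In a real cluster the assigned class can be read from the pod's `status.qosClass` field, so a sketch like this is mainly useful for reasoning about manifests before they are applied.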