
How Kubernetes Pod Autoscaling Fails Under Traffic Spikes

DevOps Daily with Fexingo: CI/CD, Kubernetes, and Modern Software Operations · 2026-05-30 · 11 min

Episode notes

In this episode, Lucas and Luna dig into the mechanics of the Kubernetes Horizontal Pod Autoscaler (HPA), and specifically why it often fails to keep up with sudden traffic spikes. They walk through a real-world scenario from a retail platform where request latency jumped from under 100ms to over 2 seconds during a flash sale. The root cause wasn't resource limits or cluster size; it was the default HPA scaling metrics and cooldown windows. Lucas explains how target CPU utilization, stabilization windows, and custom metrics interact, and why relying solely on CPU-based HPA leaves you vulnerable. They also discuss an alternative: Kubernetes Event-driven Autoscaling (KEDA) with request-based metrics. If you're running Kubernetes in production and haven't stress-tested your HPA configuration, this episode could save you from a late-night incident.
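For listeners who want to see the arithmetic behind target utilization, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio falls within a tolerance band (0.1 by default). The function name and the sample numbers are illustrative assumptions, not figures from the episode, and the sketch leaves out stabilization windows, scaling policies, and pod readiness handling.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Approximate the core HPA formula (hypothetical helper, not a Kubernetes API).

    desired = ceil(current_replicas * current_value / target_value)
    """
    ratio = current_value / target_value
    # Inside the tolerance band the controller leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Assumed example: 4 pods with a 60% CPU utilization target.
print(desired_replicas(4, 90, 60))   # moderate load: ratio 1.5
print(desired_replicas(4, 63, 60))   # ratio 1.05, inside the tolerance band
print(desired_replicas(4, 200, 50))  # spike: ratio 4.0
```

The calculation only reacts to the metric it is given. If CPU utilization rises after requests have already started queuing, the replica count follows late, which is the kind of gap a request-based metric is meant to close.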
