How Kubernetes Service Mesh Istio Sidecars Cache Memory Until OOM
DevOps Daily with Fexingo: CI/CD, Kubernetes, and Modern Software Operations · 2026-06-24 · 6 min
Substance score
54 / 100
Five dimensions, 20 points each
What our scoring noted
Our reviewer’s read on each dimension, with quotes from the episode.
Insight Density
For a 6-minute episode the information density is genuinely high — tcmalloc heap retention, xDS full-state repush, the overload manager, and Sidecar resource scoping are all actionable specifics rather than hand-waving. The episode loses points because the overall arc (profile first, scope your mesh, don't trust defaults) is familiar to any senior SRE, even if the individual mechanics are useful.
Envoy's memory allocator — tcmalloc — doesn't always return freed pages to the OS. It holds onto them as a heap reserve.
The metric you want is 'server.memory_heap_size' and 'server.memory_total_alloc_bytes'. If the latter is hitting 80% of your limit, you're in danger.
Originality
The tcmalloc RSS-drift mechanism and the Sidecar resource being 'underused' are genuinely underappreciated points, but the broader framing — profile before tuning, scope your mesh — is standard advice dressed up with terminology. No contrarian or first-principles argument is made; the episode confirms what informed practitioners already suspect rather than challenging assumptions.
if you see a sidecar using 1.8 GiB steady state, don't just raise the limit to 3 GiB — that's a band-aid
The Sidecar resource is underused
Guest Caliber
There is no external guest — two hosts conduct a structured dialogue. Their technical familiarity is evident but their credentials, seniority, or scale of real-world experience are never established; the episode could be two informed generalists scripting a tutorial rather than practitioners who debugged this at scale.
I've seen the aftermath — pods in CrashLoopBackOff with OOMKill in the events.
this show stays ad-free because listeners like you support it directly. If today's conversation gives you something useful, the link is buy me a coffee dot com slash fexingo.
Specificity & Evidence
The episode cites specific version numbers (Istio 1.20), exact port (15000), exact metric names, and a concrete percentage threshold (80%), which is well above average for a short-form show. However, all examples are illustrative hypotheticals ('say, 500 services,' 'a typical ecommerce microservice') and no real production data, named clusters, or actual incident postmortems are offered.
the default 2 GiB limit in Istio 1.20 and later
Envoy exposes an admin endpoint — port 15000 — with a /stats?format=prometheus
Conversational Craft
The dialogue is cooperative rather than probing — Luna consistently feeds setup questions that Lucas answers without any real challenge or disagreement, making it feel scripted. A mid-episode ad-read interrupts the technical flow and the host never pushes on unverified claims (e.g., the 80% threshold rule is asserted without basis and goes unchallenged).
I've heard of teams setting memory limits to 512 MiB for sidecars and still seeing OOMs. What's the right number?
it's worth mentioning that this show stays ad-free because listeners like you support it directly
Episode notes
In this episode of DevOps Daily with Fexingo, Lucas and Luna dive into a common but often overlooked problem in Istio service meshes: sidecar Envoy proxies that cache DNS and configuration data until they exhaust pod memory and trigger OOM kills. They break down why the default 2 GiB memory limit for Envoy in Istio 1.20 is not a safety net but a ticking time bomb, especially when sidecars cache full service entries for large clusters. Using the example of a 500-service cluster running Istio 1.22, they explain how Envoy's memory usage scales linearly with the number of endpoints and how a single misconfigured DestinationRule can cause the sidecar to allocate heap for every possible route. They also cover mitigation strategies: setting fine-grained memory limits per workload, enabling Envoy's memory limit agent, and using Wasm filters to reduce runtime memory consumption. By the end, you'll understand why 'just give it more memory' is the wrong answer and how to profile sidecar memory with Envoy's admin endpoint and /stats before it's too late.
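The profiling step mentioned in these notes can be sketched in a few lines. This is a minimal illustration, not the hosts' tooling: it parses text shaped like the Prometheus output of Envoy's admin `/stats` endpoint and compares the heap gauge against a memory limit, using the 80% danger line the episode proposes. The Prometheus-style names `envoy_server_memory_heap_size` and `envoy_server_memory_allocated` are assumptions about how the server gauges appear in that format, so check them against your own proxy's output. The sample values are invented for the example.

```python
import re

GIB = 1024 ** 3

# Sample text shaped like /stats?format=prometheus output.
# Metric names are assumed; the values are invented, not measured.
SAMPLE_STATS = """\
# TYPE envoy_server_memory_allocated gauge
envoy_server_memory_allocated{} 1503238553
# TYPE envoy_server_memory_heap_size gauge
envoy_server_memory_heap_size{} 1932735283
"""

# metric name, optional {labels}, whitespace, value
_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)$")


def parse_gauges(text):
    """Return {metric_name: value} from Prometheus text lines.

    Comment lines are skipped. If a metric appears several times with
    different labels, the last value wins, which is enough for the
    unlabelled server gauges used here.
    """
    gauges = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match:
            gauges[match.group(1)] = float(match.group(2))
    return gauges


def heap_pressure(gauges, limit_bytes,
                  metric="envoy_server_memory_heap_size", threshold=0.8):
    """Return (fraction_of_limit, at_risk) for the chosen gauge."""
    fraction = gauges[metric] / limit_bytes
    return fraction, fraction >= threshold


if __name__ == "__main__":
    stats = parse_gauges(SAMPLE_STATS)
    fraction, at_risk = heap_pressure(stats, limit_bytes=2 * GIB)
    print(f"heap is at {fraction:.0%} of the limit, at risk: {at_risk}")
```

In a real pod the text would come from the admin endpoint on port 15000 that the episode describes, rather than from a hard-coded sample. Keeping the metric name and threshold as parameters makes it easy to swap in whichever gauge and cutoff your own profiling supports.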
Full transcript
6 min · Transcribed and scored by The B2B Podcast Index.
Lucas: Luna, have you ever watched an Istio sidecar quietly eat two gigabytes of memory and then fall over because the node ran out?
Luna: I've seen the aftermath — pods in CrashLoopBackOff with OOMKill in the events. But I always assumed the default memory limit was a safety net, not a tripwire.
Lucas: That's the thing — the default 2 GiB limit in Istio 1.20 and later is meant to prevent the sidecar from consuming the entire node, but it's set way too high for many workloads. And the real issue is that Envoy caches aggressively — DNS lookups, service entries, clusters, endpoints — and that cache doesn't shrink.
Luna: So the sidecar just keeps allocating until it hits the limit and gets killed. But why doesn't it free memory when the cluster changes?
Lucas: Because Envoy's memory allocator — tcmalloc — doesn't always return freed pages to the OS. It holds onto them as a heap reserve. So even if you delete a service entry, the memory stays dirty in the process, and the next allocation might reuse it, but the RSS never goes down.
Luna: Before we go deeper — and I know this is a bit off the topic — it's worth mentioning that this show stays ad-free because listeners like you support it directly. If today's conversation gives you something useful, the link is buy me a coffee dot com slash fexingo.
Lucas: Yeah, and we mean that — no sponsors, no interruptions. It's just us digging into these topics because we genuinely think they matter. So if you find value in that, the support is appreciated.
Luna: Alright — back to the sidecar. Let's talk about what actually drives that memory growth.
Lucas: Right. In a cluster with, say, 500 services, each with multiple versions and ports, the sidecar ends up storing a full snapshot of every endpoint. That's the 'service mesh abstraction' — every proxy knows about every other proxy. But that knowledge has a cost.
Luna: And it's not just service entries. Envoy also caches DNS responses, access logs, and even parts of the control plane's xDS responses.
Lucas: Exactly. The Istio control plane pushes a full state update every time anything changes — even a single pod IP. So the sidecar re-parses the entire configuration, allocates new memory, and the old snapshot's memory is never fully reclaimed. Over time, RSS drifts upward.
Luna: I've heard of teams setting memory limits to 512 MiB for sidecars and still seeing OOMs. What's the right number?
Lucas: There's no universal number. You have to profile. Envoy exposes an admin endpoint — port 15000 — with a /stats?format=prometheus that gives you heap usage, cluster memory, etc. The metric you want is 'server.memory_heap_size' and 'server.memory_total_alloc_bytes'. If the latter is hitting 80% of your limit, you're in danger.
Luna: So the answer is to set per-workload limits based on actual usage, not the default. For a high-traffic ingress gateway, 2 GiB might be fine. For a sidecar proxying a low-traffic cron job, 128 MiB might be plenty.
Lucas: Yes. And you can also enable Envoy's memory limit agent — it's an experimental feature called 'overload manager' that triggers GC when memory hits a threshold. But it's not perfect.
Luna: Another tactic: reduce the size of the configuration pushed to each sidecar. Istio's Sidecar resource lets you limit which services a given proxy knows about. If you scope it to only the services the workload actually talks to, you cut the cache size dramatically.
Lucas: That's a big one. A typical ecommerce microservice might only need to reach five backend services. If you push the whole mesh — 500 services — it's wasting memory on endpoints it never calls. The Sidecar resource is underused.
Luna: What about Wasm filters? I've heard those can increase memory consumption, but also sometimes reduce it if they replace custom filters?
Lucas: Wasm filters are tricky. They run in a separate sandbox per filter, so each adds overhead. But if you're using them to replace a Lua filter that was doing heavy string processing, the Wasm sandbox might actually be more memory-efficient — it's a trade-off. You have to measure.
Luna: So the takeaway is: don't trust the defaults. Profile sidecar memory with Envoy's admin endpoint, scope the service mesh with Sidecar resources, and set per-workload limits based on real data.
Lucas: Exactly. And if you see a sidecar using 1.8 GiB steady state, don't just raise the limit to 3 GiB — that's a band-aid. Find out why it's caching so much. Is it a misconfigured DestinationRule causing wildcard matching? Are you pushing full endpoint data when subsets would suffice?
Luna: Great point. One last thing: there's a known issue where Envoy's connection pool can also bloat if you have high idle timeout settings. Each connection holds a buffer.
Lucas: Right. The 'connection_pool_per_upstream_connection' statistic can reveal if you're keeping thousands of idle connections alive. Tune your circuit breakers and idle timeouts. Lower memory usage, fewer OOMs.
Luna: Alright, I think we've given everyone a solid debugging checklist. Next time your sidecar goes OOM, you know where to start.
Lucas: And as always, if you want to support this kind of deep-dive content without ads, head to buy me a coffee dot com slash fexingo. Until next time.
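Luna's suggestion to scope each proxy with Istio's Sidecar resource might look roughly like the sketch below. It builds a Sidecar manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The workload name, namespace, labels, and the five backend hosts are hypothetical, chosen only to mirror the "five backend services" example from the conversation. The `istio-system/*` entry is an assumption about where shared mesh services live in a given install.

```python
import json


def scoped_sidecar(name, namespace, app_label, egress_hosts):
    """Build an Istio Sidecar manifest that limits egress visibility.

    Each host uses the "namespace/dnsName" form; "./" means the
    Sidecar's own namespace. Duplicate hosts are dropped and the
    result is sorted for stable output.
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Sidecar",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Apply only to pods carrying this label (hypothetical label).
            "workloadSelector": {"labels": {"app": app_label}},
            # Only these hosts are pushed to the selected proxies.
            "egress": [{"hosts": sorted(set(egress_hosts))}],
        },
    }


# Hypothetical workload that talks to five backends plus the mesh namespace.
manifest = scoped_sidecar(
    name="checkout-scope",
    namespace="shop",
    app_label="checkout",
    egress_hosts=[
        "./cart.shop.svc.cluster.local",
        "./inventory.shop.svc.cluster.local",
        "./payments.shop.svc.cluster.local",
        "./pricing.shop.svc.cluster.local",
        "./users.shop.svc.cluster.local",
        "istio-system/*",
    ],
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Whether this meaningfully reduces memory for a given workload is something to confirm by profiling before and after, as the hosts recommend.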