Kubernetes Shake-ups, Platform Reality, and AI-Native SRE
Ship It Weekly · 2025-11-21 · 16 min
Episode notes
In this episode of Ship It Weekly, Brian digs into three big themes for anyone running Kubernetes or building internal platforms. First, Kubernetes is officially retiring Ingress NGINX and moving it into best-effort maintenance until March 2026. We talk about what that actually means if you’re still using it and how to think about choosing and rolling out a replacement ingress. Second, we look at how CNCF is defining platform engineering and what “platform as a product” looks like in practice, plus some hard-earned lessons from running Kubernetes in production. Third, we talk about AI as a first-class workload on Kubernetes. CNCF’s new Certified Kubernetes AI Conformance Program aims to standardize how AI runs on K8s, and recent writing on SRE in the age of AI looks at what reliability means when systems learn and drift. In the lightning round, we hit good reads on database migrations, Postgres upgrades, and a distributed priority queue on Kafka. We wrap with the human side of incidents: fixation during incident response and using incidents as landmarks for the tradeoffs you’ve been making over time.
More from Ship It Weekly
All episodes →
- containerd CRI Vulnerabilities, Datadog PostgreSQL HA on Kubernetes, AWS DevOps Agent with Datadog MCP Server, EKS Control Plane Egress, and Why Users Feel the Wait
- Ship It Conversations: Guardsquare’s Joel DeStefano on Mobile App Security, Runtime Protection, App Hardening, and Why Scanning Isn’t Enough
- PeopleSoft Zero-Day Exploited, npm v12 Install Script Changes, GitHub Agentic Tokens, Anthropic Model Risk, and Default Trust Breaking
- Ship It Conversations: Meta’s Francois Richard on AI Incident Response, SLOs, and Reliability at Scale
- Coinbase Outage, Meta AI Account Recovery, AWS AgentCore Code Injection, Apigee Tenant Isolation, and the Glue That Breaks Production