Special: When the Cloud Has a Bad Day: Cloudflare, AWS us-east-1 & GitHub Outages
Ship It Weekly · 2025-11-20 · 13 min
Episode notes
In this special kickoff episode of Ship It Weekly, Brian walks through three major outages from the last few weeks and what they actually mean for DevOps, SRE, and platform teams. Instead of just reading status pages, we look at how each incident exposes assumptions in our own architectures and runbooks.

Topics in this episode:
• Cloudflare’s global outage and what happens when your CDN/WAF becomes a single point of failure
• The AWS us-east-1 incident and why “multi-AZ in one region” isn’t a full disaster recovery strategy
• GitHub’s Git operations / Codespaces outage and how fragile our CI/CD and GitOps flows can be
• Practical questions to ask about your own setup: CDN bypass, cross-region readiness, backups for Git and CI

This episode is more of a themed “special” to kick things off. Going forward, most episodes will follow a lighter news format: a couple of main stories from the week in DevOps/SRE/platform engineering, a quick tools and releases segment, and one culture/on-call or burnout topic. Specials like this will pop up when there’s a big incident or theme worth unpacking.
More from Ship It Weekly
- containerd CRI Vulnerabilities, Datadog PostgreSQL HA on Kubernetes, AWS DevOps Agent with Datadog MCP Server, EKS Control Plane Egress, and Why Users Feel the Wait
- Ship It Conversations: Guardsquare’s Joel DeStefano on Mobile App Security, Runtime Protection, App Hardening, and Why Scanning Isn’t Enough
- PeopleSoft Zero-Day Exploited, npm v12 Install Script Changes, GitHub Agentic Tokens, Anthropic Model Risk, and Default Trust Breaking
- Ship It Conversations: Meta’s Francois Richard on AI Incident Response, SLOs, and Reliability at Scale
- Coinbase Outage, Meta AI Account Recovery, AWS AgentCore Code Injection, Apigee Tenant Isolation, and the Glue That Breaks Production