How Stripe Migrated Payment Routing to 99.999% Uptime
The CTO Podcast with Fexingo · 2026-06-16 · 9 min
Episode notes
Episode 55 of The CTO Podcast dives into how Stripe rebuilt its payment routing engine to achieve 99.999% uptime. Lucas and Luna break down the architectural shift from a monolithic routing layer to a distributed, deterministic system that handles millions of transactions per second. They explore the team's decision to move away from traditional load balancers, the role of formal verification in routing logic, and how Stripe's engineers stress-tested the system with simulated global outages. Along the way, they discuss the trade-offs between latency and consistency, and why a gradual canary deployment was critical. This episode offers concrete lessons for engineering leaders designing fault-tolerant systems at scale.