How Stripe Rebuilt Payment Routing for 99.999% Uptime
The CTO Podcast with Fexingo · 2026-06-15 · 9 min
Episode notes
Stripe's payment infrastructure processes billions of dollars annually, and its routing engine (the system that decides which bank or processor gets each transaction) is a marvel of distributed systems engineering. In this episode, Lucas and Luna explore how Stripe rebuilt its payment routing layer to achieve five-nines (99.999%) uptime, handling bank-level failures in milliseconds without user impact. They break down the architecture: the state machine that tracks each transaction through six phases, the circuit-breaker pattern that isolates failing processors, and the decision-tree optimization that cut latency by 40 percent. Lucas explains why he considers routing the hardest problem in payments, more complex than fraud detection or compliance, and how Stripe's design influenced the broader fintech industry. Luna draws parallels to how other critical infrastructure systems, from DNS to CDNs, solve similar reliability problems. The episode is a concrete look at what it takes to move money reliably at internet scale.
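For listeners who want a feel for the circuit-breaker pattern mentioned above, here is a minimal sketch in Python. It is a generic illustration of the pattern, not Stripe's implementation: the class, the `pick_processor` helper, the processor names, and the threshold and timeout values are all hypothetical, since the episode notes do not specify them.

```python
import time
from typing import Callable, Dict, List, Optional


class CircuitBreaker:
    """Generic circuit breaker guarding one (hypothetical) payment processor."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Threshold and timeout are illustrative defaults, not values from the episode.
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"      # healthy: traffic flows normally
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"   # cool-down elapsed: allow a trial request
        return "open"            # failing: traffic is routed elsewhere

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._opened_at = self._clock()  # trial request failed: reopen
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # too many failures: open


def pick_processor(breakers: Dict[str, CircuitBreaker],
                   preference: List[str]) -> Optional[str]:
    """Return the first preferred processor whose breaker allows traffic."""
    for name in preference:
        if breakers[name].allow_request():
            return name
    return None
```

In this sketch, a router would call `pick_processor` before each transaction and then report the outcome with `record_success` or `record_failure`, so a failing processor stops receiving traffic after a few errors and is retried only once the cool-down has passed.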