How Pinterest Rebuilt Its Recommendation Engine for 500 Million Users
The CTO Podcast with Fexingo · 2026-06-20 · 14 min
Episode notes
In this episode of The CTO Podcast, Lucas and Luna look at how Pinterest's engineering team rebuilt its core recommendation engine, moving it from a batch-processing pipeline to a real-time, graph-based system serving over 500 million monthly active users. They explore the specific architectural decisions Pinterest made: moving from collaborative filtering to a heterogeneous graph neural network called PinSage, deploying it on TensorFlow Serving with Kubernetes for low-latency inference, and handling the cold-start problem for new pins and users. The discussion covers the trade-offs between offline batch precomputation and online inference, how Pinterest reduced recommendation latency from hours to milliseconds, and the infrastructure costs involved. Lucas explains the graph-based approach that captures user intent through 'pins' and 'boards,' while Luna questions how this impacts content discovery and engagement. Tune in for a detailed look at a real-world challenge of scale in modern machine learning systems.
More from The CTO Podcast with Fexingo
- How Airbnb Rebuilt Search for 8 Million Listings
- How GitLab Built a Single Codebase for One Million CI Pipelines
- How Slack Rebuilt Its Search Index for 10 Million Daily Queries
- How Notion Rebuilt Its Sync Engine for Offline-First
- How Notion Rebuilt Its Block Engine for Hybrid Local-Sync