How Pinterest Rebuilt Its Recommendation Engine for 500 Million Users

The CTO Podcast with Fexingo · 2026-06-20 · 14 min

Episode notes

In this episode of The CTO Podcast, Lucas and Luna dive into how Pinterest's engineering team rebuilt its core recommendation engine, replacing a batch-processing pipeline with a real-time, graph-based system that serves over 500 million monthly active users. They explore the specific architectural decisions Pinterest made: moving from collaborative filtering to a heterogeneous graph neural network called PinSage, deploying it on TensorFlow Serving with Kubernetes for low-latency inference, and handling the cold-start problem for new pins and users. The discussion covers the trade-offs between offline batch precomputation and online inference, how Pinterest cut recommendation latency from hours to milliseconds, and the infrastructure costs involved. Lucas explains how the graph-based approach captures user intent through 'pins' and 'boards,' while Luna asks how it affects content discovery and engagement. Tune in for a detailed look at a real-world challenge of scale in modern machine learning systems.
