How Datadog Monitors Its Own 100-Terabyte Infrastructure
The CTO Podcast with Fexingo · 2026-06-16 · 10 min
Episode notes
Episode 54 of The CTO Podcast: Lucas and Luna explore how Datadog, the monitoring giant, uses its own tools to manage a sprawling infrastructure that ingests over 100 terabytes of data daily. They dive into the dogfooding strategy, the architectural choices that keep observability scalable, and the surprising insight that Datadog runs its entire backend on a single, custom-sharded PostgreSQL fork. Lucas explains the engineering org structure behind the monitoring team, and Luna questions whether dogfooding can blind teams to customer pain. Specific examples include how Datadog handles metric-cardinality explosions and why it built a separate time-series database internally before launching it as a product.
#Datadog #Observability #Dogfooding #TechLeadership #Infrastructure #PostgreSQL #Scalability #TimeSeriesDatabase #EngineeringCulture #Monitoring #CTOPodcast #FexingoBusiness #BusinessPodcast #Architecture #Sharding #MetricCardinality #SRE #CloudNative
Keep every episode free: buymeacoffee.com/fexingo
More from The CTO Podcast with Fexingo
- How Airbnb Rebuilt Search for 8 Million Listings
- How GitLab Built a Single Codebase for One Million CI Pipelines
- How Slack Rebuilt Its Search Index for 10 Million Daily Queries
- How Notion Rebuilt Its Sync Engine for Offline-First
- How Notion Rebuilt Its Block Engine for Hybrid Local-Sync