How GitHub Actually Migrated 100 Million Repos to a New Storage Engine
The CTO Podcast with Fexingo · 2026-06-21 · 9 min
Episode notes
In 2024, GitHub faced an impossible problem: its 15-year-old storage backend, built on bare Git repositories, couldn't keep up with 100 million active repos, AI-generated commits, and terabyte-scale monorepos. This episode drills into how GitHub's engineering team designed and rolled out a custom storage engine called 'GitHub Storage Service' (GSS) without a single user-facing outage. We cover the fundamental shift from POSIX filesystem assumptions to object-storage-native Git, the two-year phased migration that touched every push and clone request, and the surprising performance win: 40% faster clone times for large repos. Lucas and Luna also discuss the trade-off between backward compatibility and architectural purity, and why GitHub chose to keep the Git protocol unchanged even as they ripped out the entire storage layer underneath.
More from The CTO Podcast with Fexingo
- How Airbnb Rebuilt Search for 8 Million Listings
- How GitLab Built a Single Codebase for One Million CI Pipelines
- How Slack Rebuilt Its Search Index for 10 Million Daily Queries
- How Notion Rebuilt Its Sync Engine for Offline-First
- How Notion Rebuilt Its Block Engine for Hybrid Local-Sync