How GitHub Actually Migrated 100 Million Repos to a New Storage Engine

The CTO Podcast with Fexingo · 2026-06-21 · 9 min

Episode notes

In 2024, GitHub faced a problem that looked impossible: its 15-year-old storage backend, built on bare Git repositories, couldn't keep up with 100 million active repos, AI-generated commits, and terabyte-scale monorepos. This episode drills into how GitHub's engineering team designed and rolled out a custom storage engine called 'GitHub Storage Service' (GSS) without a single user-facing outage. We cover the fundamental shift from POSIX filesystem assumptions to object-storage-native Git, the two-year phased migration that touched every push and clone request, and the surprising performance win: 40% faster clone times for large repos. Lucas and Luna also discuss the trade-off between backward compatibility and architectural purity, and why GitHub chose to keep the Git protocol unchanged even as they ripped out the entire storage layer underneath.

#GitHub #StorageEngine #Git #Infrastructure #Migration #ObjectStorage #Engineering #CTO #TechnicalLeadership #Architecture #Scalability #Monorepo #AI #BackwardCompatibility #Performance #Business #FexingoBusiness #BusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo
