Jonathan Creamer
How a Small Team Shrank a Microsoft Monorepo by 94%
#1 about 2 minutes
The scale of Microsoft's monorepo problem
A monorepo with 20 million lines of code grew from a manageable 2GB to an unworkable 150GB, prompting an investigation into what was driving the rapid growth.
#2 about 3 minutes
How automated changelog tooling bloated the repository
The versioning tool beachball generated thousands of changelog files, causing a separate versioning branch to swell to 130GB.
#3 about 9 minutes
Discovering a Git hashing algorithm bug from 2006
A Git expert found that an old name-hashing heuristic effectively considers only the last 16 characters of a file path, so changelog files from different packages collided and were not diffed against their own earlier versions.
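To make the collision concrete, here is a minimal Python sketch modeled on the `pack_name_hash` helper in Git's source. The port is an approximation for illustration, and the three paths are hypothetical, not taken from the Microsoft repository.

```python
def name_hash(path: str) -> int:
    """Illustrative port of Git's legacy name hash (32-bit, unsigned)."""
    h = 0
    for byte in path.encode("utf-8"):
        # Like the C original, skip ASCII whitespace characters.
        if byte in b" \t\n\v\f\r":
            continue
        # Each new character lands in the top byte; older ones shift out.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


# Hypothetical paths: two packages whose changelogs share a long suffix.
alpha = "packages/alpha/src/components/CHANGELOG.json"
beta = "packages/beta/src/components/CHANGELOG.json"
readme = "packages/alpha/README.md"

for p in (alpha, beta, readme):
    print(f"{name_hash(p):#010x}  {p}")
```

Because every character is shifted right by two bits per later character, anything more than about 16 characters from the end no longer influences the 32-bit value. The two changelog paths therefore get the same hash even though they belong to different packages.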
#4 about 4 minutes
Implementing the new path walk algorithm to fix Git
The solution was a new "Path Walk" algorithm for `git push` and `git repack` that groups objects by their full file path, avoiding the hash collisions so that each file is diffed against its own history.
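The grouping idea can be sketched in a few lines of Python. This is a toy model, not Git's implementation: the history entries are hypothetical, and the "last 16 characters" key is a simplified stand-in for the old hash.

```python
from collections import defaultdict

# Hypothetical history: (path, revision) pairs for blobs across commits.
history = [
    ("packages/alpha/src/components/CHANGELOG.json", 1),
    ("packages/beta/src/components/CHANGELOG.json", 1),
    ("packages/alpha/src/components/CHANGELOG.json", 2),
    ("packages/beta/src/components/CHANGELOG.json", 2),
]


def group_by(entries, key):
    """Bucket entries by key(path); each bucket is a set of delta candidates."""
    groups = defaultdict(list)
    for path, rev in entries:
        groups[key(path)].append((path, rev))
    return dict(groups)


# Simplified stand-in for the old heuristic: only the tail of the path counts.
by_tail = group_by(history, key=lambda p: p[-16:])
# Path-walk idea: the full path is the grouping key.
by_path = group_by(history, key=lambda p: p)

print(len(by_tail), "bucket(s) when keyed by path tail")
print(len(by_path), "bucket(s) when keyed by full path")
```

Keyed by the tail, both packages' changelogs fall into one bucket and compete as delta bases. Keyed by the full path, each file's revisions stay together.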
#5 about 2 minutes
Applying the fix with new Git config and repack commands
Developers can enable the new algorithm for pushes via a `git config` setting and shrink local clones using the `git repack --path-walk` command.
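A small Python helper can assemble these two steps. The spellings `pack.usePathWalk` and `--path-walk` are assumed from upstream Git documentation (Git 2.50 and later) and may differ in other builds, so check `git help repack` before running anything. The helper names are hypothetical, and the sketch only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess


def path_walk_commands(global_config: bool = True) -> list[list[str]]:
    """Build the config and repack commands (names assumed, see lead-in)."""
    scope = ["--global"] if global_config else []
    return [
        # Ask pack-objects to use the path-walk approach by default.
        ["git", "config", *scope, "pack.usePathWalk", "true"],
        # Repack everything (-a), drop old packs (-d), recompute deltas (-f).
        ["git", "repack", "-adf", "--path-walk"],
    ]


def run(commands, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(path_walk_commands())
```

Keeping the dry run as the default makes it easy to review the exact commands before a full repack, which can take a long time on a large clone.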
#6 about 2 minutes
Using the new `git survey` command to find large files
A new built-in command, `git survey`, was created to help developers identify large files, blobs, and binaries in their repository history.
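Where `git survey` is not available, a similar question can be answered with long-standing plumbing commands. The sketch below is a stand-in technique, not how `git survey` works internally. It assumes its input lines come from `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`, and the sample lines and object IDs are made up.

```python
def largest_blobs(batch_check_lines, top=5):
    """Return the `top` largest blobs as (size_in_bytes, path, object_id)."""
    blobs = []
    for line in batch_check_lines:
        # Expected shape: "<type> <oid> <size> <path>"; the path may be empty.
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        oid, size = parts[1], int(parts[2])
        path = parts[3] if len(parts) > 3 else ""
        blobs.append((size, path, oid))
    return sorted(blobs, reverse=True)[:top]


# Hypothetical sample of the piped output described above.
sample = [
    "commit 1111111 250 ",
    "blob aaaaaaa 1200 README.md",
    "blob bbbbbbb 52428800 assets/demo.mp4",
    "tree ccccccc 300 packages",
    "blob ddddddd 90000 packages/alpha/CHANGELOG.json",
]

for size, path, oid in largest_blobs(sample, top=2):
    print(f"{size:>12}  {oid}  {path}")
```

The sizes reported this way are uncompressed object sizes, so they point at candidates rather than measuring exactly how much space each file takes in a packfile.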
#7 about 3 minutes
Best practices for managing large repositories
Beyond the specific fix, general best practices like not checking in binaries and avoiding thousands of files in a single folder are crucial for repository health.
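One of these checks is easy to automate. The following Python sketch flags directories holding more files than a chosen limit. The function name, the limit, and the sample paths are hypothetical; in practice the path list could come from something like `git ls-files`.

```python
import posixpath
from collections import Counter


def crowded_directories(paths, limit):
    """Map each directory with more than `limit` files to its file count."""
    # Files at the repository root are counted under ".".
    counts = Counter(posixpath.dirname(p) or "." for p in paths)
    return {d: n for d, n in counts.items() if n > limit}


# Hypothetical tracked files, e.g. the lines printed by `git ls-files`.
tracked = [
    "change/alpha-0001.json",
    "change/alpha-0002.json",
    "change/beta-0001.json",
    "packages/alpha/index.ts",
    "README.md",
]

print(crowded_directories(tracked, limit=2))
```

A check like this could run in CI with a much higher limit, so a directory that is filling up with generated files is noticed before it reaches thousands of entries.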
#8 about 6 minutes
The broader impact on the open source community
The new algorithm has shown significant size reductions for other large monorepos like Chromium, and the fix is being upstreamed to benefit the entire Git community.