Jonathan Creamer

How a Small Team Shrank a Microsoft Monorepo by 94%

A Microsoft monorepo ballooned to 150GB. The culprit wasn't the code itself but a Git hashing bug from 2006 that treated different files with similar names as if they were the same file.

#1 (about 2 minutes)

The scale of Microsoft's monorepo problem

A monorepo with 20 million lines of code grew from a manageable 2GB to an unworkable 150GB, prompting an investigation into its exponential growth.

#2 (about 3 minutes)

How automated changelog tooling bloated the repository

The versioning tool Beachball generated thousands of changelog files, causing a separate versioning branch to swell to an enormous 130GB.

#3 (about 9 minutes)

Discovering a Git hashing algorithm bug from 2006

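A Git expert found that the old name-hashing algorithm Git uses to pick delta-compression candidates only considers the last 16 characters of a file's path. Changelog files in different directories collided on the same hash, so Git compared them against unrelated files instead of against earlier versions of themselves, and the changelogs were never properly diffed.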

#4 (about 4 minutes)

Implementing the new path walk algorithm to fix Git

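The solution was a new "Path Walk" algorithm for `git push` and `git repack`. It groups objects by their full file path before computing deltas, so each file is compressed against its own history rather than against unrelated files that happen to share a name hash.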
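The difference in grouping can be illustrated with a toy Python model. This is not Git's implementation. It only models the bucketing step and approximates the old name hash by the last 16 characters of the path, which is an assumption of this sketch. The function names and the sample history are hypothetical.

```python
from collections import defaultdict


def bucket_by_name_suffix(blobs):
    """Old heuristic (approximated): group blob versions by the last 16
    characters of their path, a stand-in for Git's 32-bit name hash."""
    buckets = defaultdict(list)
    for path, blob_id in blobs:
        buckets[path[-16:]].append((path, blob_id))
    return dict(buckets)


def bucket_by_full_path(blobs):
    """Path-walk idea: group blob versions by full path, so every version of
    a file is first considered as a delta base for other versions of itself."""
    buckets = defaultdict(list)
    for path, blob_id in blobs:
        buckets[path].append(blob_id)
    return dict(buckets)


# Hypothetical history: two changelogs, each committed twice.
history = [
    ("packages/alpha/CHANGELOG.json", "a1"),
    ("packages/gamma/CHANGELOG.json", "g1"),
    ("packages/alpha/CHANGELOG.json", "a2"),
    ("packages/gamma/CHANGELOG.json", "g2"),
]

if __name__ == "__main__":
    print(bucket_by_name_suffix(history))  # one mixed bucket
    print(bucket_by_full_path(history))    # one bucket per file
```

With the suffix-based grouping, all four versions land in a single bucket, so unrelated files compete as delta bases. Grouping by full path keeps `a1`/`a2` and `g1`/`g2` together.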

#5 (about 2 minutes)

Applying the fix with new Git config and repack commands

Developers can enable the new algorithm for pushes via a `git config` setting and shrink local clones using the `git repack --use-path-walk` command.
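To check whether repacking actually shrank a clone, one option is to compare the `size-pack` figure reported by `git count-objects -v` before and after. Without `-H`, that figure is reported in KiB. The Python sketch below wraps that check. The helper names are hypothetical, and the sketch deliberately does not assume any particular repack flags.

```python
import subprocess


def parse_count_objects(output: str) -> dict[str, int]:
    """Parse `git count-objects -v` output ("key: value" lines) into ints.
    Without -H, size fields such as size-pack are in KiB."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value.strip())
    return stats


def pack_size_kib(repo: str = ".") -> int:
    """Return the total size of the repository's packfiles in KiB."""
    out = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_count_objects(out)["size-pack"]


def percent_reduction(before: int, after: int) -> float:
    """Percentage by which the pack size shrank."""
    return 100.0 * (before - after) / before
```

Usage: call `pack_size_kib()` once, run your repack command, call it again, and pass both numbers to `percent_reduction`.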

#6 (about 2 minutes)

Using the new `git survey` command to find large files

A new built-in command, `git survey`, was created to help developers identify large files, blobs, and binaries in their repository history.
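Where `git survey` is not available, a rough approximation of one of its jobs (finding the largest blobs in history) can be built from long-standing plumbing commands. The sketch pipes `git rev-list --objects --all` into `git cat-file --batch-check`. The helper names are hypothetical. Note that `%(objectsize)` is the uncompressed size, not the on-disk size.

```python
import subprocess

# %(rest) echoes whatever followed the object name on the input line,
# which for `rev-list --objects` output is the path.
FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(lines):
    """Turn batch-check output lines into (size, oid, path) tuples for blobs."""
    blobs = []
    for line in lines:
        parts = line.split(" ", 3)  # maxsplit keeps paths containing spaces intact
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return blobs


def largest_blobs(repo=".", top=10):
    """List the `top` largest blobs reachable from any ref."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={FORMAT}"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return sorted(parse_batch_check(checked.splitlines()), reverse=True)[:top]
```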

#7 (about 3 minutes)

Best practices for managing large repositories

Beyond the specific fix, general best practices like not checking in binaries and avoiding thousands of files in a single folder are crucial for repository health.
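One of those practices, avoiding thousands of files in a single folder, is easy to monitor. The Python sketch below counts tracked files per directory using `git ls-files -z`. The helper names and the default threshold of 1,000 files are hypothetical choices for illustration, not a recommendation from the talk.

```python
import subprocess
from collections import Counter
from pathlib import PurePosixPath


def files_per_directory(paths):
    """Count how many tracked files sit directly inside each directory."""
    return Counter(str(PurePosixPath(p).parent) for p in paths)


def crowded_directories(repo=".", threshold=1000):
    """Return (directory, file_count) pairs at or above the threshold."""
    # -z prints NUL-separated, unquoted paths, which is safer to parse.
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = files_per_directory(p for p in out.split("\0") if p)
    return [(d, n) for d, n in counts.most_common() if n >= threshold]
```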

#8 (about 6 minutes)

The broader impact on the open source community

The new algorithm has shown significant size reductions for other large monorepos like Chromium, and the fix is being upstreamed to benefit the entire Git community.