Jonathan Creamer

How a Small Team Shrank a Microsoft Monorepo by 94%

A Microsoft monorepo ballooned to 150GB. The culprit wasn't code, but a Git hashing bug from 2006 that saw different files as identical.

#1 about 2 minutes

The scale of Microsoft's monorepo problem

A monorepo with 20 million lines of code grew from a manageable 2GB to an unworkable 150GB, prompting an investigation into what was driving the growth.

#2 about 3 minutes

How automated changelog tooling bloated the repository

The versioning tool beachball generated thousands of changelog files, causing a separate versioning branch to swell to 130GB.

#3 about 9 minutes

Discovering a Git hashing algorithm bug from 2006

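A Git expert found that an old hashing heuristic considered only the last 16 characters of a file's path. Changelog files from different packages therefore collided, and Git compared unrelated files with each other instead of comparing successive versions of the same file.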
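The collision is easy to reproduce with a toy model. The sketch below is a deliberate simplification, not Git's actual hash function: it assumes the grouping key is simply the last 16 characters of the path, and the two package paths are hypothetical.

```python
# Simplified model (an assumption, not Git's real hash function):
# the grouping key is just the last 16 characters of the path.
def suffix_key(path: str, width: int = 16) -> str:
    return path[-width:]


# Hypothetical changelog paths from two different packages.
a = "packages/ui-button/CHANGELOG.json"
b = "packages/api-button/CHANGELOG.json"

# Different files, same key: under this model they are indistinguishable.
print(suffix_key(a))                   # n/CHANGELOG.json
print(suffix_key(a) == suffix_key(b))  # True
```

Because "CHANGELOG.json" alone is 14 characters long, only two characters of the directory name ever reach the key, so any two packages whose names end in the same letter collide in this model.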

#4 about 4 minutes

Implementing the new path walk algorithm to fix Git

The solution was a new "Path Walk" algorithm for `git push` and `git repack` that uses the full file path to avoid hash collisions and ensure correct diffing.
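The difference between the two grouping strategies can be sketched in a few lines. This is a toy illustration only: the real path-walk implementation lives inside Git and works on Git objects, whereas the function `group_versions`, the sample paths, and the version labels below are all hypothetical.

```python
from collections import defaultdict


def group_versions(versions, key):
    """Group file contents by a key derived from each file's path."""
    groups = defaultdict(list)
    for path, content in versions:
        groups[key(path)].append(content)
    return dict(groups)


# Hypothetical history: two versions each of two unrelated changelogs.
versions = [
    ("packages/ui-button/CHANGELOG.json", "ui v1"),
    ("packages/api-button/CHANGELOG.json", "api v1"),
    ("packages/ui-button/CHANGELOG.json", "ui v2"),
    ("packages/api-button/CHANGELOG.json", "api v2"),
]

# Keying on the last 16 characters mixes both files into one group.
by_suffix = group_versions(versions, key=lambda p: p[-16:])
# Keying on the full path keeps each file's versions together.
by_path = group_versions(versions, key=lambda p: p)

print(len(by_suffix), len(by_path))  # 1 2
```

With the full path as the key, each group holds only successive versions of one file, which is the property that makes the comparisons between them meaningful.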

#5 about 2 minutes

Applying the fix with new Git config and repack commands

Developers can enable the new algorithm for pushes with a `git config` setting and shrink local clones by running `git repack` with its path-walk option (`--path-walk`).
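A team might wrap these two steps in a small tooling script. The sketch below assumes a Git build that ships the path-walk feature. The spellings `pack.usePathWalk` and `--path-walk` are assumptions based on Git's documentation for such builds and should be confirmed with `git repack -h` on your installation. The helper names are hypothetical, and the script only prints the commands unless `dry_run` is turned off.

```python
import subprocess


def path_walk_commands():
    """Return the two commands as argv lists (flag names are assumptions)."""
    return [
        # Use the path-walk algorithm when packing objects, e.g. for pushes.
        ["git", "config", "pack.usePathWalk", "true"],
        # Rewrite the local packs: -a all objects, -d drop redundant packs,
        # -f recompute deltas instead of reusing existing ones.
        ["git", "repack", "-adf", "--path-walk"],
    ]


def run_all(commands, cwd=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


run_all(path_walk_commands())  # dry run: prints the commands only
```

A full repack of a very large repository can take a long time, so it is worth trying on a scratch clone first.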

#6 about 2 minutes

Using the new `git survey` command to find large files

A new built-in command, `git survey`, was created to help developers identify large files, blobs, and binaries in their repository history.

#7 about 3 minutes

Best practices for managing large repositories

Beyond the specific fix, general practices such as not checking in binaries and avoiding thousands of files in a single folder keep a repository healthy.
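Both practices can be checked mechanically in a working tree. The following is a minimal sketch, not something shown in the talk: the function names and the default thresholds (1000 files per directory, 5 MiB per file) are arbitrary assumptions, and it inspects only the checked-out files, not repository history.

```python
import os


def crowded_dirs(root, max_entries=1000):
    """Return (directory, file_count) pairs for directories over the limit."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into Git's own storage
        if len(filenames) > max_entries:
            found.append((dirpath, len(filenames)))
    return found


def large_files(root, min_bytes=5 * 1024 * 1024):
    """Return (path, size) pairs for regular files of at least min_bytes."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.path.getsize(path)
                if size >= min_bytes:
                    found.append((path, size))
    return found
```

Calling `crowded_dirs(".")` and `large_files(".")` from the repository root lists the candidates, and a check like this could run in CI to flag problems before they are committed.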

#8 about 6 minutes

The broader impact on the open source community

The new algorithm has shown significant size reductions for other large monorepos like Chromium, and the fix is being upstreamed to benefit the entire Git community.
