Jonathan Creamer

How a Small Team Shrank a Microsoft Monorepo by 94%

A Microsoft monorepo ballooned to 150GB. The culprit wasn't code but a Git hashing bug from 2006 that gave different files identical hashes.

#1 (about 2 minutes)

The scale of Microsoft's monorepo problem

A monorepo with 20 million lines of code grew from a manageable 2GB to an unworkable 150GB, prompting an investigation into its exponential growth.

#2 (about 3 minutes)

How automated changelog tooling bloated the repository

The versioning tool Beachball generated thousands of changelog files, causing a separate versioning branch to swell to 130GB.

#3 (about 9 minutes)

Discovering a Git hashing algorithm bug from 2006

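A Git expert found that a path-hashing function dating from 2006 effectively used only the last 16 characters of a file path. Changelog files with the same name in different package folders therefore hashed to the same value, which kept Git from pairing each changelog with earlier versions of the same file when computing deltas (diffs).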
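The collision can be reproduced with a small Python port of Git's original name-hash function, `pack_name_hash()` in `pack-objects.h`, whose C loop is `hash = (hash >> 2) + (c << 24)`. Each new character enters at the top of a 32-bit value while older characters drift two bits to the right, so characters more than about 16 positions from the end stop mattering. This is a sketch assuming ASCII paths; the package paths in the usage lines are hypothetical.

```python
def pack_name_hash(path: str) -> int:
    """Python port of Git's original pack_name_hash() (pack-objects.h).

    Assumes ASCII paths. Older characters are shifted right by two bits per
    new character, so only roughly the last 16 characters affect the result.
    """
    h = 0
    for byte in path.encode("utf-8"):
        if byte in b" \t\n\r":  # skip whitespace (space, tab, LF, CR)
            continue
        # C: hash = (hash >> 2) + (c << 24), truncated to a uint32_t
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


# Hypothetical package layout: two changelogs in different folders.
foo = pack_name_hash("packages/foo-app/CHANGELOG.json")
bar = pack_name_hash("packages/bar-app/CHANGELOG.json")
md = pack_name_hash("packages/foo-app/CHANGELOG.md")

print(foo == bar)  # True: both paths end in "p/CHANGELOG.json"
print(foo == md)   # False: the final characters differ
```

Because Git sorts candidate objects by this hash before searching for delta bases, thousands of same-named changelogs end up interleaved in one bucket.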

#4 (about 4 minutes)

Implementing the new path walk algorithm to fix Git

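The solution was a new "Path Walk" algorithm for `git push` and `git repack` that groups objects by their full file path, so each file's versions are delta-compressed against one another rather than against unrelated files whose names happened to hash alike.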
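The core idea can be modeled in a few lines: instead of bucketing objects by a lossy 32-bit name hash, collect each file's versions under its full path and look for delta bases only inside that group. This is a simplified sketch, not Git's implementation. The snapshot-list history format and the function names `path_walk_groups` and `delta_candidates` are hypothetical, and real `git pack-objects` searches a window of candidates rather than only the immediately preceding version.

```python
from collections import defaultdict


def path_walk_groups(snapshots):
    """Group distinct file versions by their full path.

    `snapshots` is a hypothetical toy history: a list of dicts, one per
    commit, mapping full path -> file contents.
    """
    groups = defaultdict(list)
    for snapshot in snapshots:
        for path, content in snapshot.items():
            versions = groups[path]
            # Keep each distinct version once, in first-seen order.
            if content not in versions:
                versions.append(content)
    return dict(groups)


def delta_candidates(groups):
    """Pair each version with the previous version of the SAME path.

    A changelog is only ever compared with earlier versions of itself,
    never with another package's changelog.
    """
    pairs = []
    for path, versions in groups.items():
        for older, newer in zip(versions, versions[1:]):
            pairs.append((path, older, newer))
    return pairs


history = [
    {"packages/foo-app/CHANGELOG.json": "foo 1.0.0",
     "packages/bar-app/CHANGELOG.json": "bar 1.0.0"},
    {"packages/foo-app/CHANGELOG.json": "foo 1.0.0\nfoo 1.1.0",
     "packages/bar-app/CHANGELOG.json": "bar 1.0.0"},
]

for path, older, newer in delta_candidates(path_walk_groups(history)):
    print(f"{path}: delta {older!r} -> {newer!r}")
```

Grouping by the full path means two changelogs can never land in the same bucket just because their names end the same way.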

#5 (about 2 minutes)

Applying the fix with new Git config and repack commands

Developers can enable the new algorithm for pushes via a `git config` setting and shrink local clones using the `git repack --use-path-walk` command.

#6 (about 2 minutes)

Using the new `git survey` command to find large files

A new built-in command, `git survey`, was created to help developers identify large files, blobs, and binaries in their repository history.
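`git survey` may not be available in every Git build. A rough approximation with long-standing plumbing is to list every reachable object with `git rev-list --objects --all` and ask `git cat-file --batch-check` for each object's type and size. This sketch assumes Python 3.7+ and `git` on the PATH; the function names are hypothetical, and it reports uncompressed sizes rather than on-disk packed sizes (the `%(objectsize:disk)` format atom would give those).

```python
import subprocess

# %(rest) echoes whatever followed the object ID on the input line;
# for `git rev-list --objects` output, that is the object's path.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output):
    """Turn batch-check lines into (size, object_id, path) tuples for blobs,
    largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo=".", limit=20):
    """Return the `limit` largest blobs reachable from any ref in `repo`."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(checked)[:limit]
```

Inside a clone, `for size, oid, path in largest_blobs(): print(size, oid, path)` lists the biggest blobs anywhere in history. Both command outputs are held in memory, which is fine for a spot check but not for streaming a very large repository.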

#7 (about 3 minutes)

Best practices for managing large repositories

Beyond the specific fix, general best practices like not checking in binaries and avoiding thousands of files in a single folder are crucial for repository health.
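One lightweight way to act on the folder-size advice is to count tracked files per directory from `git ls-files -z` and flag any directory above a threshold. The 1,000-file default and the function names are illustrative assumptions, not a limit given in the talk.

```python
import posixpath
import subprocess
from collections import Counter


def crowded_directories(paths, threshold=1000):
    """Return (directory, file_count) pairs with at least `threshold` files.

    Only files directly inside a directory are counted, not subdirectories.
    Files at the repository root are reported under ".".
    """
    counts = Counter(posixpath.dirname(p) or "." for p in paths)
    crowded = [(d, n) for d, n in counts.items() if n >= threshold]
    # Most crowded directories first (ties keep first-seen order).
    return sorted(crowded, key=lambda item: item[1], reverse=True)


def tracked_paths(repo="."):
    """List tracked files; -z gives NUL-separated, unquoted paths."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.split("\0") if p]
```

From inside a checkout, `crowded_directories(tracked_paths())` lists the offending folders; the same check could run in CI to stop a tool from piling thousands of generated files into one directory.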

#8 (about 6 minutes)

The broader impact on the open source community

The new algorithm has shown significant size reductions for other large monorepos like Chromium, and the fix is being upstreamed to benefit the entire Git community.
