When an object is first created (by git add or git commit), it lives
as a separate file, a loose object: .git/objects/8d/0e41.... That
works well for writes, but storing millions of individual files is
inefficient. Git periodically consolidates them into packfiles.
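The loose-object path can be reproduced by hand. The sketch below assumes a SHA-1 repository (not the newer SHA-256 format); the function name loose_object is made up for illustration.

```python
import hashlib
import zlib


def loose_object(content: bytes, obj_type: str = "blob"):
    """Return (object id, relative path, compressed bytes) for a loose object."""
    # Git hashes a header "<type> <size>\0" followed by the raw content.
    store = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits name the directory, the remaining 38 the file.
    path = f".git/objects/{oid[:2]}/{oid[2:]}"
    # The file on disk is the zlib-compressed header plus content.
    return oid, path, zlib.compress(store)


oid, path, data = loose_object(b"")
print(oid)   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(path)
```

The id printed for empty content is the one git hash-object reports for an empty file, which is a quick way to check the header format.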
Inside a packfile
Two files appear in .git/objects/pack/:
pack-abc123def456.pack # compressed objects
pack-abc123def456.idx # SHA index for fast lookup
Inside the .pack file:
- objects are stored one after another, each zlib-compressed;
- for similar objects, Git computes a delta and stores only the difference relative to a base object;
- the file is framed by a short header (signature, version, object count) and a trailing checksum; there is no second compression pass over the whole file.
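To make the container layout concrete, here is a minimal sketch that builds and reads the framing of a pack with zero objects. It assumes a version 2 pack in a SHA-1 repository, where the file starts with a 12-byte header and ends with a 20-byte checksum of everything before it. The helper names are hypothetical.

```python
import hashlib
import struct


def build_empty_pack() -> bytes:
    """Build a pack containing zero objects: header plus trailing checksum."""
    # 4-byte signature, 4-byte version, 4-byte object count, all big-endian.
    header = b"PACK" + struct.pack(">II", 2, 0)
    # The pack ends with the SHA-1 of everything that precedes it.
    return header + hashlib.sha1(header).digest()


def read_pack_header(pack: bytes):
    """Return (version, object_count) after checking signature and trailer."""
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    if hashlib.sha1(pack[:-20]).digest() != pack[-20:]:
        raise ValueError("trailing checksum mismatch")
    return version, count


print(read_pack_header(build_empty_pack()))  # (2, 0)
```

In a real pack the object entries sit between the header and the checksum, each with its own small type-and-size prefix followed by zlib data.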
Delta compression
This is where most of Git's space savings come from. Suppose you have two versions of a large file that differ by five lines. In loose format, those are two blobs, each nearly the full size of the file. In a packfile, one version is stored in full and the other as "take that one and apply these edits."
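The delta encoding is derived from xdelta: a delta is a sequence of "copy this range from the base" and "insert these literal bytes" instructions. Git does not require the base to be an earlier version of the same path. It sorts candidates by type, then by a hash of the file name, then by size, and tries each object against its neighbors in a sliding window (pack.window, 10 by default). The file name is only a sorting hint, so deltas can exist between two completely unrelated files if they happen to be similar.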
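The "take that one and apply these edits" idea can be illustrated with a toy delta made of copy and insert operations. This is not Git's binary delta format or its matching algorithm: the sketch borrows Python's difflib to find common ranges, and make_delta and apply_delta are names invented here.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe target as ("copy", offset, length) and ("insert", data) ops."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse bytes from the base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # literal bytes new in target
        # A pure "delete" needs no op: those base bytes are never copied.
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from the base and the list of ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"".join(b"line %d\n" % i for i in range(50))
target = base.replace(b"line 7\n", b"line seven\n")
delta = make_delta(base, target)
assert apply_delta(base, delta) == target
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(target), "bytes in target,", literal, "literal bytes in delta")
```

Only the literal bytes and a handful of small copy instructions need to be stored for the second version, which is where the space saving comes from.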
When packfiles are created
- On git gc (manual or automatic).
- On git push or git fetch: the two sides exchange objects in packfile format, not one object at a time.
- On git clone: the server sends the entire repository as a packfile.
Auto-gc triggers when certain thresholds are exceeded:
- more than 6700 loose objects;
- more than 50 packfiles.
Both thresholds are configurable: gc.auto for loose objects and gc.autoPackLimit for packfiles.
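To see how close a repository is to those limits, a rough count is enough. The sketch below walks the objects directory and compares the totals to the default values quoted above. It is an approximation of the idea, not Git's own check, which uses a cheaper estimate rather than counting every file. The function name gc_pressure and its parameters are hypothetical.

```python
import os
import re


def gc_pressure(git_dir: str, loose_limit: int = 6700, pack_limit: int = 50):
    """Count loose objects and packfiles under git_dir and compare to limits."""
    objects = os.path.join(git_dir, "objects")
    loose = 0
    for name in os.listdir(objects):
        # Loose objects live in fan-out directories named 00..ff.
        if re.fullmatch(r"[0-9a-f]{2}", name):
            loose += len(os.listdir(os.path.join(objects, name)))
    pack_dir = os.path.join(objects, "pack")
    packs = 0
    if os.path.isdir(pack_dir):
        packs = sum(1 for f in os.listdir(pack_dir) if f.endswith(".pack"))
    return {
        "loose": loose,
        "packs": packs,
        "would_trigger": loose > loose_limit or packs > pack_limit,
    }
```

Calling gc_pressure(".git") from the root of a working tree returns the two counts and whether either exceeds its limit. Pass different limits if gc.auto or gc.autoPackLimit has been changed.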
Reading objects from a packfile
Git commands work the same way regardless of whether an object is loose or
packed. git cat-file -p <sha> finds the object in either form.
To inspect a pack file directly:
git verify-pack -v .git/objects/pack/pack-abc.idx
# SHA type size size-in-pack offset [depth base-SHA]
The output shows which objects are bases and which are deltas; for each delta it gives the chain depth and the SHA of its immediate base.
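The listing is easy to post-process. The sketch below parses object lines from git verify-pack -v, relying on the fact that a line has five fields for a fully stored object and seven (with depth and base SHA appended) for a delta. The sample text uses made-up, shortened SHAs and sizes, and parse_verify_pack is a hypothetical helper.

```python
def parse_verify_pack(text: str):
    """Parse object lines from `git verify-pack -v` into dictionaries."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        # Object lines have 5 fields, or 7 when the object is stored as a delta.
        if len(parts) not in (5, 7):
            continue  # summary lines such as the chain-length histogram
        if parts[1] not in ("commit", "tree", "blob", "tag"):
            continue
        rec = {
            "sha": parts[0],
            "type": parts[1],
            "size": int(parts[2]),
            "size_in_pack": int(parts[3]),
            "offset": int(parts[4]),
            "depth": 0,
            "base": None,
        }
        if len(parts) == 7:
            rec["depth"] = int(parts[5])
            rec["base"] = parts[6]
        records.append(rec)
    return records


# Made-up sample in the shape of real output (SHAs shortened for readability).
sample = """\
aaaa111 blob 12000 3050 12
bbbb222 blob 40 52 3062 1 aaaa111
non delta: 1 object
chain length = 1: 1 object
"""

for rec in parse_verify_pack(sample):
    if rec["base"]:
        print(rec["sha"], "->", rec["base"], "depth", rec["depth"])
# prints: bbbb222 -> aaaa111 depth 1
```

On a real repository the text would come from running the command, for example through subprocess.run with capture_output=True and text=True. Sorting the records by size_in_pack is then a quick way to find what dominates a pack.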
Pitfalls
- Delta compression does not mean Git "stores diffs." The logical data model is still snapshot-based (a commit points to a tree, not to a diff). Deltas are a physical storage detail.
- Reading a deeply deltified object can be slow: Git must decompress and apply every delta in the chain, starting from a full base object. The pack.depth option limits that chain length (50 by default).
- git gc --aggressive recomputes all deltas from scratch to find better bases. It takes a long time but produces a smaller pack. Worth running once every few months on large repositories.