kb/basics ── Git basics ── beginner

Snapshots vs. deltas

The core conceptual difference between Git and SVN/CVS/Perforce. Older VCS model each version as a diff from the previous one. Git records each version as a complete snapshot of the project and deduplicates identical files by the hash (SHA) of their contents.

aka: snapshot-vs-deltas, delta-vs-snapshot

In systems like SVN, CVS, and Perforce, each version of a file is stored as a delta from the previous one. To retrieve a file from a week ago, the system starts from one of the stored versions and applies deltas in the required direction.

SVN / CVS / Perforce - delta model:

    v1: full file
    v2: delta (3 lines changed)
    v3: delta (1 line added)
    v4: delta (5 lines deleted)
     ↑
     to reconstruct v4, start from the full file v1
     and apply deltas v2, v3, v4 in order
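The reconstruction step in the diagram above can be sketched in a few lines of Python. This is only an illustration of the idea: the `(op, index, text)` edit format and the names `apply_delta` and `reconstruct` are hypothetical, and they do not match the on-disk format of SVN, CVS, or Perforce.

```python
from typing import List, Tuple

# Hypothetical toy delta format: a list of (op, index, text) edits.
Edit = Tuple[str, int, str]


def apply_delta(lines: List[str], delta: List[Edit]) -> List[str]:
    """Return a new list of lines with every edit of the delta applied in order."""
    result = list(lines)
    for op, index, text in delta:
        if op == "replace":
            result[index] = text
        elif op == "insert":
            result.insert(index, text)
        elif op == "delete":
            del result[index]
        else:
            raise ValueError(f"unknown op: {op}")
    return result


def reconstruct(base: List[str], deltas: List[List[Edit]], version: int) -> List[str]:
    """Version 1 is the full base file; version n needs the first n-1 deltas."""
    lines = list(base)
    for delta in deltas[: version - 1]:
        lines = apply_delta(lines, delta)
    return lines


v1 = ["a", "b", "c"]
deltas = [
    [("replace", 1, "B")],  # v2: one line changed
    [("insert", 3, "d")],   # v3: one line added
    [("delete", 0, "")],    # v4: one line deleted
]

v4 = reconstruct(v1, deltas, 4)
print(v4)
```

The cost of reading a version grows with the number of deltas between it and the nearest full copy, which is the property the snapshot model avoids.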

Git works differently. Each commit stores a complete snapshot of the entire project through a tree object. Not a diff: a photograph.

Git - snapshot model:

    commit 1: snapshot 1
    commit 2: snapshot 2
    commit 3: snapshot 3
     ↑
     to view commit 3,
     just take snapshot 3

Where the storage savings come from

Storing the whole project with every commit sounds wasteful. Done naively it would be, so Git uses two techniques to avoid the cost.

Content-addressed deduplication. Each file's contents are stored as a blob addressed by the SHA hash of those contents (SHA-1 by default). If the file did not change between commits, it hashes to the same SHA, and Git reuses the same blob. What actually lives on disk is "a snapshot of the directory tree with pointers to files," not "a full copy of the project."
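A minimal Python sketch of this deduplication follows. `git_blob_id` hashes the same bytes Git hashes for a blob in a SHA-1 repository (a `blob <size>\0` header followed by the contents). The `ObjectStore` class and its flat path-to-ID "tree" are hypothetical simplifications: real Git trees are nested objects that are themselves hashed.

```python
import hashlib
from typing import Dict


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob in a SHA-1 repository: sha1(b'blob <size>\\0' + content)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store (hypothetical): one entry per distinct content."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def snapshot(self, files: Dict[str, bytes]) -> Dict[str, str]:
        """Store every file and return a tree-like mapping of path -> blob ID."""
        tree: Dict[str, str] = {}
        for path, content in files.items():
            blob_id = git_blob_id(content)
            self.blobs.setdefault(blob_id, content)  # reuse the blob if already stored
            tree[path] = blob_id
        return tree


store = ObjectStore()
tree1 = store.snapshot({"README": b"hello\n", "main.py": b"print(1)\n"})
tree2 = store.snapshot({"README": b"hello\n", "main.py": b"print(2)\n"})

print(tree1["README"] == tree2["README"])  # the unchanged file maps to the same blob ID
print(len(store.blobs))                    # distinct blobs stored across both snapshots
```

Two snapshots with four file entries end up as three stored blobs, because the unchanged README is stored once and referenced twice.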

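Deltas inside a packfile. When Git packs objects, for example during git gc or when transferring them with git push, accumulated objects are written into a single packfile where similar objects are encoded as deltas against each other. This is a storage optimization, not the data model: every object is still addressed by its SHA and read back as full content.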
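To get a feel for why encoding similar objects against each other pays off, here is a rough Python sketch. It is an analogy only: it uses zlib with a preset dictionary as a stand-in, whereas Git's packfiles use their own delta format built from copy and insert instructions. The function names and sample data are hypothetical.

```python
import hashlib
import zlib

# Two similar objects: the second is the first with one line changed.
base_lines = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(50)]
base = "\n".join(base_lines).encode()
changed = "\n".join(base_lines[:25] + ["changed line"] + base_lines[26:]).encode()


def pack_standalone(data: bytes) -> bytes:
    """Compress an object on its own, with no knowledge of similar objects."""
    return zlib.compress(data)


def pack_against(base_obj: bytes, data: bytes) -> bytes:
    """Compress an object using a similar object as a preset dictionary."""
    compressor = zlib.compressobj(zdict=base_obj)
    return compressor.compress(data) + compressor.flush()


def unpack_against(base_obj: bytes, packed: bytes) -> bytes:
    """Recover the full object; the same base must be available."""
    decompressor = zlib.decompressobj(zdict=base_obj)
    return decompressor.decompress(packed) + decompressor.flush()


standalone = pack_standalone(changed)
against_base = pack_against(base, changed)

print(len(changed), len(standalone), len(against_base))
print(unpack_against(base, against_base) == changed)
```

The object encoded against its base is much smaller than the standalone encoding, yet unpacking still yields the complete content. That is the sense in which deltas here are a storage detail rather than the model.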

Why this matters

The difference between the two models is the foundation of how Git works.

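A branch is a pointer to a commit, and through it to a snapshot, not a record of "forked at this moment." In a default SHA-1 repository, creating a branch writes a 41-byte file: a 40-character SHA plus a newline. Switching is fast because Git only has to update the files that differ between the two snapshots.
git log walks the chain of parent commits. Each commit already points at its ready-made snapshot, so no history reconstruction is needed.
git diff shows the difference between two snapshots, computed on demand. That is a view, not the storage model, and mixing the two up is a common source of confusion.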
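The three points above can be put together in a small Python model. Everything here is a hypothetical simplification: commit IDs are plain labels rather than hashes, snapshots are flat dictionaries of path to content, and the names `create_branch`, `log`, and `diff` only mirror the Git commands loosely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    parent: Optional[str]      # ID of the parent commit, None for the first commit
    snapshot: Dict[str, str]   # full project state: path -> content
    message: str


commits: Dict[str, Commit] = {
    "c1": Commit(None, {"README": "hello"}, "initial"),
    "c2": Commit("c1", {"README": "hello", "main.py": "print(1)"}, "add main"),
    "c3": Commit("c2", {"README": "hello!", "main.py": "print(1)"}, "edit readme"),
}

# A branch is just a name pointing at a commit ID.
branches: Dict[str, str] = {"main": "c3"}


def create_branch(name: str, commit_id: str) -> None:
    """Creating a branch stores one pointer; no files are copied."""
    branches[name] = commit_id


def log(commit_id: Optional[str]) -> List[str]:
    """Walk the parent chain and collect messages, newest first."""
    messages: List[str] = []
    while commit_id is not None:
        commit = commits[commit_id]
        messages.append(commit.message)
        commit_id = commit.parent
    return messages


def diff(a: str, b: str) -> Dict[str, str]:
    """Compare two stored snapshots on demand; nothing is precomputed."""
    old, new = commits[a].snapshot, commits[b].snapshot
    changes: Dict[str, str] = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes[path] = "added"
        elif path not in new:
            changes[path] = "deleted"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes


create_branch("feature", "c2")
print(branches)
print(log(branches["main"]))
print(diff("c1", "c3"))
```

Note that `diff` works for any pair of commits, not only neighbors, because each commit carries its own complete snapshot.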

§ commands

bash

git cat-file -p HEAD^{tree}

Top-level contents of the snapshot (the root tree) for the latest commit; subdirectories appear as nested tree entries

bash

git cat-file -t <sha>

Type of an object: blob, tree, commit, or tag (a snapshot is accessed through a tree)

bash

git diff <commit-A> <commit-B>

Diff between two snapshots, computed at the time of the request
