linuxlab.io
Copyright © 2026 LinuxLab. All rights reserved.
Cluster


Objects, SHA, packfile, working tree

Questions about the Git object model: the four object types, SHA as the key, how history sits on disk, and why branches in Git are cheap. This is the foundation for every other cluster. Without it, talking about rebase, reflog, and filter-repo is pointless.

7 questions · ~35 min read

Questions

On this page

  1. The four object types in Git. What does each one store and how are they connected?
  2. Does Git store a snapshot or a delta? Explain how it really works.
  3. What physically happens during `git add file.txt`?
  4. Why SHA-1 and not MD5? What about the move to SHA-256?
  5. Working tree, index, HEAD. How do they differ and which one stores what?
  6. What physically changes on disk during `git commit -m "msg"`?
  7. How does Git pull an object's content out of a packfile? What is a base+delta chain?

#git-objects-four-types

junior · often

The four object types in Git. What does each one store and how are they connected?

What to answer

A `blob` is the content of a single file, with no name and no permissions. A `tree` is the list of entries for one directory: mode, type, SHA, name. A tree entry points either at a blob (a file) or at another tree (a subdirectory), which gives you a recursive structure. A `commit` is the SHA of the root tree plus metadata: parent(s), author, committer, dates, message. An annotated `tag` is an object that points at another object, usually a commit, and carries the tag name, tagger, message, and optionally a signature.
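The links between the object types can be sketched in a few lines of Python. This is a minimal illustration, not Git's implementation: the helper names `object_id` and `tree_body`, the file name `hello.txt`, and the author line are hypothetical, and the entry sort is simplified to plain byte order (enough for a flat directory of files).

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <len>\\0<body>', the way Git names an object."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_sha) entries; each SHA is stored as 20 raw bytes."""
    out = b""
    for mode, name, sha in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha)
    return out


# blob: content only, no file name
content = b"hello world\n"
blob_id = object_id("blob", content)

# tree: this is where the name and the mode live
tree = tree_body([("100644", "hello.txt", blob_id)])
tree_id = object_id("tree", tree)

# commit: root tree SHA plus metadata (hypothetical author, fixed timestamp)
commit = (
    f"tree {tree_id}\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer A U Thor <author@example.com> 1700000000 +0000\n"
    "\n"
    "initial commit\n"
).encode()
commit_id = object_id("commit", commit)

print("blob  ", blob_id)
print("tree  ", tree_id)
print("commit", commit_id)
```

The file name appears only in the tree bytes, never in the blob, which is why renaming a file produces a new tree but reuses the same blob.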

What they want to hear

A senior should:

  • call Git a key-value store: the key is the SHA, the value is the zlib-compressed object in `.git/objects/`
  • explain the tree recursion: a directory hierarchy is a tree of tree objects
  • say that a blob does not know the file name. The name lives in the tree that points at the blob
  • explain why a branch is just a file with one SHA in `.git/refs/heads/<name>` (or a line in `.git/packed-refs`), and HEAD is a file with a pointer to a branch
  • distinguish an annotated tag (an object in `.git/objects/`) from a lightweight tag (just a ref in `.git/refs/tags/`)

Pitfalls

  • ✗ Saying "a commit stores changes". In reality a commit stores a full snapshot through a tree, and Git computes the diff on the fly
  • ✗ Saying "a blob knows the file name". It does not. The name is added by the tree object
  • ✗ Confusing a tag with a branch. Both are refs, but a tag normally stays on one object while a branch moves with every new commit

Follow-up

  • ? What does `git cat-file -p HEAD^{tree}` show?
  • ? Where does the branch `feature/x` physically live?
  • ? How does a lightweight tag differ from an annotated one in terms of objects?

Deeper in the knowledge base

  • Blob
  • Tree
  • Commit
  • Tag
  • SHA-1 in Git
tags: internals, objects, model
book: edu/Git_book/03-object-model.md · how.git.work.bytebytego.pdf

#git-snapshot-vs-delta

intermediate · sometimes

Does Git store a snapshot or a delta? Explain how it really works.

What to answer

Git **stores snapshots** in its logical model and uses deltas only when it packs and transfers objects. Each commit points at a full tree, and each tree entry points at a blob that holds the full content of a file. That is the logical model. When Git packs objects into a packfile (during `git gc`, push, or clone), it applies delta compression: similar objects are replaced with a delta against a chosen base object. The full content is reconstructed transparently on read, for example by `git cat-file`.
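A toy model shows why full snapshots stay cheap even before any delta compression. The sketch below assumes a plain dictionary as the object store; `take_snapshot`, the file names, and the contents are hypothetical.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Content-derived key: SHA-1 of 'blob <len>\\0<content>'."""
    header = f"blob {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def take_snapshot(files: dict[str, bytes], store: dict[str, bytes]) -> dict[str, str]:
    """Record every file in full; identical content lands under the same key."""
    snapshot = {}
    for path, content in files.items():
        oid = blob_id(content)
        store[oid] = content  # writing the same content twice is a no-op
        snapshot[path] = oid
    return snapshot


store: dict[str, bytes] = {}
v1 = {"README.md": b"hello\n", "src/app.py": b"print(1)\n"}
v2 = {**v1, "src/app.py": b"print(2)\n"}  # only one file changed

snap1 = take_snapshot(v1, store)
snap2 = take_snapshot(v2, store)

print("objects stored:", len(store))
print("README.md shared:", snap1["README.md"] == snap2["README.md"])
```

Both snapshots describe every file, yet the store holds three blobs rather than four, because the unchanged `README.md` maps to the same key. Delta compression in packfiles is a separate, later optimization on top of this.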

What they want to hear

A senior should:

  • cleanly separate the logical model (snapshot) from the physical storage format (loose objects plus packfiles with deltas)
  • say that in loose form every object is complete, while in a packfile it may be a base+delta chain
  • explain why the snapshot model is used: commit atomicity, easy recovery, easy comparison of two trees
  • name `git verify-pack -v <pack>` as the way to see which objects were stored as deltas and against which base

Pitfalls

  • ✗ Saying "Git stores deltas like Subversion". Wrong. Logically it is a snapshot, and the delta is only in the packfile
  • ✗ Saying "a packfile is a transfer format". A packfile is also the storage format inside `.git/objects/pack/`
  • ✗ Thinking the delta always runs between commits that are adjacent in time. The heuristic picks a base by similarity, not by order

Follow-up

  • ? What happens to repository size if you commit the same file slightly differently 100 times?
  • ? What does `git gc --aggressive` do to packs?
  • ? What is `.git/objects/info/packs` for?

Deeper in the knowledge base

  • Snapshots vs. deltas
  • Packfile
  • Blob
tags: internals, packfile, delta
book: edu/Git_book/03-object-model.md · how.git.work.bytebytego.pdf

#git-add-physically

intermediate · often

What physically happens during `git add file.txt`?

What to answer

Git reads the file content, computes the SHA-1 of `blob <len>\0<content>`, and writes the zlib-compressed blob to `.git/objects/<2>/<38>`: the first 2 hex characters of the SHA name the directory, the remaining 38 name the file. Then it updates `.git/index`, a binary file with a list of entries: path, mode, blob SHA, stat metadata. The working tree is not touched. A later `git commit` takes the index, builds tree objects from it, writes the commit, and moves the branch.
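The blob-writing half of `git add` can be reproduced without Git. The following is a sketch under stated assumptions: `loose_object` is a hypothetical helper, nothing is written to disk, and the index is reduced to a dictionary (the real `.git/index` is a binary file that also carries stat metadata).

```python
import hashlib
import zlib


def loose_object(content: bytes) -> tuple[str, str, bytes]:
    """Return (sha, path under .git, compressed bytes) for a blob."""
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    sha = hashlib.sha1(store).hexdigest()
    path = f".git/objects/{sha[:2]}/{sha[2:]}"
    return sha, path, zlib.compress(store)


content = b"test content\n"
sha, path, compressed = loose_object(content)

# Simplified stand-in for the index entry that `git add file.txt` records.
index = {"file.txt": {"mode": "100644", "sha": sha}}

# Reading the object back: decompress, then split header from body at the NUL.
raw = zlib.decompress(compressed)
header, _, body = raw.partition(b"\x00")

print(path)
print(header.decode(), "->", body)
```

The SHA here should match what `git hash-object` reports for the same bytes, since both hash the header together with the content.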

What they want to hear

A senior should:

  • name the three places where "state" lives: working tree, index (staging area), and `.git/objects/` plus the refs
  • explain that the blob is created already at `git add`. Even without a commit, a copy sits in `.git/objects/`
  • say that the index is a flat list of paths, not a tree. The tree is assembled at commit time through `git write-tree`
  • mention that a repeated `git add` after editing the file creates a new blob, while the old one stays in objects (unreachable, it is pruned later by `git gc`)

Pitfalls

  • ✗ Saying "git add just marks the file for commit". In reality it already creates a blob on disk
  • ✗ Thinking the index stores patches. It holds a full snapshot for each file through a SHA
  • ✗ Saying "git commit reads the working tree". It reads the index, not the working tree

Follow-up

  • ? What does `git ls-files --stage` show?
  • ? How does `git add -p` differ from `git add`?
  • ? What happens if you edit a file after `git add` and commit without adding it again?

Deeper in the knowledge base

  • git add
  • Working tree
  • Blob
  • Commit
tags: internals, index, add

#git-sha1-collision-sha256

intermediate · sometimes

Why SHA-1 and not MD5? What about the move to SHA-256?

What to answer

Practical MD5 collisions were demonstrated in 2004, so when Git was written in 2005 SHA-1 was a sensible balance of "long enough plus fast". In 2017 CWI Amsterdam and Google demonstrated a SHA-1 collision (SHAttered). For Git this is less scary than it sounds: the hash is used primarily for addressing content, and because Git hashes the `blob <len>\0` header together with the content, the published colliding files do not collide as Git blobs. The migration to SHA-256 has been underway since 2018. Git now supports both formats, but new repositories still default to SHA-1.

What they want to hear

A senior should:

  • distinguish "finding some pair of colliding inputs" from "forging content that matches the SHA-1 of an existing object" (a second preimage). The second is far more expensive, and no practical attack is known
  • say that SHA-256 is selected per repository with `git init --object-format=sha256` and is still incompatible with existing SHA-1 repositories (there is no transparent transition)
  • explain that in Git the hash is used both for content and for chain integrity. Tampering with one commit changes the SHA of every commit after it
  • mention SHAttered as a concrete reference: two PDFs with the same SHA-1

Pitfalls

  • ✗ Saying "Git is already on SHA-256 by default". No, the transition is not finished
  • ✗ Thinking a SHA-1 collision instantly breaks Git. It does not. Since version 2.13 Git uses a collision-detecting SHA-1 implementation (SHA-1DC) by default and rejects input that shows the signature of the known attack
  • ✗ Mixing up the lengths: SHA-1 is 40 hex characters, SHA-256 is 64

Follow-up

  • ? What does `git rev-parse HEAD` show in a repository with `--object-format=sha256`?
  • ? How does Git defend against the SHAttered scenario in practice?
  • ? Why was MD5 never considered for Git?

Deeper in the knowledge base

  • SHA-1 in Git
  • Blob
tags: internals, sha, security
book: edu/Git_book/03-object-model.md

#git-working-tree-index-head

intermediate · often

Working tree, index, HEAD. How do they differ and which one stores what?

What to answer

The working tree is the files as you see them in the editor. The index (staging area) is the binary file `.git/index` with what will go into the next commit: entries of path+mode+SHA. HEAD is a pointer to the "current position", usually through a branch (`ref: refs/heads/main`), sometimes straight at a commit (detached). `git status` summarizes the differences between these three states: working tree vs index, and index vs HEAD.
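The three comparisons can be modelled with three dictionaries that map a path to a content ID. This is a toy sketch: `changed` and `reset` are hypothetical helpers, the IDs `v1`/`v2` stand in for blob SHAs, and `reset` takes the target commit's snapshot directly instead of resolving a ref.

```python
def changed(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Paths whose content ID differs between two states (or exists in only one)."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


def reset(mode: str, target: dict[str, str], head, index, working):
    """Return the new (head, index, working) after a toy `git reset --<mode>`."""
    if mode == "soft":
        return dict(target), index, working
    if mode == "mixed":
        return dict(target), dict(target), working
    if mode == "hard":
        return dict(target), dict(target), dict(target)
    raise ValueError(f"unknown mode: {mode}")


head = {"a.txt": "v1", "b.txt": "v1"}      # last commit
index = {"a.txt": "v2", "b.txt": "v1"}     # a.txt edited and staged
working = {"a.txt": "v2", "b.txt": "v2"}   # b.txt edited, not staged

unstaged = changed(working, index)   # what `git diff` compares
staged = changed(index, head)        # what `git diff --cached` compares
vs_head = changed(working, head)     # what `git diff HEAD` compares
print(unstaged, staged, vs_head)

# --mixed keeps the files but empties the staging area relative to HEAD.
new_head, new_index, new_working = reset("mixed", head, head, index, working)
print(changed(new_index, new_head), changed(new_working, new_index))
```

In this model `--soft` leaves both the index and the files alone, `--mixed` rewrites the index only, and `--hard` rewrites all three, which matches the description of the modes given here.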

What they want to hear

A candidate should:

  • name the three states and tie them to commands: `add` moves working → index, `commit` moves index → HEAD, `reset` moves HEAD back, and its mode decides whether the index and working tree follow
  • explain `git diff` (working vs index), `git diff --cached` (index vs HEAD), `git diff HEAD` (working vs HEAD)
  • say that the index is not a tree but a flat list of paths with slashes. The tree is assembled only at commit time
  • mention that a detached HEAD is not "broken". It is a valid state, just without a branch name

Pitfalls

  • ✗ Saying "the index is a patch". No, the index holds a snapshot through blob SHAs
  • ✗ Thinking `git diff` always shows working vs HEAD. By default it is working vs index
  • ✗ Mixing up the `git reset` modes. `--soft` moves only HEAD, `--mixed` (the default) also resets the index, `--hard` resets HEAD, the index, and the working tree

Follow-up

  • ? What does `git diff --cached` show?
  • ? How does `git reset --mixed` differ from `--soft`?
  • ? How do you inspect the contents of the index without third-party tools?

Deeper in the knowledge base

  • Working tree
  • git status
  • git diff
  • Detached HEAD
tags: internals, index, head

#git-commit-what-happens

senior · sometimes

What physically changes on disk during `git commit -m "msg"`?

What to answer

Git takes the index, recursively assembles tree objects (one per directory), computes their SHAs, and writes them to `.git/objects/`. Then it creates a commit object: the SHA of the root tree, the SHA of the parent (from the current branch), author/committer, message. Each object is zlib-compressed and written to `.git/objects/`. Finally `.git/refs/heads/<branch>` is updated to the SHA of the new commit, and an entry is appended to `.git/logs/HEAD` and `.git/logs/refs/heads/<branch>`, which is what `git reflog` reads later.
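The sequence above can be sketched end to end with in-memory dictionaries. Everything here is illustrative: `write_tree` and `commit` are hypothetical helpers, blobs are assumed to exist already (they were written at `git add`), only mode `100644` files are handled, and the reflog line is simplified to `<old> <new> commit: <message>`.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def write_tree(index: dict[str, str], store: dict) -> str:
    """Turn a flat {path: blob_sha} index into nested tree objects."""
    files, subdirs = {}, {}
    for path, sha in index.items():
        first, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(first, {})[rest] = sha
        else:
            files[first] = sha
    entries = [(name, "100644", sha) for name, sha in files.items()]
    entries += [(name, "40000", write_tree(sub, store)) for name, sub in subdirs.items()]
    # Git orders entries by name, comparing directories as if they ended in "/".
    entries.sort(key=lambda e: (e[0] + "/" if e[1] == "40000" else e[0]).encode())
    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha)
        for name, mode, sha in entries
    )
    tree_id = object_id("tree", body)
    store[tree_id] = ("tree", body)
    return tree_id


def commit(index, message, store, refs, reflog, branch="refs/heads/main"):
    """write-tree -> commit object -> update-ref -> reflog entry."""
    tree_id = write_tree(index, store)
    parent = refs.get(branch)
    who = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity
    lines = [f"tree {tree_id}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    body = "\n".join(lines).encode()
    commit_id = object_id("commit", body)
    store[commit_id] = ("commit", body)
    refs[branch] = commit_id
    reflog.append(f"{parent or '0' * 40} {commit_id} commit: {message}")
    return commit_id


blob = lambda data: object_id("blob", data)
store, refs, reflog = {}, {}, []

index = {"README.md": blob(b"hi\n"), "src/app.py": blob(b"print(1)\n"),
         "docs/guide.md": blob(b"guide\n")}
c1 = commit(index, "first", store, refs, reflog)

index["src/app.py"] = blob(b"print(2)\n")  # only src/ changes
c2 = commit(index, "second", store, refs, reflog)

trees = [k for k, (kind, _) in store.items() if kind == "tree"]
print("trees:", len(trees), "commits:", len(store) - len(trees))
```

Two commits over three directories produce five distinct trees rather than six: the root and `src/` trees are new in the second commit, while the unchanged `docs/` tree hashes to the same ID and is reused.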

What they want to hear

A senior should:

  • name the order: write-tree → commit-object → update-ref → reflog-entry
  • explain that a commit whose tree is identical to its parent's is possible through `--allow-empty`, but without the flag Git refuses
  • say that a merge commit differs by having two (or more) parent SHAs in the commit object
  • mention the hook points: `pre-commit` runs before write-tree, `prepare-commit-msg` before the editor, `commit-msg` after the message is written, `post-commit` after update-ref

Pitfalls

  • ✗ Thinking `git commit` writes a diff. It writes full tree+commit objects
  • ✗ Saying "no new trees are created if no folder changed". The root tree is always created, because its SHA depends on the nested trees (but unchanged subtrees are reused)
  • ✗ Forgetting about the reflog update. It is exactly why `reset --hard` can later be recovered through `git reflog`

Follow-up

  • ? What happens if the `pre-commit` hook returns a non-zero exit?
  • ? What is `git commit --allow-empty` for?
  • ? What in a merge commit's commit object differs from a regular one?

Deeper in the knowledge base

  • Commit
  • Tree
  • git reflog
tags: internals, commit, hooks

#git-packfile-delta-chain

senior · rarely

How does Git pull an object's content out of a packfile? What is a base+delta chain?

What to answer

A packfile (`.git/objects/pack/*.pack`) stores objects one after another, sometimes as full objects (types `OBJ_COMMIT`/`OBJ_TREE`/`OBJ_BLOB`/`OBJ_TAG`), sometimes as a delta against another object (types `OBJ_OFS_DELTA` or `OBJ_REF_DELTA`). The index file (`.idx`) gives O(log n) lookup of SHA → offset through a fan-out table plus binary search. On a request Git walks the chain: if an object is a delta, it finds the base, reconstructs the base recursively, and applies the delta. The chain length is bounded by `pack.depth` (default 50).
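Applying a delta and walking a chain can be sketched as follows. The instruction decoding follows my reading of the documented pack delta encoding (two size varints, then copy and insert instructions); the `pack` dictionary, the object names `a`/`b`/`c`, and the `resolve` helper are hypothetical stand-ins for real pack entries, and the delta bytes are hand-built for the example.

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Little-endian base-128 size: 7 bits per byte, high bit means 'more'."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    base_size, pos = read_varint(delta, 0)
    target_size, pos = read_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta was made against a different base")
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:  # copy: low 4 bits flag offset bytes, next 3 flag size bytes
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            size = size or 0x10000
            out += base[offset:offset + size]
        elif op:  # insert: the opcode itself is the number of literal bytes
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != target_size:
        raise ValueError("reconstructed size mismatch")
    return bytes(out)


def resolve(name: str, pack: dict, depth: int = 0) -> tuple[bytes, int]:
    """Rebuild an object, recursing through its bases; also report chain length."""
    entry = pack[name]
    if entry[0] == "full":
        return entry[1], depth
    _, base_name, delta = entry
    base, hops = resolve(base_name, pack, depth + 1)
    return apply_delta(base, delta), hops


# b = copy base[0:5] + insert ", git" + copy base[5:12]
delta_b = bytes([12, 17, 0x90, 5, 5]) + b", git" + bytes([0x91, 5, 7])
# c = copy b[0:10] + insert "\n"
delta_c = bytes([17, 11, 0x90, 10, 1]) + b"\n"

pack = {
    "a": ("full", b"hello world\n"),
    "b": ("delta", "a", delta_b),
    "c": ("delta", "b", delta_c),
}
content, hops = resolve("c", pack)
print(content, "after", hops, "delta hops")
```

Reading `c` forces a read of `b`, which forces a read of `a`. This is the cost that `pack.depth` caps: every extra hop is another object to locate, decompress, and patch before the requested bytes exist.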

What they want to hear

A senior should:

  • name `.pack` plus `.idx` as a pair of files, and say that `.idx` is needed for lookup without scanning the whole pack
  • distinguish `OFS_DELTA` (the base is given as an offset within the same pack) from `REF_DELTA` (the base is given by its SHA; in a thin pack sent over the wire the base may be missing from the pack, and the receiver completes the pack before storing it)
  • explain why long delta chains slow down reads. You have to reconstruct every base starting from the root of the chain
  • name `git verify-pack -v` for debugging and `pack.window`/`pack.depth` as the settings for the packing heuristic
  • mention that `git gc` repacks objects and can recompute delta chains (`--aggressive` recomputes them from scratch)

Pitfalls

  • ✗ Thinking the delta always runs between "time neighbors". The heuristic picks a base by name and size similarity, not by date
  • ✗ Saying "a delta is git diff". Not a diff. It is a binary copy/insert format at the byte level, not the line level
  • ✗ Not knowing about `OFS_DELTA` vs `REF_DELTA`. In an interview for a Platform role they will ask for the difference

Follow-up

  • ? What does `git verify-pack -v .git/objects/pack/*.idx | head` print?
  • ? Why is the default `pack.depth=50` bad for very large binary blobs?
  • ? When does Git create a reachability bitmap, and what is it for?

Deeper in the knowledge base

  • Packfile
  • Snapshots vs. deltas
  • git cat-file
tags: internals, packfile, performance
book: how.git.work.bytebytego.pdf