Most version control systems (like SVN or CVS) store data as a list of file-level changes. Think of a spreadsheet where each row records “file X changed by +5 lines, -2 lines.” To reconstruct version N, the system starts at version 0 and applies all N diffs in sequence.
Git does the opposite. Every time you commit, Git takes a snapshot of your entire project — a complete copy of every file at that moment. If a file did not change between commits, Git does not store a second copy; it links back to the existing one. The resulting structure is a DAG (directed acyclic graph) of snapshots, where each snapshot points to its parent(s).
This design choice makes Git fast, safe, and extremely hard to corrupt. You never “apply diffs” to reconstruct history — you just grab the snapshot you want and walk backward through the graph.
A public library has a card catalog. Each card lists a book’s title, author, and shelf location. To find a book, you look up the card, not the shelf. Git works the same way.
Every piece of data Git stores gets a hash (a 40-character SHA-1 checksum) derived from its contents. That hash is both the identifier and the lookup key. You ask Git “give me the object with hash a1b2c3...” and Git retrieves it from its internal storage. If the data changes even by one byte, the hash changes entirely. The content is the address.
This property — content-addressable storage — is the foundation of everything else. It means Git can verify data integrity trivially (recompute the hash and compare), deduplicate automatically (same content = same hash = stored once), and never lose data silently.
Every Git repository has a .git/ directory at its root. This is the repository itself — your working tree is just a checkout. Everything Git knows lives inside .git/.
.git/
objects/ # All your data (blobs, trees, commits, tags)
refs/ # Pointers to commits (branches, tags)
HEAD # The current branch or commit
index # The staging area (binary file)
config # Repository-specific settings
description # Used by GitWeb
hooks/ # Scripts triggered on events (commit, push, etc.)
info/ # Additional metadata (like exclude rules)
logs/ # The reflog — every HEAD movement
The four most important entries are objects/, refs/, HEAD, and index. Everything else is auxiliary.
- objects/ stores every version of every file, every directory tree, every commit, and every tag. It is a flat key-value store where the key is the SHA-1 hash and the value is the compressed, typed object.
- refs/ holds named pointers. refs/heads/main points to the latest commit on the main branch. refs/tags/v1.0 points to a specific commit (or an annotated tag object).
- HEAD is a plain text file that says either ref: refs/heads/main (you are on a branch) or a raw commit hash (you are in detached HEAD state).
- index is a binary file (.git/index) that tracks what will go into the next commit: the staging area.

Inspect any repository right now: ls -la .git. You will see the same structure regardless of whether the project is one file or a million lines of code.
A blob (binary large object) stores the contents of a single file. Not the filename, not the permissions, not the directory — just the bytes. Two files with identical content produce the same blob hash, regardless of where they live in the project.
Create a blob manually:
echo "hello world" | git hash-object --stdin
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
The hash 3b18e51 is the SHA-1 of "blob 12\0hello world\n" — Git prepends the object type and size, separated by a null byte. This header ensures that the same content in different contexts (say, a blob vs. a tree) never collides.
Store the object permanently:
echo "hello world" | git hash-object -w --stdin
# Now lives in .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
Inside .git/objects/, each loose object is stored as a file named by its hash: the first two characters become a directory, the remaining 38 become the filename. This keeps any single directory manageable, since there are at most 256 subdirectories (00 through ff).
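The layout described above can be checked in a scratch repository. This is a minimal sketch, assuming a POSIX shell, a repository using the default SHA-1 format, and an object that is still loose (not yet packed); the temporary directory is a placeholder.

```shell
# Throwaway repository in a temporary directory (the path is arbitrary).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Write "hello world\n" as a blob and capture its object ID.
oid=$(printf 'hello world\n' | git hash-object -w --stdin)

# Split the ID the way Git does: 2-character directory, 38-character file name.
dir=$(printf '%s' "$oid" | cut -c1-2)
file=$(printf '%s' "$oid" | cut -c3-)
ls ".git/objects/$dir/$file"

# The file on disk is compressed; cat-file decodes it for us.
git cat-file -t "$oid"   # prints the object type: blob
git cat-file -s "$oid"   # prints the content size in bytes: 12
```

Reading the file directly with cat shows only compressed bytes, which is why git cat-file is the usual way to inspect an object.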
Read a blob back:
git cat-file -p 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
# hello world
Blobs are the atomic unit of storage. Every file in your project, every version of it, is a blob. Two identical files across different commits share the same blob — Git never duplicates.
A tree represents a directory. It is a sorted list of entries, each containing:
- A mode (100644 for a regular file, 100755 for executable, 040000 for a subdirectory)
- An object type (blob or tree)
- The hash of the object
- The entry's name (a filename or subdirectory name)

A tree maps names to hashes. It is Git’s way of saying “the file src/main.js at this commit has content hash abc123.”
View a tree:
git cat-file -p HEAD^{tree}
# 100644 blob 3b18e51 hello.txt
# 040000 tree a1b2c3d src
Recurse into subtrees:
git ls-tree -r HEAD
# Lists every file with its full path, type, and blob hash
Trees are also content-addressed. If two directories have the exact same contents (same filenames, same file contents, same permissions), they produce the same tree hash. This is how Git deduplicates whole directory structures — an unchanged subtree is stored exactly once across any number of commits.
Trees can nest arbitrarily. A tree points to blobs (files) and other trees (subdirectories). The root tree of a commit represents the entire project root.
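The claim that identical directories share one tree object can be tried directly. The sketch below assumes a POSIX shell; the directory names a and b and the file name are placeholders chosen for illustration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Two directories with byte-identical contents.
mkdir a b
printf 'same\n' > a/file.txt
printf 'same\n' > b/file.txt
git add a b

# Snapshot the index as a tree object, without creating a commit.
root=$(git write-tree)
git cat-file -p "$root"            # two "040000 tree" entries, one per directory

# Resolve each subdirectory to the tree object it points at.
tree_a=$(git rev-parse "$root:a")
tree_b=$(git rev-parse "$root:b")
echo "$tree_a"
echo "$tree_b"
```

Both lines print the same hash, so the subtree is stored once even though it appears under two names.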
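A commit is a snapshot of the project at a point in time. It contains:
- The hash of the root tree
- Zero or more parent commit hashes
- The author (name, email, timestamp)
- The committer (name, email, timestamp)
- The commit message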
A commit with no parents is an initial commit (the first in the repository). A commit with one parent is a normal commit. A commit with two or more parents is a merge commit.
git cat-file -p HEAD
# tree 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
# parent 9a8b7c6d5e4f3a2b1c0d9e8f7a6b5c4d3e2f1a0b
# author Alice <alice@example.com> 1747267200 +0000
# committer Alice <alice@example.com> 1747267200 +0000
#
# Add user authentication module
The commit object itself is a small text file. It does not store diffs — it stores the root tree hash. To compute what changed between two commits, Git compares their trees recursively.
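To see that a commit is only a small wrapper around a tree hash, one can be assembled by hand with plumbing commands. This is a sketch under assumptions: the identity values and file name are placeholders, and git commit normally performs these steps for you.

```shell
# Placeholder identity so the commit can be created without global config.
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'hello world\n' > hello.txt
git add hello.txt

# Freeze the index into a tree, then wrap that tree in a commit object.
tree=$(git write-tree)
commit=$(printf 'Add hello.txt\n' | git commit-tree "$tree")   # no -p: a root commit

# The commit body names the tree, the author, the committer, and the message.
git cat-file -p "$commit"
```

No branch points at this commit yet; git update-ref or git branch would be needed to make it reachable.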
Commits form a directed acyclic graph because each commit points backward to its parent(s), and following parent pointers can never create a cycle (a child commit cannot exist before its parent).
A <-- B <-- C <-- D (main)
Merge commits introduce branching:
          E <-- F <-- G   (feature)
        /              \
A <-- B <-- C <-- D <-- H   (main)
Here, commit H (on main) has two parents: D and G. It merges the feature branch into main.
Walk the graph with git log --graph --oneline to see the DAG in any repository.
The three object types — blob, tree, commit — form a hierarchy:
Commit
  |
  +-- Tree (root)
        |
        +-- Blob (README.md)
        +-- Tree (src/)
        |     |
        |     +-- Blob (src/main.js)
        |     +-- Blob (src/utils.js)
        |     +-- Blob (src/index.js)
        +-- Tree (tests/)
              |
              +-- Blob (tests/test_main.js)
There is a fourth object type, the tag: an annotated tag object stores a tagger, a message, an optional GPG signature, and a pointer to another object (usually a commit). The core trio of blob, tree, and commit handles all versioned data.
Every object is identified by its SHA-1 hash. The hash is deterministic: given the same type header, content, and length, you always get the same 40-character hexadecimal string. This is what makes content-addressability possible — the identifier is a cryptographic checksum of the thing itself.
# Manual object computation (blob "hello world\n")
printf 'blob 12\0hello world\n' | sha1sum
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
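Git tracks three distinct tree-like structures at all times:
- HEAD: the snapshot of the last commit on the current branch
- The index: the proposed snapshot for the next commit (the staging area)
- The working tree: the files as they currently exist on disk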
The term “three trees” is slightly misleading because the index and working tree are not literally Git tree objects, but the conceptual model is powerful.
   HEAD          Index       Working Tree
(committed)     (staged)      (on disk)
     |             |              |
     v             v              v
  tree_a        tree_b         tree_c
Commands move data between the three:
- git add copies working tree into the index
- git commit freezes the index into a new commit (updating HEAD)
- git restore copies HEAD or index back into the working tree
- git reset moves HEAD and optionally updates the index and working tree

Think of the three trees as layers of a pipeline. You edit files in the working tree (the clay), selectively stage changes into the index (the mold), and commit the index to create a permanent snapshot (the fired ceramic).
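The movement between the three trees shows up in the two status columns of git status --short: the first column compares the index with HEAD, the second compares the working tree with the index. A minimal sketch, with a placeholder identity and file name:

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'first draft\n' > notes.txt
git add notes.txt
git commit -q -m 'Add notes'

# Edit the file: only the working tree changes.
printf 'second draft, with more text\n' > notes.txt
unstaged=$(git status --short)

# Stage it: the index now matches the working tree but not HEAD.
git add notes.txt
staged=$(git status --short)

# Commit it: HEAD, index, and working tree agree again.
git commit -q -m 'Revise notes'
clean=$(git status --short)

printf '[%s]\n' "$unstaged" "$staged" "$clean"
# [ M notes.txt]
# [M  notes.txt]
# []
```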
A branch in Git is simply a movable pointer to a commit. That is it. No fanfare. When you git commit, the current branch pointer automatically moves forward to the new commit.
Branches are stored as files in .git/refs/heads/:
cat .git/refs/heads/main
# a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
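That is a 40-character hash. A branch is literally a text file containing a commit hash (after git gc or git pack-refs, the same entry may instead live as a line in .git/packed-refs).
Creating a branch creates a new file pointing to the commit that HEAD currently points to: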
git branch feature
# Creates .git/refs/heads/feature with the same hash as HEAD
Moving between branches updates your working tree to match the commit the branch points to:
git switch feature
# HEAD now says "ref: refs/heads/feature"
# Working tree matches the commit at feature
HEAD normally refers to the tip of the current branch. When you are on main, HEAD points to refs/heads/main, which points to a commit. When you git checkout a specific hash (detached HEAD), HEAD points directly to a commit instead.
Because branches are just pointers, they are cheap. Creating a branch is instantaneous — it writes 41 bytes to a file. There is no copy of files, no overhead. This is why Git encourages branching early and often.
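The 41-byte figure can be checked in a fresh repository. This sketch assumes SHA-1 object IDs and the default files-based ref storage with the ref still loose; the identity and file name are placeholders.

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'x\n' > file.txt
git add file.txt
git commit -q -m 'Initial commit'

# Creating a branch only writes one small file under refs/heads/.
git branch feature
cat .git/refs/heads/feature

# 40 hex digits plus a trailing newline.
size=$(( $(wc -c < .git/refs/heads/feature) ))
echo "$size"   # 41
```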
The staging area (also called the index) is Git’s most misunderstood feature. Why have three states (modified, staged, committed) instead of just two (modified, committed)?
The staging area lets you build a commit incrementally. You can edit three files, stage two of them, and commit only those. The third file stays modified but unstaged — it will not be included in the commit.
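What git add Does

When you run git add file.txt, Git:
1. Computes the hash of the file's contents
2. Writes the contents as a blob to .git/objects/
3. Updates .git/index with the new blob hash and file metadata

The index is a binary file that maintains a sorted list of paths, each with:
- The file mode
- The blob hash
- A stage number (0 normally, 1 to 3 during a merge conflict)
- Cached file metadata (timestamps and size) used to detect changes quickly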
git status compares three snapshots: HEAD vs index (what is staged), and index vs working tree (what is not staged). The output is derived from these two comparisons.
# See exactly what is in the index
git ls-files --stage
# 100644 3b18e51 0 hello.txt
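One consequence of the steps above is that the blob already exists in the object database after git add, before any commit is made. A small sketch to confirm it, using a placeholder file name:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'hello world\n' > hello.txt
git add hello.txt

# Each index line is: mode, blob hash, stage number, then a tab and the path.
git ls-files --stage

# Pull the hash out of the index entry and look the object up.
staged=$(git ls-files --stage hello.txt | cut -d' ' -f2)
git cat-file -t "$staged"   # blob
git cat-file -p "$staged"   # hello world
```

The hash in the index is the same one that git hash-object reports for the file, since both are derived from the content alone.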
What git reset Does to the Index

git reset without a path moves the current branch pointer. git reset <commit> <path> updates the index entry for that path to match the given commit, without touching the working tree.
# Unstage a file (update index to match HEAD)
git reset HEAD README.md
This is exactly equivalent to git restore --staged README.md.
The staging area is what makes partial commits possible. It is the assembly line where you arrange changes before committing them into permanent storage.
Merging combines divergent lines of development. A merge ends in one of two ways: a fast-forward or a three-way merge.
If the branch being merged is a direct descendant of the current branch (no divergence), Git simply moves the pointer forward:
Before:  A -- B -- C            (main)
                    \
                     D -- E     (feature)

After:   A -- B -- C -- D -- E  (main, feature)
No merge commit is created. The history remains linear.
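If the branches have diverged, Git performs a three-way merge using three snapshots:
- The merge base (the most recent common ancestor of the two branches)
- Ours (the tip of the current branch)
- Theirs (the tip of the branch being merged)

Git computes two diffs: base-to-ours and base-to-theirs. For each file:
- If only one side changed it, Git takes that side's version
- If both sides made the same change, Git takes it once
- If both sides changed different regions, Git combines the changes
- If both sides changed the same region differently, Git reports a conflict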
A merge conflict occurs when both branches modified the same region of the same file in incompatible ways. Git inserts conflict markers into the file and leaves resolution to you:
<<<<<<< HEAD
console.log("hello")
=======
console.log("goodbye")
>>>>>>> feature
Resolve the conflict by editing the file, removing the markers, then git add and git commit to finalize the merge.
A merge commit is a commit with two or more parents. It stores the merged result as its tree and records both parent lineages. This preserves the full history — you can always see exactly which commits were merged.
git log --oneline --graph --parents
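The following sketch builds a small diverged history and merges it, to show the merge base and the two parents of the resulting commit. The branch, file, and identity names are placeholders, and the branches are set up so that no conflict occurs.

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git symbolic-ref HEAD refs/heads/main   # name the initial branch explicitly

printf 'base\n' > base.txt
git add base.txt && git commit -q -m 'A: base'
git branch feature

printf 'main\n' > main.txt
git add main.txt && git commit -q -m 'B: work on main'

git checkout -q feature
printf 'feature\n' > feature.txt
git add feature.txt && git commit -q -m 'C: work on feature'

git checkout -q main
base=$(git merge-base main feature)     # the common ancestor, commit A
git merge -q --no-edit feature          # diverged history: a real merge commit

# The merge commit ID followed by the IDs of its two parents.
git rev-list --parents -n 1 HEAD
git log --oneline --graph
```

After the merge, feature still points at commit C; only main moved.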
Rebasing rewrites history by replaying commits on top of a new base. Instead of creating a merge commit, it transplants commits one by one:
Before:
A -- B -- C        (main)
      \
       D -- E      (feature)
git rebase main feature:
After:
A -- B -- C -- D' -- E' (feature)
Each commit D and E is reapplied as a new commit (D’ and E’) with a different parent (C instead of B). Because the parent pointer changes, each new commit gets a new hash — even if the file contents are identical.
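A short experiment makes the hash change visible. This is a sketch with placeholder names: one commit on feature is replayed onto a main branch that has moved on.

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git symbolic-ref HEAD refs/heads/main

printf 'a\n' > a.txt
git add a.txt && git commit -q -m 'A'

git checkout -q -b feature
printf 'd\n' > d.txt
git add d.txt && git commit -q -m 'D'

git checkout -q main
printf 'c\n' > c.txt
git add c.txt && git commit -q -m 'C'

old=$(git rev-parse feature)
git rebase -q main feature          # replay D on top of C
new=$(git rev-parse feature)

echo "before: $old"
echo "after:  $new"

# The file introduced by D is byte-identical in both versions of the commit.
git diff --quiet "$old" "$new" -- d.txt && echo "d.txt is unchanged"
```

The old commit still exists in the object database until it is garbage collected, which is why a rebase can be undone through the reflog.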
git rebase -i opens an editor showing a list of commits and actions:
git rebase -i HEAD~3
pick a1b2c3 First commit
pick d4e5f6 Second commit
pick 0a1b2c Third commit
You can change pick to:
- reword: change the commit message
- edit: stop to amend the commit
- squash: combine with the previous commit
- fixup: combine but discard the message
- drop: remove the commit entirely

| Aspect | Merge | Rebase |
|---|---|---|
| History | Preserves exact branching structure | Creates linear history |
| Commit hashes | Original hashes preserved | Hashes change |
| Conflict resolution | One-time at merge point | Per commit (can be tedious) |
| Safety | Safe for shared branches | Dangerous on shared branches |
| Readability | Shows when branches diverged | Clean, linear timeline |
Golden rule: never rebase commits that have been pushed to a shared branch. Rebase rewrites history — anyone who pulled the old commits will have a divergent history to reconcile.
Cherry-picking applies a specific commit (or range of commits) from one branch onto the current branch. It creates a new commit with the same changes and author information but a different parent, committer timestamp, and hash.
git checkout main
git cherry-pick a1b2c3d
This takes the changes introduced in commit a1b2c3d on feature and applies them as a new commit on main.
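Cherry-pick is useful when you need one specific change from another branch without merging the whole branch, for example to backport a bug fix to a release branch.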
Internally, cherry-pick performs a three-way merge in which the parent of the picked commit is the base, the current HEAD is "ours", and the picked commit is "theirs". The effect is that the diff between the picked commit and its parent is applied on top of the current branch.
If conflicts occur, resolve them the same way as a merge, then git cherry-pick --continue.
git cherry-pick --abort # Cancel the cherry-pick
git cherry-pick --skip # Skip this commit
git cherry-pick --continue # Resume after resolving conflicts
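As a sketch of the conflict-free case, the script below picks one commit from a feature branch onto a main branch that has diverged. Branch, file, and identity names are placeholders.

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git symbolic-ref HEAD refs/heads/main

printf 'base\n' > base.txt
git add base.txt && git commit -q -m 'Base'

git checkout -q -b feature
printf 'fix\n' > fix.txt
git add fix.txt && git commit -q -m 'Fix bug'
picked=$(git rev-parse HEAD)

git checkout -q main
printf 'other\n' > other.txt
git add other.txt && git commit -q -m 'Unrelated work on main'
before=$(git rev-parse HEAD)

# Apply only the "Fix bug" change on top of main.
git cherry-pick "$picked" > /dev/null

git log --oneline -n 2
echo "picked:  $picked"
echo "applied: $(git rev-parse HEAD)"
```

The new commit carries the same message and change as the original, but its parent is the previous tip of main, so its hash differs.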
git reset moves the current branch pointer and optionally updates the index and working tree. The --soft, --mixed, and --hard flags control how far the reset propagates.
 HEAD~1
   |
   v
Commit A <-- Commit B (HEAD)
git reset --soft HEAD~1:
HEAD moves to A
Index = B's state
Working tree = B's state
Changes from B are "staged and ready to commit"
git reset --mixed HEAD~1 (default):
HEAD moves to A
Index = A's state
Working tree = B's state
Changes from B are "unstaged but present"
git reset --hard HEAD~1:
HEAD moves to A
Index = A's state
Working tree = A's state
Changes from B are gone (but recoverable from reflog)
| Flag | HEAD | Index | Working Tree | Use Case |
|---|---|---|---|---|
| --soft | Move | Keep | Keep | Undo commit, keep staged |
| --mixed | Move | Reset | Keep | Unstage files (default) |
| --hard | Move | Reset | Reset | Discard all uncommitted changes |
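Warning: git reset --hard discards uncommitted changes in the working tree and index. The reflog only records commits, so changes that were never committed cannot be recovered from it. Content that was at least staged with git add may still be found as dangling blobs via git fsck --lost-found.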
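The three modes can be compared side by side by undoing the same commit three times and reading git status --short after each reset. A minimal sketch with a placeholder identity and file name:

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'version one\n' > app.txt
git add app.txt && git commit -q -m 'A'
printf 'version two, with a change\n' > app.txt
git commit -q -am 'B'

git reset -q --soft HEAD~1
soft=$(git status --short)       # the change from B is still staged
git commit -q -m 'B again'

git reset -q HEAD~1              # --mixed is the default
mixed=$(git status --short)      # the change from B is present but unstaged
git commit -q -am 'B once more'

git reset -q --hard HEAD~1
hard=$(git status --short)       # nothing left to report

printf '[%s]\n' "$soft" "$mixed" "$hard"
# [M  app.txt]
# [ M app.txt]
# []
cat app.txt                      # version one
```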
git reset with a file path only affects the index for that file, not HEAD:
git reset HEAD~1 README.md
# Sets the index entry for README.md to its version in HEAD~1; the working tree file is not changed
The reference log (reflog) records every movement of HEAD — every commit, checkout, merge, rebase, reset, and cherry-pick. It is Git’s safety net.
View the reflog:
git reflog
# a1b2c3d HEAD@{0}: commit: Add login feature
# e5f6a7b HEAD@{1}: commit: Fix navbar styling
# c9d0e1f HEAD@{2}: reset: moving to HEAD~1
# 2a3b4c5 HEAD@{3}: checkout: moving from main to feature
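The reflog for HEAD is stored in .git/logs/HEAD (each branch has its own under .git/logs/refs/heads/). Each entry contains:
- The hash HEAD pointed to before the change
- The hash HEAD points to after the change
- The name and email of the person who made the change
- A timestamp with timezone
- A message describing the operation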
Imagine you ran git reset --hard HEAD~2 by accident and lost the last two commits. The commits still exist — they are just unreachable from any branch. The reflog still records them:
git reflog
# a1b2c3d HEAD@{0}: reset: moving to HEAD~2
# d4e5f6a HEAD@{1}: commit: Add important feature
# b7c8d9e HEAD@{2}: commit: Another important change
Restore them by creating a branch at the reflog entry:
git branch recovered HEAD@{1}
# Now the commits are reachable via the 'recovered' branch
Or merge the reflog entry:
git merge HEAD@{1}
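The reflog is local: it is never pushed or fetched. Each developer has their own reflog tracking their own operations. By default, entries expire after 90 days, or after 30 days when they point to commits that are no longer reachable from the current tip (the gc.reflogExpire and gc.reflogExpireUnreachable settings).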
The reflog only tracks references (branches, HEAD). Blobs and trees that become unreachable are cleaned up by git gc after a grace period.
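The recovery procedure can be rehearsed safely in a scratch repository. This sketch assumes the reflog is enabled (the default for non-bare repositories); the branch name recovered, the file name, and the identity are placeholders.

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

for n in 1 2 3; do
  printf 'change %s\n' "$n" >> log.txt
  git add log.txt
  git commit -q -m "commit $n"
done
lost=$(git rev-parse HEAD)

# "Accidentally" throw away the last two commits.
git reset -q --hard HEAD~2
git reflog -n 2

# HEAD@{1} is where HEAD was just before the reset.
git branch recovered 'HEAD@{1}'
git log --oneline recovered
```

The current branch still has one commit, while recovered holds all three again.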
Understanding Git’s internals transforms how you use it. Here are practical applications.
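If you deleted a branch with git branch -D, its commits are no longer reachable from any ref and the branch's own reflog is removed, but the HEAD reflog still records the commits you visited while that branch was checked out: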
git branch -D feature
# Oh no.
git reflog | grep feature
# Find the last commit on that branch
git branch feature <hash>
If the reflog expired, try git fsck --lost-found — it scans all objects and extracts dangling commits.
git bisect uses binary search to find the commit that introduced a bug. It walks the commit graph, checking out the midpoint between good and bad commits:
git bisect start
git bisect bad HEAD # Current commit is broken
git bisect good v1.0 # v1.0 was working
# Git checks out the midpoint (for example ~500 commits back if ~1,000 lie between good and bad)
# Test, then mark:
git bisect good # This one is fine
# Git narrows to ~250 commits
git bisect bad # This one is broken
# Repeat until the exact commit is found
git bisect reset
Internally, bisect counts commits along the DAG to pick the midpoint that minimizes the number of steps.
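The manual good/bad loop can also be automated with git bisect run, which treats exit status 0 as good and 1 as bad. The sketch below plants a marker line in commit 5 of 8 and lets bisect find it; the file name, the marker word bug, and the identity are all placeholders.

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Eight commits; the fifth one introduces the "bug" line.
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -eq 5 ]; then
    echo 'bug' >> app.txt
  else
    echo "line $i" >> app.txt
  fi
  git add app.txt
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
  if [ "$i" -eq 5 ]; then culprit=$(git rev-parse HEAD); fi
done

# Bad = current HEAD, good = the first commit.
git bisect start HEAD "$good" > /dev/null
# The test command exits 0 (good) when the marker is absent, 1 (bad) when present.
git bisect run sh -c '! grep -q bug app.txt' > /dev/null

found=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1
git log -1 --format='first bad commit: %s' "$found"
# first bad commit: commit 5
```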
git worktree checks out multiple branches at once in separate directories, all sharing the same .git/objects/:
git worktree add ../hotfix hotfix-branch
# Creates a new directory with the hotfix branch checked out
This is efficient because objects are shared — no redundant storage. Use worktrees when you need to work on a different branch without stashing or committing your current changes.
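A linked worktree can be created and inspected as follows. This is a sketch: the sibling directory and the branch name hotfix are placeholders, and the branch is created on the fly with -b rather than assumed to exist.

```shell
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf 'readme\n' > readme.txt
git add readme.txt && git commit -q -m 'Initial commit'

# Check out a new branch in a sibling directory.
wt="$repo-hotfix"
git worktree add -b hotfix "$wt" > /dev/null 2>&1

git worktree list
# In the linked worktree, .git is a small file pointing back to the main repository.
cat "$wt/.git"
git -C "$wt" rev-parse --abbrev-ref HEAD   # hotfix
```

Remove the extra checkout with git worktree remove when it is no longer needed.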
| Mistake | Recovery |
|---|---|
| Committed to wrong branch | git reset HEAD~1 && git stash && git switch correct && git stash pop, then git add and git commit on the correct branch |
| Accidentally staged a file | git reset README.md or git restore --staged README.md |
| Amended wrong commit | git reflog to find the pre-amend commit, then git reset --soft HEAD@{1} (keeps your files; --hard would also discard uncommitted changes) |
| Need a file from an old commit | git restore --source a1b2c3d -- path/to/file |
| Deleted a branch by accident | git reflog or git fsck --lost-found |
| Pushed a commit with secrets | Revoke the secret first, then rewrite history with git filter-repo (git filter-branch is deprecated) and force-push |
The common thread is the reflog and object database. As long as the object still exists in .git/objects/, you can recover it by finding the right hash and attaching a branch or tag to it.
Test your understanding of Git internals with these questions.
Given this history:
A -- B -- C (main)
\
D -- E (feature)
We are on main and we merge feature:
- Does feature point to the merge commit or stay at E?
- If we delete the feature branch, are commits D and E gone forever?

Answers: feature stays at E, because merging does not move the merged branch. Deleting feature does not remove D and E, since they remain reachable from main through the merge commit's second parent.

For each scenario, name the reset flag used (--soft, --mixed, or --hard):

- --soft: only HEAD moves.
- --mixed: HEAD and index move, working tree preserved.
- --hard: all three move, uncommitted changes discarded.
- git reset --soft HEAD~1: keeps the index as-is so you can re-run git commit with the correct message.

Starting from the merge commit in the diagram above, trace the output of git log --oneline --graph:
Merge commit H (parents: C, E)
C (parent: B)
E (parent: D)
D (parent: B)
B (parent: A)
A (no parent)
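By default, git log walks the graph with a priority queue ordered by committer date, so it shows the most recent commit first and then follows parents. With --graph, topological ordering is used so that no commit appears before its children. The --date-order, --topo-order, and --author-date-order options select the ordering explicitly.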
- [...] (the tree field in the commit object).
- [...] .git/objects/ until their reflog entries expire (90 days by default, 30 days for unreachable commits) and git gc reclaims them.
- [...] (git commit --allow-empty with no changes, or two commits with identical file trees but different metadata/timestamps). The tree hash only depends on directory contents, not on commit metadata.