git filter-repo is a standalone tool (not part of Git core; install
it separately). It rewrites history: each commit it touches, and
every descendant of that commit, is recreated with a new tree or
metadata and therefore a new SHA. Think of it as surgery.
Use with care.
It replaces git filter-branch, which was officially deprecated in
Git 2.24. filter-branch is slow and easy to misuse. filter-repo
is fast (written in Python, built on git fast-export and git
fast-import) and safer: by default it refuses to run on anything
but a fresh clone.
Installation
# macOS
brew install git-filter-repo
# Debian/Ubuntu
apt install git-filter-repo
# Pip
pip install git-filter-repo
Main uses
Remove a file from all history
The typical scenario is an accidentally committed secret. See secret-scanning: rotate the key first, then clean the history.
git filter-repo --path secrets.env --invert-paths
--invert-paths means "delete the matches." The command removes
secrets.env from every commit in history.
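One way to confirm the rewrite worked is to ask Git whether any commit on any ref still touches the path. The helper below is a minimal sketch using plain git log; the function name path_in_history is hypothetical, not part of Git or filter-repo.

```shell
# Hypothetical helper: succeeds (exit 0) if any commit reachable from
# any ref touches the given path, fails otherwise.
path_in_history() {
    # --all walks every ref; the output is empty when no commit touched the path.
    [ -n "$(git log --all --oneline -- "$1")" ]
}

# Typical use inside the rewritten clone:
#   if path_in_history secrets.env; then echo "still present"; else echo "clean"; fi
```

Run it before and after filtering: it should succeed before and fail afterwards.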
Replace a string
If you need to scrub a specific value (an API key, say) rather than a whole file:
# Create a file with replacement rules
# Quote the delimiter so the shell does not expand $ or backticks in the rules
cat > replace.txt <<'EOF'
AKIAIOSFODNN7EXAMPLE==>[REMOVED]
literal:my-secret-password==>[REMOVED]
regex:ghp_[a-zA-Z0-9]{36}==>[REMOVED]
EOF
git filter-repo --replace-text replace.txt
This rewrites file contents in every commit, substituting each match with the placeholder. Commit messages are not touched; use --replace-message for those.
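To check that a value is really gone from file contents, Git's pickaxe search (git log -S) can look for any commit that added or removed the string. This is a sketch only; string_in_history is a hypothetical helper name, and the search treats its argument as a literal string, not a regex.

```shell
# Hypothetical helper: succeeds if any commit on any ref added or removed
# an occurrence of the given literal string in file contents.
string_in_history() {
    # -S<string> is the pickaxe option; output is empty when nothing matches.
    [ -n "$(git log --all --oneline -S"$1")" ]
}

# Typical use inside the rewritten clone:
#   if string_in_history AKIAIOSFODNN7EXAMPLE; then echo "still present"; else echo "clean"; fi
```

It does not search commit messages, so check those separately if the secret may have been pasted into one.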
Remove a large file
Someone committed a 500 MB dataset and the repo ballooned. Remove it from history:
git filter-repo --strip-blobs-bigger-than 100M
Or by path:
git filter-repo --path dataset.csv --invert-paths
filter-repo normally expires reflogs and repacks the repository when
it finishes. If .git/ is still physically large afterwards, reclaim
space manually:
git reflog expire --expire=now --all
git gc --aggressive --prune=now
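To see whether the cleanup actually reclaimed space, compare the packed size before and after. The sketch below reads the size-pack field from git count-objects -v; the helper name pack_size_kib is hypothetical.

```shell
# Hypothetical helper: print the size of packed objects, in KiB, for the
# repository in the current directory.
pack_size_kib() {
    # `git count-objects -v` prints "size-pack: <KiB>" among other fields.
    git count-objects -v | awk '/^size-pack:/ {print $2}'
}

# Typical use:
#   before=$(pack_size_kib)
#   ...run the filter and the gc commands...
#   after=$(pack_size_kib)
#   echo "packs: ${before} KiB -> ${after} KiB"
```

Loose objects are reported separately (the size field), so run the measurement after gc for a fair comparison.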
Change an email across all history
You committed with a personal email and want it to show your work address:
cat > mailmap.txt <<EOF
Your Name <work@company.com> <personal@gmail.com>
EOF
git filter-repo --mailmap mailmap.txt
All commits from personal@gmail.com will be shown as
work@company.com. Commit SHAs change.
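A quick way to confirm the old address is gone is to list every distinct author and committer email recorded in history. This is a minimal sketch; emails_in_history is a hypothetical helper name.

```shell
# Hypothetical helper: print each distinct author or committer email
# recorded in any commit on any ref, one per line.
emails_in_history() {
    # %ae and %ce are the raw recorded emails (no mailmap applied at display time).
    git log --all --format='%ae%n%ce' | sort -u
}

# Typical use inside the rewritten clone:
#   emails_in_history | grep -F personal@gmail.com || echo "old address is gone"
```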
After filter-repo
The command rewrites history: every commit it touches, and everything descended from it, gets a new SHA. The consequences:
- All clones are now stale. Nobody can just git pull: the histories have diverged. Everyone needs a fresh clone.
- PRs and issues referencing old SHAs break. Those commits no longer exist.
- Force push to the remote. Be careful: branch protection normally blocks this and must be temporarily disabled.
- A backup is required. Before running: git clone <url> backup-clone. If something goes wrong, you have a restore point.
These consequences make filter-repo a "once a year" operation, not a daily one. It is usually a team effort, done with coordination and advance notice.
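The publishing step can be sketched as follows. filter-repo removes the origin remote after rewriting (a guard against pushing by accident), so the remote has to be added back first. The helper name publish_rewrite and its URL argument are hypothetical; branch protection and team coordination are assumed to be handled beforehand.

```shell
# Hypothetical helper: push a rewritten clone to the given remote URL,
# overwriting the branches and tags that exist there.
publish_rewrite() {
    url=$1
    # Re-add origin if it is missing (filter-repo removes it after a rewrite).
    git remote get-url origin >/dev/null 2>&1 || git remote add origin "$url"
    git push --force --all origin    # overwrite every branch
    git push --force --tags origin   # overwrite every tag
}

# Typical use, from inside the rewritten clone:
#   publish_rewrite <url>
```

Branches and tags are pushed in two steps because git push does not accept --all and --tags together.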
Alternatives
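- BFG Repo-Cleaner is a JVM tool (written in Scala) that is also fast on large repos. It is less flexible: it offers a fixed set of operations (deleting files and folders, stripping large blobs, replacing text) and matches files by name rather than by full path.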
- Just rotate and move on. If the secret was in a public repo, it is already compromised. Cleaning the history does not undo the exposure. Sometimes the right call is to rotate the key, learn the lesson, and leave the history alone.
Pitfalls
- filter-repo only works on a fresh clone by default. Run it in a repo with many remotes and it will refuse. You can pass --force, but that is a signal to stop and think.
- Submodules are preserved by filter-repo, but if pointers change you may need separate work inside each submodule. See detached-head.
- Very large repos (tens of GB) can take hours to filter. Run with nohup or inside a screen session.
- Tags are rewritten locally but not in others' clones. On your machine, filter-repo updates refs/tags: a tag pointing to a rewritten commit starts pointing to the new SHA; a tag on a deleted commit disappears. But colleagues who cloned earlier keep their old tags locally. git fetch does not remove them automatically. They need git fetch --prune --prune-tags or a fresh clone. On the server, delete stale tags explicitly: git push origin :refs/tags/<tag>.
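For the long-running case, a small wrapper around nohup keeps the filter alive after the terminal closes and collects its output in a log file. This is a sketch only; run_detached and the log file name are hypothetical.

```shell
# Hypothetical helper: run a command detached from the terminal, with
# stdout and stderr written to a log file. Prints the background PID.
run_detached() {
    log=$1
    shift
    nohup "$@" >"$log" 2>&1 &
    echo $!
}

# Typical use (assumes git-filter-repo is installed and this is a fresh clone):
#   pid=$(run_detached filter-repo.log git filter-repo --strip-blobs-bigger-than 100M)
#   tail -f filter-repo.log
```

A screen or tmux session does the same job interactively; the log file approach is easier to check from a second login.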