how/mvcc
MVCC: one row, two versions
An UPDATE in PostgreSQL never overwrites a row. It writes a new version and stamps the old one with the transaction that retired it. For a while the same row exists twice on the page, and two sessions can read two different truths. Here is how, and how VACUUM cleans up afterward.
The first surprising thing about PostgreSQL is that UPDATE does not change a row in place. It writes a new version of the row and leaves the old one on the page, marked as superseded. The same goes for DELETE (no new version, just a mark) and for every row an INSERT creates.
This is MVCC, multi-version concurrency control. Each version carries two transaction stamps in its header (see tuple-header):
xmin- the transaction that created this version;xmax- the transaction that retired it (updated, deleted, or locked it).xmax = 0means the version is still live.
A transaction decides what it may see through a snapshot (see snapshot): a frozen view of which transactions had committed when it started. The same bytes on disk look different to different snapshots, and that is exactly why readers never block writers. Press play to follow one row from a single version all the way to a vacuumed page.
§ steps
Start with a single row in a table
mv, inserted by transaction 786:sqlSELECT xmin, xmax, ctid, * FROM mv;
-- xmin | xmax | ctid | id | note
-- ------+------+-------+----+------
-- 786 | 0 | (0,1) | 1 | a
One version, at
ctid (0,1). Itsxminis 786 (the transaction that created it) and itsxmaxis 0, which means nobody has retired it: the row is live. This is what every reader sees.
recap
What to remember:
UPDATEis delete plus insert under the hood: a new version with a freshxmin, and the old version stamped withxmax = <updating txid>. Both live on the page until cleanup. See why-mvcc for the design reason.- Visibility is computed, not stored. A version is visible if its
xminis committed and inside your snapshot, and itsxmaxis empty or not yet visible to you. The full rules are in xmin-xmax. - Two snapshots can legitimately read two different values of the same row at the same wall-clock moment. Nothing is corrupt; they are looking through different time frames.
- A version no live snapshot can see anymore is a dead tuple. It still takes space on the page until VACUUM removes it and frees its line pointer (see vacuum).
- VACUUM can only remove versions older than the oldest snapshot still running, the transaction horizon (see transaction-horizon). One long transaction holds the horizon back and lets dead tuples pile up across the whole database, which is where table bloat comes from.
- When the update fits on the same page and no index covers the changed column, Postgres uses a HOT update (see hot-updates) to chain the versions without touching the indexes.
The payoff: readers never block writers and writers never block readers. The cost is dead tuples, and collecting them is the whole job of VACUUM.
§ dig into the knowledge base
- why-mvccwhy MVCC - readers and writers that never block
- xmin-xmaxxmin and xmax - the visibility rules
- snapshotsnapshot - the frozen view of committed transactions
- hot-updatesHOT updates - version chains that skip the index
- vacuumVACUUM - the dead-tuple collector
- transaction-horizontransaction horizon - how far back VACUUM can clean