#why-mvcc
Why do you need MVCC? What does it buy over locking on read?
What to say
MVCC (multiversion concurrency control) keeps several versions of one row at the same time. A reader sees a snapshot of the data as of the start of the query or transaction and does not wait for writers, and a writer does not wait for readers. The core rule: reads do not block writes, writes do not block reads. The price is that old versions pile up as garbage and have to be cleaned (that is vacuum's job). The alternative from older databases, locking a row on read, produces less garbage but turns concurrent load into a queue.
What they want to hear
A candidate should: - state the rule "readers do not block writers and the reverse" and understand that this is the whole point of multiversioning - name the flip side: every old version lives until cleanup, so MVCC and vacuum are one topic - tie a snapshot to a point in time: what a transaction sees depends on the isolation level and on when the snapshot was taken - not confuse MVCC with "no locks": row and relation locks are still there, MVCC only removes the read versus write conflict
Pitfalls
- ✗ Saying "PostgreSQL has no locks thanks to MVCC". Locks exist. MVCC only removes the read versus write conflict
- ✗ Forgetting the price: versions accumulate, and without vacuum the table bloats
- ✗ Thinking a reader always sees the freshest data. It sees its snapshot, not the latest state
Follow-up
- ? Why does a long-running reading transaction hinder garbage cleanup?
- ? What still blocks in PostgreSQL despite MVCC?
- ? Where does table bloat come from under an UPDATE-heavy load?
Depth in knowledge base