#btree-structure
How is a B-tree built, and why is the search logarithmic?
What to say
A B-tree is a balanced tree of pages. The root and internal pages hold separators: a key plus a link to a child page. The leaves hold keys in sorted order and links to heap rows (`ctid`). The search goes from the root down: at each level the separators pick the right branch, and in a few steps (the tree height) you reach a leaf. The height grows with the logarithm of the row count, so even on billions of records it is a handful of page accesses. The leaves are linked left and right, which gives a fast range walk and `ORDER BY` without a separate sort.
What they want to hear
A senior should: - describe the levels: root, internal pages with separators, leaves with keys and ctid - tie the logarithmic complexity to the tree height and the number of page accesses - explain why a B-tree is good for ranges and sorting: the leaves are sorted and linked - mention the optimizations: deduplication of equal keys and suffix truncation of separators save space
Pitfalls
- ✗ Thinking B-tree leaves store the rows themselves. They store the key and a `ctid`, and the row is in the heap
- ✗ Assuming a B-tree is efficient for `LIKE '%x'` or an inequality over an expression. The order does not help there
- ✗ Forgetting the height is small: even on large tables it is a few levels, not dozens
Follow-up
- ? Why does a B-tree give `ORDER BY` without a separate sort?
- ? What is stored in a B-tree leaf?
- ? What is deduplication in a B-tree, and why does it exist?
Depth in knowledge base