Question 1

How is a B-tree built, and why is the search logarithmic?

Accepted Answer

A B-tree is a balanced tree of pages. The root and internal pages hold
separators: a key plus a link to a child page. The leaves hold keys in
sorted order and links to heap rows (`ctid`). The search goes from the root
down: at each level the separators pick the right branch, and in a few
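steps (the tree height) you reach a leaf. A page typically holds hundreds
of separators, so every level multiplies the number of reachable rows by
that fan-out; the height therefore grows with the logarithm of the row
count, and even on billions of records it is a handful of page accesses.
The leaves are linked left and right, which gives a fast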
range walk and `ORDER BY` without a separate sort.
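
To make the logarithm concrete, here is a rough sketch of the height arithmetic. The fan-out of 300 entries per page is an assumption for illustration (the real figure depends on key width and page size), and `btree_height` is a hypothetical helper, not a PostgreSQL function.

```python
def btree_height(rows: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= rows."""
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout  # one more level multiplies reachable rows by the fan-out
        levels += 1
    return levels


# Assumed fan-out of 300: the height grows by one level per factor of ~300 rows.
for rows in (1_000, 1_000_000, 1_000_000_000):
    print(rows, btree_height(rows, fanout=300))  # heights: 2, 3, 4
```

Going from a thousand rows to a billion adds only two levels, which is the whole point of the logarithmic search.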

Question 2

What makes a condition unusable for an index? Explain sargability.

Accepted Answer

An index on a column stores the values of that column. If the condition
wraps the column in a function or expression, `WHERE lower(email) =
'a@b.c'`, `WHERE created_at::date = '2024-01-01'`, `WHERE id + 0 = 5`, an
index on the raw column does not fit: it has no `lower(email)` values. The
same with `LIKE '%abc'`: the leading percent kills the ability to descend
the tree, because the start of the string is unknown. A condition the index
can use directly is called sargable. There are three cures: rewrite the
predicate so the column is "bare"; build an index on the expression
(`CREATE INDEX ON t (lower(email))`); for patterns and substrings take a
trigram index (`pg_trgm`).
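
A toy model of why the wrapped column misses the index, and why an expression index cures it. This is plain Python with sorted lists standing in for index pages, not PostgreSQL internals; the sample data and the helper names are made up for illustration.

```python
from bisect import bisect_left

emails = ["Zed@b.c", "A@B.C", "bob@x.y", "a@b.c"]

# "Index" on the raw column: (value, row number) pairs sorted by value.
raw_index = sorted((e, i) for i, e in enumerate(emails))
# "Expression index": the same structure built over lower(email).
expr_index = sorted((e.lower(), i) for i, e in enumerate(emails))


def index_lookup(index, key):
    """Binary-search descent: row numbers of all entries equal to key."""
    pos = bisect_left(index, (key, -1))
    rows = []
    while pos < len(index) and index[pos][0] == key:
        rows.append(index[pos][1])
        pos += 1
    return rows


def full_scan_lower(key):
    """With only raw_index available, lower() must be applied to every row."""
    return [i for i, e in enumerate(emails) if e.lower() == key]


print(index_lookup(raw_index, "a@b.c"))   # finds only the exact-case row
print(index_lookup(expr_index, "a@b.c"))  # finds both rows via the lowered keys
print(full_scan_lower("a@b.c"))           # same answer, but touches every row
```

The raw index is sorted by the original strings, so there is no position in it that corresponds to "all values whose lowercase form is `a@b.c`"; the expression index is sorted by exactly that.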

Question 3

How do you choose the column order in a composite index?

Accepted Answer

A composite index `(a, b, c)` is a list sorted by `a`, within equal `a` by
`b`, and so on. So it works for conditions on a left prefix: `a`, `a` and
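`b`, `a` and `b` and `c`. On `b` alone, or on the pair `b, c` without `a`,
it is close to useless: the matching entries are scattered, so the tree
cannot be descended and at best the whole index is read. The rule: put
first the columns that are always compared by equality, and last the one
with a range or a sort. A condition like `WHERE a = ? AND b > ?` fits the
index `(a, b)` well: one descent to `a`, then a contiguous run over `b`.
With `(b, a)` the range on `b` comes first and `a` can only filter within
it. The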
column order is not cosmetics. It directly decides whether the planner uses
the index.
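
The left-prefix rule can be sketched with a sorted list of tuples standing in for the index. This is a toy model under the assumption of integer columns; `scan_a_eq_b_gt` and `positions_b_eq` are hypothetical helpers.

```python
from bisect import bisect_left, bisect_right

# 15 rows: a in 1..3, b in 1..5. Sorting tuples models an index on (a, b).
index_ab = sorted((a, b) for a in range(1, 4) for b in range(1, 6))


def scan_a_eq_b_gt(index, a, b):
    """WHERE a = ? AND b > ?: one contiguous slice of the (a, b) index."""
    lo = bisect_right(index, (a, b))    # first entry after (a, b)
    hi = bisect_left(index, (a + 1,))   # first entry of the next a value
    return index[lo:hi]


def positions_b_eq(index, b):
    """WHERE b = ? alone: no prefix to descend by, so every entry is checked."""
    return [pos for pos, (_, bb) in enumerate(index) if bb == b]


print(scan_a_eq_b_gt(index_ab, 2, 3))  # [(2, 4), (2, 5)]
print(positions_b_eq(index_ab, 3))     # [2, 7, 12]: scattered, not a slice
```

The equality on `a` plus the range on `b` maps to two binary searches and one slice. The rows with `b = 3` sit at positions 2, 7 and 12, one inside each `a` group, which is why the same index gives no shortcut for `b` alone.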

Question 4

What is an index-only scan, and what does the visibility map have to do with it?

Accepted Answer

An ordinary index scan finds a `ctid` in the index and goes to the heap for
the row itself, to check visibility and fetch the remaining columns. If the
index contains every column the query needs (a covering index, including
through `INCLUDE`), the heap trip could be skipped. But the index holds no
information about version visibility. The visibility map (VM) fills the gap:
if the heap page a `ctid` points to is marked all-visible there (every row
version on it is visible to every transaction), the heap row need not be
read. So an index-only scan is efficient only on a
well-vacuumed table with an up-to-date VM. Under an UPDATE load without
timely vacuum, an index-only scan degrades into an ordinary one with many
`Heap Fetches`.
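
A minimal sketch of how the visibility map decides the number of heap fetches. It is a toy model: index entries are `(key, heap page)` pairs, the VM is a set of all-visible page numbers, and every row is assumed to turn out visible once the heap is checked. The function name is hypothetical.

```python
def index_only_scan(index_entries, all_visible_pages):
    """Return (keys, heap_fetches) for a query that needs only indexed columns."""
    keys, heap_fetches = [], 0
    for key, page in index_entries:
        if page not in all_visible_pages:
            heap_fetches += 1  # VM bit not set: visit the heap to check visibility
        keys.append(key)       # toy assumption: the row is visible either way
    return keys, heap_fetches


# 12 rows, 4 per heap page (pages 0, 1, 2).
entries = [(k, k // 4) for k in range(12)]

# Freshly vacuumed table: every page is all-visible.
print(index_only_scan(entries, {0, 1, 2})[1])  # 0 heap fetches

# An UPDATE touched page 1 and cleared its bit; vacuum has not run yet.
print(index_only_scan(entries, {0, 2})[1])     # 4 heap fetches
```

The index entries are identical in both runs; only the state of the VM changes how often the heap is visited.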

Question 5

When do you need a GIN index, and how is it built?

Accepted Answer

GIN (generalized inverted index) is an inverted index: it stores not "row
to value" but "element to the list of rows where it occurs". That is what
you need for composite values: full-text search (word to documents),
`jsonb` (key or path to rows), arrays (element to rows). A query like
`WHERE tags @> '{postgres}'` or `WHERE doc @@ to_tsquery('...')` GIN serves
directly. The price is expensive insert and update: one UPDATE of a
document touches many index elements. A deferred pending list
(`fastupdate`) smooths this out, but it adds periodic cleanup. GIN is large
and slow to write, yet it is indispensable for search by content.
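
The "element to list of rows" idea fits in a few lines. This is a toy inverted index over made-up tag arrays, not GIN's actual on-disk layout; `contains_all` imitates `@>` and assumes a non-empty list of wanted elements.

```python
from collections import defaultdict

# row id -> array of tags (made-up sample data)
docs = {
    1: ["postgres", "index"],
    2: ["postgres", "vacuum"],
    3: ["mysql", "index"],
}


def build_inverted(rows):
    """Map each element to the set of row ids in which it occurs."""
    inverted = defaultdict(set)
    for row_id, elements in rows.items():
        for element in elements:
            inverted[element].add(row_id)  # one index entry per element of the row
    return inverted


def contains_all(inverted, wanted):
    """Rows containing every wanted element (assumes wanted is non-empty)."""
    postings = [inverted.get(element, set()) for element in wanted]
    return set.intersection(*postings)


inverted = build_inverted(docs)
print(contains_all(inverted, ["postgres"]))           # {1, 2}
print(contains_all(inverted, ["postgres", "index"]))  # {1}
```

The loop in `build_inverted` also shows the write cost: inserting or rewriting one row touches as many index entries as the row has elements.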

Question 6

When is BRIN better than a B-tree?

Accepted Answer

BRIN (block range index) stores not the row values but a summary over
block ranges: for each stretch of the table it remembers the minimum and
maximum value. The index comes out tiny, kilobytes where a B-tree would
take gigabytes. It works only with good correlation: if the values grow
physically along with row order (the typical example is a time column in a
table written in increasing order), then by a range you can immediately
drop blocks whose min/max do not fit. On poorly correlated data BRIN is
useless: the matching rows are scattered across all blocks, with nothing to
drop. This is the index for large append-only tables with a natural order.
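
The effect of correlation can be shown with a toy min/max summary. Here a "block range" is simply 100 consecutive rows, which is an assumption for illustration; the function names are hypothetical.

```python
import random


def build_brin(values, rows_per_range):
    """One (min, max) summary per consecutive chunk of rows."""
    chunks = (values[i:i + rows_per_range]
              for i in range(0, len(values), rows_per_range))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def ranges_to_scan(summaries, lo, hi):
    """Count block ranges whose [min, max] overlaps the query range [lo, hi]."""
    return sum(1 for mn, mx in summaries if mx >= lo and mn <= hi)


ordered = list(range(10_000))       # values grow with physical row order
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # same values, no correlation

brin_ordered = build_brin(ordered, rows_per_range=100)
brin_shuffled = build_brin(shuffled, rows_per_range=100)

print(ranges_to_scan(brin_ordered, 500, 699))   # 2 of 100 ranges survive
print(ranges_to_scan(brin_shuffled, 500, 699))  # nearly all 100 must be read
```

Both summaries have the same size, 100 pairs for 10,000 rows. On the ordered data the query range rules out 98 of them; on the shuffled data almost every chunk contains some small and some large value, so its min/max straddles the query and nothing can be dropped.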

Question 7

GiST, SP-GiST, GIN: which is for which task?

Accepted Answer

GiST (generalized search tree) is a framework for trees over "inexact"
predicates: geometry and `PostGIS` (intersection, proximity), range types,
nearest-neighbor search (`ORDER BY point <-> target`). SP-GiST is its
relative for unbalanced structures: quadtrees, prefix trees, data with a
natural partitioning of space. GIN is the inverted index for composite
values: full-text, jsonb, arrays. A rough rule: searching by geometry,
ranges, and nearest neighbors goes to GiST; by the content of a document,
array, or jsonb to GIN; an exotic spatial structure with uneven
partitioning to SP-GiST. Each has its own set of operator classes for
specific types.

Question 8

What is an operator class, and why does an index need it?

Accepted Answer

An index on its own does not know how to compare values of a specific
type. The operator class provides that knowledge. It ties a data type and
an access method to a set of operators and support functions: for a B-tree
that is the five operators `<`, `<=`, `=`, `>=`, `>` and a comparison function. So
a single type can have several classes for different tasks. The canonical
example is `text`: the default class sorts by locale and serves `=` and
`ORDER BY`, but is no good for a prefix `LIKE` in a non-C locale; for that
there is `text_pattern_ops`, which compares byte by byte and makes
`LIKE 'abc%'` indexable. You name the class when creating the index:
`CREATE INDEX ON t (col text_pattern_ops)`.
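
Why the sort order matters for a prefix `LIKE` can be sketched with two orderings of the same strings. The case-insensitive key below is only a stand-in for a locale collation (real collations are more involved), and `prefix_range` is a hypothetical helper.

```python
from bisect import bisect_left

words = ["abc", "Abd", "abe", "b", "B"]


def locale_like(word):
    """Stand-in for a locale collation: ignore case, break ties by raw value."""
    return (word.casefold(), word)


locale_order = sorted(words, key=locale_like)
# Python compares str by code point, which matches UTF-8 byte order.
byte_order = sorted(words)


def prefix_range(sorted_words, prefix):
    """LIKE 'prefix%' as a range scan; valid only for byte-ordered input."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # "ab" -> "ac"
    return sorted_words[bisect_left(sorted_words, prefix):
                        bisect_left(sorted_words, upper)]


print(locale_order)                    # ['abc', 'Abd', 'abe', 'B', 'b']
print(byte_order)                      # ['Abd', 'B', 'abc', 'abe', 'b']
print(prefix_range(byte_order, "ab"))  # ['abc', 'abe']
```

In the locale-like order `Abd` sits between the two matches, so the strings starting with `ab` do not form one contiguous run. In byte order they do, and two binary searches bound the run. That is the property `text_pattern_ops` supplies.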

Question 9

When is a hash index appropriate, and what are its limits?

Accepted Answer

A hash index stores the hash of a value and serves only equality (`=`): no
ranges, no sorting, no prefix search. In return it is more compact than a
B-tree on long keys, and on pure `=` it can be a touch faster. Before
PostgreSQL 10 hash indexes were not written to WAL and did not survive a
crash, so people avoided them; since 10 they are full-fledged and
replicate. In practice their niche is narrow: a B-tree also does equality
perfectly and additionally handles ranges and sorting, so by default you
take a B-tree, and hash only when the key is long, the queries use nothing
but equality, and the index size matters.
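
A toy hash index makes both the strength and the limits visible. The bucket count, the use of `zlib.crc32` as the hash function, and the class itself are assumptions for illustration, not PostgreSQL's implementation.

```python
import zlib
from collections import defaultdict


class ToyHashIndex:
    """Stores (hash code, row id) pairs in buckets; the key itself is not kept."""

    def __init__(self, n_buckets=8):
        self.n_buckets = n_buckets
        self.buckets = defaultdict(list)

    @staticmethod
    def _code(key: str) -> int:
        return zlib.crc32(key.encode("utf-8"))  # fixed-size code, however long the key

    def insert(self, key, row_id):
        code = self._code(key)
        self.buckets[code % self.n_buckets].append((code, row_id))

    def lookup(self, key):
        """Candidate row ids for key = ?; collisions mean the rows must be rechecked."""
        code = self._code(key)
        bucket = self.buckets.get(code % self.n_buckets, [])
        return [row_id for c, row_id in bucket if c == code]


idx = ToyHashIndex()
idx.insert("alice@example.com", 1)
idx.insert("bob@example.com", 2)
idx.insert("alice@example.com", 3)
print(sorted(idx.lookup("alice@example.com")))  # rows 1 and 3
```

The entry size does not depend on the key length, which is where the compactness on long keys comes from. Nothing in the structure is ordered by key, so a range or a prefix condition has no bucket to go to.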

B-tree, GiST, GIN, BRIN, sargability