Question 1

How does streaming replication work? What is sent between nodes?

Accepted Answer

Streaming (physical) replication ships the WAL stream. On the primary a
walsender process sends WAL records as they are generated; on the replica a
walreceiver process receives them and writes them to disk, and the startup
process replays them, reproducing the same page changes. A replica is a
physical copy of the cluster: the same files, the same LSNs. A standby can
be a hot standby, accepting read-only queries while it keeps applying WAL.
Replay follows the same redo rules as crash recovery, so a replica replays
the WAL up to the position it has received. The gap between the primary's
position and the replica's is the lag.

Question 2

Synchronous and asynchronous replication: what do you pay for each?

Accepted Answer

With asynchronous replication, a commit on the primary is confirmed right
away, without waiting for the replica. Fast, but on a sudden loss of the
primary the last transactions that did not make it to the replica are lost.
With synchronous replication (`synchronous_commit = on` plus
`synchronous_standby_names`), a commit waits until at least one synchronous
replica confirms that it has flushed the WAL to disk. A confirmed commit
then survives the loss of the primary, but the price is a network round-trip
added to every commit, and if the synchronous replica drops out, commits on
the primary stall. The neighboring levels tune what exactly to wait for:
`remote_write` waits only until the replica has written the WAL to its
operating system (weaker than `on`), and `remote_apply` waits until the
replica has applied it (stronger than `on`, so reads on the replica already
see the commit). The choice is an explicit trade-off between data loss and
latency.
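
The levels can be lined up by what a commit waits for. The following is a minimal sketch in Python, a toy model rather than PostgreSQL's implementation: the function `commit_may_return` and the `Standby` record are hypothetical names, LSNs are plain integers for brevity, and a synchronous standby is assumed to be configured.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    """Positions a synchronous standby has reported (hypothetical record)."""
    write_lsn: int   # WAL handed to the standby's operating system
    flush_lsn: int   # WAL flushed to the standby's disk
    replay_lsn: int  # WAL already applied on the standby


def commit_may_return(level: str, commit_lsn: int,
                      local_flush_lsn: int, standby: Standby) -> bool:
    """Toy model: may the primary acknowledge the commit at this level?"""
    if level == "off":
        return True  # does not even wait for the local WAL flush
    if local_flush_lsn < commit_lsn:
        return False  # every other level waits for the local flush
    if level == "local":
        return True  # local flush only, the standby is not awaited
    if level == "remote_write":
        return standby.write_lsn >= commit_lsn
    if level == "on":
        return standby.flush_lsn >= commit_lsn
    if level == "remote_apply":
        return standby.replay_lsn >= commit_lsn
    raise ValueError(f"unknown synchronous_commit level: {level}")


# The standby has written the commit record but not yet flushed or applied it.
standby = Standby(write_lsn=200, flush_lsn=150, replay_lsn=100)
for level in ("off", "local", "remote_write", "on", "remote_apply"):
    print(level, commit_may_return(level, 200, 200, standby))
```

In this state only `on` and `remote_apply` still hold the commit back, which is exactly the extra latency those levels buy their guarantees with.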

Question 3

What is replication lag, how do you measure it, and where does it come from?

Accepted Answer

Lag is how far a replica trails the primary. It is measured two ways: by
volume (the LSN difference between what the primary wrote and what the
replica received and applied) and by time (replay lag, how many seconds
stale the replica's data is). The causes: a slow or saturated network link
cannot carry the WAL fast enough; the replica cannot keep up with replay,
because redo is single-threaded and can be limited by disk I/O or held up by
conflicts with read queries; a burst of writes on the primary. You watch it
through `pg_stat_replication` on the primary (the `sent_lsn`, `write_lsn`,
`flush_lsn` and `replay_lsn` columns) and `pg_last_wal_replay_lsn()` on the
replica. Large lag means reads from the replica return stale data, and a
failover loses the transactions the replica had not yet received.
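
Lag by volume is plain arithmetic on LSNs. An LSN is printed as two hexadecimal numbers separated by a slash, the high and low 32 bits of a byte position in the WAL. A minimal sketch in Python, assuming made-up sample values in place of a real `pg_stat_replication` row; the helper names are hypothetical (inside the database, `pg_wal_lsn_diff()` performs the same subtraction):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica is behind the primary."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)


# Illustrative values only, shaped like one pg_stat_replication row.
current_lsn = "16/B374D848"
row = {
    "sent_lsn": "16/B374D848",
    "write_lsn": "16/B374D848",
    "flush_lsn": "16/B374D800",
    "replay_lsn": "16/B3740000",
}

print(lag_bytes(current_lsn, row["sent_lsn"]))    # 0: everything was sent
print(lag_bytes(current_lsn, row["flush_lsn"]))   # 72 bytes not yet flushed
print(lag_bytes(current_lsn, row["replay_lsn"]))  # 55368 bytes not yet replayed
```

Comparing the four positions shows where the lag sits: a gap at `sent_lsn` points at the network, a gap only at `replay_lsn` points at replay on the replica.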

Question 4

Why do you need hot_standby_feedback, and what conflict does it solve?

Accepted Answer

A hot standby runs long read queries, and they need old row versions.
Meanwhile the primary is vacuuming and may remove versions that a query on
the replica still needs. When WAL replay reaches that removal, a conflict
arises: the replica delays replay for up to `max_standby_streaming_delay`
and then cancels the query (`ERROR: canceling statement due to conflict with
recovery`). `hot_standby_feedback = on` solves this as follows: the replica
reports its xmin horizon to the primary, and the primary does not clean
versions needed by the replica's queries. The price is paid on the primary:
its horizon is now held back by the replica too, so a long query on the
standby stalls cleanup and lets dead rows pile up on the primary.
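
The trade-off can be seen in a toy model of the cleanup horizon. This is a simplified sketch in Python, not PostgreSQL's visibility logic: transaction IDs are plain increasing integers (wraparound is ignored), and the function names are hypothetical.

```python
from typing import List, Optional


def cleanup_horizon(primary_oldest_xmin: int,
                    standby_feedback_xmin: Optional[int]) -> int:
    """Oldest transaction ID whose snapshot the primary must still respect."""
    if standby_feedback_xmin is None:  # hot_standby_feedback = off
        return primary_oldest_xmin
    return min(primary_oldest_xmin, standby_feedback_xmin)


def removable(deleted_by_xid: List[int], horizon: int) -> List[int]:
    """Dead row versions deleted before the horizon may be vacuumed."""
    return [xid for xid in deleted_by_xid if xid < horizon]


dead_versions = [90, 120, 150]  # xids of the transactions that deleted them
primary_xmin = 200              # oldest snapshot on the primary
standby_query_xmin = 100        # a long query still running on the standby

# Feedback off: all three go, including two the standby query still needs.
print(removable(dead_versions, cleanup_horizon(primary_xmin, None)))
# Feedback on: only the first goes, the other two stay on the primary.
print(removable(dead_versions, cleanup_horizon(primary_xmin, standby_query_xmin)))
```

With feedback off the standby query hits a recovery conflict when the removal is replayed; with feedback on, the two retained versions are the bloat the primary pays.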

Question 5

How does logical replication differ from physical, and when do you need it?

Accepted Answer

Physical replication copies the whole cluster at the page level: all or
nothing, the same major version, read-only on the replica. Logical replication
works at the row level through publication/subscription: WAL is decoded into
logical changes (INSERT/UPDATE/DELETE of specific tables) and applied on the
subscriber with ordinary commands. This gives what physical cannot:
replicate selected tables, between different major versions (handy for an
upgrade with minimal downtime), into a database where the subscriber can
have its own tables and accept writes. It requires `wal_level = logical`,
and tables need a way to identify a row for UPDATE and DELETE (REPLICA
IDENTITY, usually the primary key). DDL is not replicated: schema changes
have to be applied on the subscriber separately.
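
Why a row identity is required becomes clear when the apply side is written out. A minimal sketch in Python, assuming a hypothetical change format (the dictionaries below are not the real replication protocol) and a table keyed by an `id` column acting as the replica identity:

```python
def apply_change(table: dict, change: dict) -> None:
    """Apply one decoded change; rows are stored under their identity key."""
    op = change["op"]
    if op == "INSERT":
        row = change["new"]
        table[row["id"]] = row
    elif op == "UPDATE":
        # The old key locates the row. Without a replica identity the
        # subscriber would have nothing to look the row up by.
        table.pop(change["key"])
        row = change["new"]
        table[row["id"]] = row
    elif op == "DELETE":
        del table[change["key"]]
    else:
        raise ValueError(f"unsupported operation: {op}")


subscriber_table = {}
decoded_changes = [
    {"op": "INSERT", "new": {"id": 1, "name": "a"}},
    {"op": "INSERT", "new": {"id": 2, "name": "b"}},
    {"op": "UPDATE", "key": 1, "new": {"id": 1, "name": "c"}},
    {"op": "DELETE", "key": 2},
]
for change in decoded_changes:
    apply_change(subscriber_table, change)

print(subscriber_table)  # {1: {'id': 1, 'name': 'c'}}
```

INSERT carries the whole new row, while UPDATE and DELETE only work because each change names the row it targets.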

Question 6

What is a replication slot, and why is an abandoned slot dangerous?

Accepted Answer

A replication slot is a record on the primary that remembers the WAL
position a specific consumer (a replica or a logical subscription) has read
up to. While the slot exists, the primary must keep WAL up to that position
and (for logical slots) must not clean row versions needed for decoding.
This protects a replica that was disconnected for a while: when it returns
it catches up,
because the needed WAL is preserved. But the flip side is dangerous: if the
consumer is gone for good and the slot was not removed, the primary piles up
WAL until the disk runs out and the server stops. An abandoned slot is the
classic cause of a suddenly full `pg_wal`.
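
A simple check for abandoned slots compares each slot's `restart_lsn` with the current WAL position. A minimal sketch in Python, assuming the rows were fetched from `pg_replication_slots` (columns `slot_name`, `active`, `restart_lsn`) and the current position from `pg_current_wal_lsn()`; the sample values, the threshold, and the function names are illustrative assumptions.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '1/FF000000' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def abandoned_slots(slots, current_lsn: str, threshold_bytes: int):
    """Return (slot_name, retained_bytes) for inactive slots over the threshold."""
    now = lsn_to_int(current_lsn)
    flagged = []
    for slot in slots:
        if slot["restart_lsn"] is None:
            continue  # the slot has not reserved any WAL
        retained = now - lsn_to_int(slot["restart_lsn"])
        if not slot["active"] and retained > threshold_bytes:
            flagged.append((slot["slot_name"], retained))
    return flagged


# Illustrative rows shaped like pg_replication_slots.
slots = [
    {"slot_name": "replica_a", "active": True, "restart_lsn": "1/FF000000"},
    {"slot_name": "old_subscriber", "active": False, "restart_lsn": "0/10000000"},
]
one_gib = 1024 ** 3
print(abandoned_slots(slots, "2/0", one_gib))
# [('old_subscriber', 8321499136)]
```

An inactive slot holding back gigabytes of WAL is the one to investigate and, if its consumer is gone for good, to drop.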

Question 7

What are failover and split-brain? Why is automatic switchover dangerous?

Accepted Answer

Failover is promoting a replica to primary when the former primary has
failed. Technically it is a `promote`: a standby stops being read-only and
starts accepting writes. The danger is split-brain: if the old primary is
actually alive (only the network was down) and you promoted a replica, the
cluster ends up with two primaries, both accepting writes, and the data
diverges irreversibly. So reliable automatic failover requires a quorum
arbiter and a fencing mechanism (reliably shut off the old primary, STONITH),
not just a ping. Tools like Patroni build this on top of distributed
consensus. A naive auto-failover by a ping timeout is a direct path to
split-brain.
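
The difference between a ping timeout and a safe decision can be stated as a single predicate. This is a deliberately simplified sketch in Python, not Patroni's algorithm or any tool's real logic; `may_promote` and its parameters are hypothetical names.

```python
def may_promote(votes_primary_down: int, cluster_size: int,
                old_primary_fenced: bool) -> bool:
    """Promote only with a strict majority and a confirmed fence."""
    has_quorum = votes_primary_down > cluster_size // 2
    return has_quorum and old_primary_fenced


# One node cannot reach the primary: that is a ping timeout, not a quorum.
print(may_promote(1, 3, old_primary_fenced=False))  # False
# A majority agrees, but the old primary might still be accepting writes.
print(may_promote(2, 3, old_primary_fenced=False))  # False
# A majority agrees and the old primary is confirmed shut off.
print(may_promote(2, 3, old_primary_fenced=True))   # True
```

A naive auto-failover is the first case answered with "yes": one observer, no majority, no fence.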

Question 8

Is a replica a backup? What are the pitfalls of working with distributed data?

Accepted Answer

A replica is not a backup. It copies faithfully, mistakes and all: a
`DROP TABLE` or a value corrupted by the application instantly travels to
every replica. You need a separate backup (a base backup plus a WAL archive
for PITR) so you can roll back to a point before the mistake. Other
distributed traps: a read from an asynchronous replica returns stale data
(read-your-writes breaks if you write to the primary and immediately read
from a replica); synchronous replication adds commit latency; a failover
loses the tail in async mode; distributed transactions across nodes require
a two-phase commit and are not free. The main rule: replication is about
availability, backup is about preservation, and these are different jobs.
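
One common way to keep read-your-writes with asynchronous replicas is to remember the WAL position of the session's last write and to read from a replica only once it has replayed past that position. A minimal sketch in Python with hypothetical names and integer LSNs; in practice the positions could come from `pg_current_wal_lsn()` on the primary after the write and `pg_last_wal_replay_lsn()` on the replica.

```python
def choose_node(last_write_lsn: int, replica_replay_lsn: int) -> str:
    """Route a read to the replica only if it has replayed the last write."""
    if replica_replay_lsn >= last_write_lsn:
        return "replica"
    return "primary"


# The session wrote at position 500; the replica has replayed up to 450.
print(choose_node(last_write_lsn=500, replica_replay_lsn=450))  # primary
# Later the replica has caught up past the write.
print(choose_node(last_write_lsn=500, replica_replay_lsn=520))  # replica
```

The cost is that freshly written sessions fall back to the primary, so the replica offloads fewer reads exactly when it lags the most.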
