Page anatomy: 8 KB, filled from both ends
A table on disk is an array of 8 KB pages, and each page fills from two sides at once: a line-pointer array grows down from the header, tuples grow up from the end, and the free space is whatever gap is left in the middle. Here is how one page fills up.
PostgreSQL never reads or writes a table row by row or byte by byte. Its unit of I/O, caching, and buffer-level locking is the page (also called a block): 8 KB (8192 bytes) by default, a size fixed when the server is built. A table file is just an array of these pages, numbered from zero.
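Because pages are fixed-size and numbered from zero, finding one is plain arithmetic. The sketch below is only an illustration: `page_offset` and `block_of` are hypothetical helper names, the default 8192-byte page size is assumed, and the table is treated as one flat file (large tables are actually split across several segment files, which this ignores).

```python
PAGE_SIZE = 8192  # assumed default block size in bytes


def page_offset(block_number: int) -> int:
    """Byte offset of a block, treating the table as one flat array of pages."""
    if block_number < 0:
        raise ValueError("block numbers start at zero")
    return block_number * PAGE_SIZE


def block_of(byte_offset: int) -> int:
    """Block number that contains the given byte offset."""
    if byte_offset < 0:
        raise ValueError("offsets start at zero")
    return byte_offset // PAGE_SIZE


print(page_offset(3))  # start of the fourth page
print(block_of(8191))  # last byte of the first page
```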
Inside, every page has the same four zones:
- a 24-byte header with the page's LSN, checksum, and three offsets: pd_lower, pd_upper, pd_special;
- an array of line pointers, 4 bytes each, that grows down from the header;
- the free space in the middle;
- the tuples (the row versions themselves), which grow up from the end of the page.
The trick is that the pointer array and the tuples grow toward each other. The page is full the moment they meet. A line pointer is the stable address of a row (see line-pointers), so a tuple can move inside the page without its row id changing.
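The two-ended growth can be modelled in a few lines. This is a toy sketch, not PostgreSQL's code: `ToyPage` is a hypothetical class, it tracks only the two offsets, and it assumes each tuple is padded to an 8-byte boundary (typical on 64-bit builds).

```python
PAGE_SIZE = 8192
HEADER_SIZE = 24
LINE_POINTER_SIZE = 4
ALIGN = 8  # assumption: tuples are padded to 8-byte boundaries


class ToyPage:
    """Toy model of a heap page: only the two moving offsets are tracked."""

    def __init__(self):
        self.pd_lower = HEADER_SIZE  # end of the line-pointer array
        self.pd_upper = PAGE_SIZE    # start of the tuples
        self.line_pointers = []      # (tuple offset, tuple length)

    def free_space(self) -> int:
        return self.pd_upper - self.pd_lower

    def insert(self, tuple_len: int):
        """Place a tuple; return its 1-based line-pointer number, or None if full."""
        padded = -(-tuple_len // ALIGN) * ALIGN  # round up to the alignment
        if padded + LINE_POINTER_SIZE > self.free_space():
            return None
        self.pd_upper -= padded              # tuples grow up from the end
        self.pd_lower += LINE_POINTER_SIZE   # pointers grow down from the header
        self.line_pointers.append((self.pd_upper, tuple_len))
        return len(self.line_pointers)


page = ToyPage()
count = 0
while page.insert(100) is not None:  # 100-byte tuples until the page is full
    count += 1

print(count, page.pd_lower, page.pd_upper, page.free_space())
# prints: 75 324 392 68
```

In this model each 100-byte tuple costs 108 bytes (104 padded bytes plus a 4-byte pointer), so the page holds 75 of them and the remaining 68 bytes are too small for one more.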
§ steps
A freshly extended page is almost entirely free space. At the very top sits the 24-byte header:
pd_lsn last WAL record that touched this page
pd_checksum optional page checksum
pd_lower 24 -> end of the line-pointer array
pd_upper 8192 -> start of the tuples
pd_special 8192 -> the special area (empty for a heap)
On an empty page pd_lower sits right after the header and pd_upper sits at the very end, so the gap between them is the whole page minus the header: 8192 - 24 = 8168 bytes. Everything that follows is just these two numbers moving toward each other.
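The header of an empty page can be decoded from raw bytes as a sketch. Several assumptions go beyond the text above: the byte layout (including the pd_flags, pd_pagesize_version, and pd_prune_xid fields that complete the 24 bytes) follows the PostgreSQL source as I understand it, a little-endian machine is assumed, and the page here is synthetic rather than read from a real table file. `parse_header` is a hypothetical helper.

```python
import struct

# Assumed little-endian layout of the 24-byte page header:
# pd_lsn (2 x uint32), then pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (6 x uint16), then pd_prune_xid (uint32).
HEADER = struct.Struct("<IIHHHHHHI")


def parse_header(page: bytes) -> dict:
    """Decode the offsets discussed in the text from the start of a page."""
    (lsn_hi, lsn_lo, checksum, _flags, lower, upper,
     special, _size_version, _prune_xid) = HEADER.unpack_from(page, 0)
    return {
        "pd_lsn": (lsn_hi << 32) | lsn_lo,
        "pd_checksum": checksum,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        "free_space": upper - lower,
    }


# Build a synthetic empty heap page: pd_lower=24, pd_upper=pd_special=8192.
empty = bytearray(8192)
HEADER.pack_into(empty, 0, 0, 0, 0, 0, 24, 8192, 8192, 8192 | 4, 0)

info = parse_header(bytes(empty))
print(HEADER.size, info["pd_lower"], info["pd_upper"], info["free_space"])
```

On a live database, the pageinspect function page_header() reports the same fields without any manual decoding.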
§ recap
What to remember:
- A page is 8192 bytes, the unit of I/O and the granularity of the buffer cache (see buffer-cache). A table is an array of pages; an index is a different arrangement of the same 8 KB blocks.
- Two cursors define the layout: pd_lower is where the line-pointer array ends, pd_upper is where the tuples begin. Free space is exactly pd_upper - pd_lower, and the free space map tracks it per page (see free-space-map) so an INSERT can find a page with room.
- A row's address is a ctid of (block, line-pointer), not a byte offset. The line pointer holds the tuple's real offset, which can change during page compaction while the ctid stays put. The details are in line-pointers.
- A tuple carries its own header with the MVCC fields (see tuple-header), so one logical row can have several versions on the same page at once.
- The special area at the very end is empty for a heap table; index access methods use it for their own bookkeeping. When a new row no longer fits in the gap between pd_lower and pd_upper, it goes to another page, and a value too large to sit comfortably in a page (by default, once the row exceeds roughly 2 KB) is compressed or pushed out to TOAST storage (see toast).
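The indirection behind a ctid can be sketched by decoding a 4-byte line pointer. The bit layout used here (15 bits of offset, 2 bits of flags, 15 bits of length, packed into a little-endian 32-bit word) is an assumption based on how the structure is usually laid out on little-endian builds; the helper names are hypothetical.

```python
import struct

HEADER_SIZE = 24
LINE_POINTER_SIZE = 4


def encode_line_pointer(lp_off: int, lp_flags: int, lp_len: int) -> bytes:
    """Pack offset (15 bits), flags (2 bits), length (15 bits) into 4 bytes."""
    return struct.pack("<I", lp_off | (lp_flags << 15) | (lp_len << 17))


def decode_line_pointer(raw: bytes) -> dict:
    (word,) = struct.unpack("<I", raw)
    return {
        "lp_off": word & 0x7FFF,          # where the tuple starts in the page
        "lp_flags": (word >> 15) & 0x3,   # pointer state (1 = in normal use)
        "lp_len": (word >> 17) & 0x7FFF,  # tuple length in bytes
    }


def line_pointer_position(lp_index: int) -> int:
    """Byte position in the page of the 1-based line pointer lp_index."""
    return HEADER_SIZE + (lp_index - 1) * LINE_POINTER_SIZE


ctid = (0, 1)  # block 0, line pointer 1

# The same slot before and after a compaction that moved the tuple.
before = decode_line_pointer(encode_line_pointer(8088, 1, 100))
after = decode_line_pointer(encode_line_pointer(7000, 1, 100))

print(ctid, line_pointer_position(ctid[1]), before["lp_off"], after["lp_off"])
```

The ctid and the slot position are the same in both cases; only the offset stored inside the slot differs, which is why a tuple can move within the page without its address changing.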
You can read all of this live with the pageinspect extension: page_header() returns the three offsets, heap_page_items() lists the line pointers and tuples.
§ dig into the knowledge base
- page-layout (page layout) - the four zones of an 8 KB page
- line-pointers (line pointers) - the stable address of a row
- tuple-header (tuple header) - the per-version system fields
- free-space-map (free space map) - which page has room
- toast (TOAST) - where oversized values go
- relfilenode-forks (relfilenode and forks) - the files behind a table