Why it exists
Without virtual memory, every process would share one flat memory space, and a single bad pointer could corrupt its neighbors. Virtual addresses give you:
- Isolation: a process cannot reach into another process's memory
- The illusion of large memory: each process sees its own address space (2^48 bytes with 48-bit virtual addresses on x86-64); only what you actually use is mapped
- Lazy allocation: malloc(1GB) does not allocate physical memory until you write to it
- Shared mappings: one physical page can appear in the address spaces of several processes (shared libraries, mmap)
- Swap and page cache: a page can physically live on disk or in the cache
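The lazy-allocation point can be observed directly. The following is a minimal sketch, assuming Linux with /proc mounted; the helper names rss_kb and lazy_allocation_demo are illustrative and not part of any library. It maps anonymous memory and reads the process's RSS before and after touching the pages.

```python
import mmap

PAGE = mmap.PAGESIZE  # typically 4096 bytes


def rss_kb():
    """Return this process's VmRSS in kB, read from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def lazy_allocation_demo(size=32 * 1024 * 1024):
    """Map `size` bytes of anonymous memory, then write one byte to every page.

    Returns RSS in kB at three points: before mapping, after mapping,
    and after the writes.
    """
    before = rss_kb()
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped = rss_kb()
    for offset in range(0, size, PAGE):
        buf[offset] = 1  # the first write to a page makes the kernel back it with RAM
    touched = rss_kb()
    buf.close()
    return before, mapped, touched


if __name__ == "__main__":
    before, mapped, touched = lazy_allocation_demo()
    print(f"RSS before mmap : {before} kB")
    print(f"RSS after mmap  : {mapped} kB")
    print(f"RSS after writes: {touched} kB")
```

On a typical system the second number is close to the first, and the third is larger by roughly the size of the mapping.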
Page and page table
Memory is managed in pages (usually 4 KB, sometimes 2 MB / 1 GB, the "huge pages"). On every access the MMU does this:
- Takes the virtual address from the CPU
- Walks the page table of the process (a structure in physical memory)
- Finds the entry for that page, giving physical address + flags (rw/x, present)
- If the page is missing, a page fault fires and the kernel steps in
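The TLB (Translation Lookaside Buffer) is a cache of recent translations inside the MMU itself. A TLB miss is expensive because it forces a page-table walk in memory. Each TLB entry covers one page, so huge pages let the same number of entries cover far more memory, which helps memory-intensive workloads.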
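The lookup steps above can be sketched as a toy model. This is a deliberate simplification: a real x86-64 MMU walks a multi-level table in hardware, while here the page table and the TLB are plain dictionaries, and every name (PageFault, split, translate) is made up for illustration.

```python
PAGE_SIZE = 4096   # 4 KB pages
OFFSET_BITS = 12   # log2(PAGE_SIZE)


class PageFault(Exception):
    """Raised when no present mapping exists for a virtual page."""


def split(vaddr):
    """Split a virtual address into (virtual page number, offset within the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr, page_table, tlb):
    """Translate a virtual address to a physical one, consulting the TLB first."""
    vpn, offset = split(vaddr)
    if vpn in tlb:                       # TLB hit: no page-table walk needed
        frame = tlb[vpn]
    else:                                # TLB miss: look the page up in the table
        entry = page_table.get(vpn)
        if entry is None or not entry["present"]:
            raise PageFault(hex(vaddr))  # in a real system the kernel takes over here
        frame = entry["frame"]
        tlb[vpn] = frame                 # cache the translation for next time
    return (frame << OFFSET_BITS) | offset


if __name__ == "__main__":
    page_table = {0x55EE87004: {"frame": 0x1A2B3, "present": True}}
    tlb = {}
    print(hex(translate(0x55EE87004ABC, page_table, tlb)))  # 0x1a2b3abc
    try:
        translate(0x7F0000000000, page_table, tlb)
    except PageFault as fault:
        print("page fault at", fault)
```

Only the page number is translated; the low 12 bits (the offset) pass through unchanged.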
Kinds of page fault
- Minor fault: the page is already in RAM (for example, in the page cache) but is not yet mapped into this process. The kernel only has to fix up the page table, which is fast.
- Major fault: the page has to be loaded from disk (swap or an mmap-ed file). This is real disk I/O, hundreds or thousands of times slower.
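- Segfault (SIGSEGV): access to an address that is not in the process map at all, or that violates the mapping's permissions (for example, writing to a read-only page). By default the program is killed, with a core dump if core dumps are enabled.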
ps -o min_flt,maj_flt -p <pid> # minor/major fault counters for the process
vmstat 1 # columns si/so are swap-in/swap-out (si reflects major faults served from swap)
A high maj_flt without swap usually points to mmap-ed files with random access
(databases, large datasets).
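A process can also read its own fault counters. The sketch below assumes a Unix system where Python's resource module is available; the helper names fault_counts and touch_pages are illustrative. The kernel counts any fault resolved without disk I/O as minor, which includes the first write to a fresh anonymous page.

```python
import mmap
import resource


def fault_counts():
    """Return (minor, major) page-fault counters for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def touch_pages(n_pages):
    """Write to n_pages fresh anonymous pages; return the (minor, major) fault deltas."""
    size = n_pages * mmap.PAGESIZE
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    min_before, maj_before = fault_counts()
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1  # first touch of each page triggers a fault
    min_after, maj_after = fault_counts()
    buf.close()
    return min_after - min_before, maj_after - maj_before


if __name__ == "__main__":
    minor, major = touch_pages(2048)
    print(f"minor faults: {minor}, major faults: {major}")
```

With 4 KB pages the minor count is usually close to the number of pages touched; with transparent huge pages enabled it can be much lower, because one fault may populate a whole 2 MB page.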
Process map: /proc/<pid>/maps
cat /proc/self/maps | head
# 55ee87000000-55ee87004000 r--p 00000000 fc:01 1234567 /usr/bin/cat
# 55ee87004000-55ee87008000 r-xp 00004000 fc:01 1234567 /usr/bin/cat
# 55ee87008000-55ee8700a000 r--p 00008000 fc:01 1234567 /usr/bin/cat
# 7f1abcde0000-7f1abce00000 r--p 00000000 fc:01 7654321 /lib/x86_64.../libc.so.6
# ...
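Columns: virtual range / perms (r, w, x, then p = private or s = shared) / offset / device / inode / path.
[heap], [stack], [vdso] are special regions with no backing file. The file lists every
region mapped into the process; an address outside these ranges triggers a segfault.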
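The column layout described above can be parsed with a few lines of code. This is a minimal sketch; Mapping and parse_maps_line are illustrative names, and the sample line is the one shown in the listing above.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int    # first virtual address of the region
    end: int      # one past the last address
    perms: str    # e.g. "r-xp"
    offset: int   # offset into the backing file
    device: str   # major:minor of the backing device
    inode: int    # 0 for regions with no backing file
    path: str     # file path, a [label], or "" for unnamed regions


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    parts = line.split(None, 5)  # the path is optional and may contain spaces
    start, end = (int(x, 16) for x in parts[0].split("-"))
    path = parts[5].strip() if len(parts) > 5 else ""
    return Mapping(start, end, parts[1], int(parts[2], 16), parts[3], int(parts[4]), path)


if __name__ == "__main__":
    sample = "55ee87004000-55ee87008000 r-xp 00004000 fc:01 1234567 /usr/bin/cat"
    m = parse_maps_line(sample)
    print(m.path, m.perms, (m.end - m.start) // 1024, "kB")  # /usr/bin/cat r-xp 16 kB
```

To apply it to a live process, iterate over the lines of open("/proc/self/maps") and call parse_maps_line on each.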
pmap -x <pid> # a convenient view of the same data with sizes
cat /proc/<pid>/status | grep -E '^Vm'
# VmSize - total virtual address space (virtual, not real memory)
# VmRSS - Resident Set Size (actually in RAM right now)
# VmData - heap + anonymous mmap-s
# VmSwap - how much went to swap
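The same fields can be read programmatically. The sketch below assumes the "Name:<tab>value kB" layout of /proc/<pid>/status; parse_vm_fields is an illustrative helper, and the sample values are invented to resemble a JVM-like process.

```python
def parse_vm_fields(status_text):
    """Return {field name: size in kB} for every Vm* line of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            name, value = line.split(":", 1)
            fields[name] = int(value.split()[0])  # the kernel reports these in kB
    return fields


if __name__ == "__main__":
    # Hypothetical sample, not taken from a real process.
    sample = (
        "Name:\tjava\n"
        "VmSize:\t10485760 kB\n"
        "VmRSS:\t  307200 kB\n"
        "VmData:\t  512000 kB\n"
        "VmSwap:\t       0 kB\n"
    )
    vm = parse_vm_fields(sample)
    share = vm["VmRSS"] / vm["VmSize"]
    print(f"resident share of the address space: {share:.1%}")
```

For a live process, pass the contents of /proc/self/status (or /proc/<pid>/status) instead of the sample string.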
RSS vs VSZ: the main misunderstanding
- VSZ (virtual size) is how much the process reserved (lazy memory included)
- RSS is how much is actually held in physical RAM right now
A Java program can show VSZ=10 GB, RSS=300 MB, and that is normal. The JVM allocates heap up front but physically uses little of it.
In containers the OOM killer (oom-killer) acts on memory that is actually resident and charged to the cgroup (RSS plus page cache), not on VSZ.
Memory commit and overcommit
When malloc() returns a pointer to a large block, there is usually no physical memory behind it yet.
Linux can overcommit: hand out more virtual memory than physical+swap.
cat /proc/sys/vm/overcommit_memory
# 0 - heuristic (default): usually grants; sometimes refuses
# 1 - always: grant to everyone; the OOM-killer sorts it out
# 2 - never: hard limit = swap + RAM * overcommit_ratio / 100
cat /proc/sys/vm/overcommit_ratio # usually 50, used when mode=2
Mode 2 (never) is for production databases and critical systems: better to get ENOMEM and handle it than to take an unexpected OOM-kill.
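The mode 2 limit is simple arithmetic. The sketch below works through it with invented numbers; commit_limit_kb is an illustrative helper, and it ignores reserved huge pages, which the kernel subtracts from RAM before applying the ratio.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    """Approximate commit limit in mode 2: swap + RAM * overcommit_ratio / 100."""
    return swap_kb + ram_kb * overcommit_ratio // 100


if __name__ == "__main__":
    ram_kb = 16 * 1024 * 1024   # hypothetical machine: 16 GiB of RAM
    swap_kb = 4 * 1024 * 1024   # and 4 GiB of swap
    limit = commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50)
    print(limit // (1024 * 1024), "GiB")  # 12 GiB
```

With the default ratio of 50 and little swap, the limit can sit well below physical RAM, so the ratio is usually raised when switching to mode 2. The kernel's own figure appears as CommitLimit in /proc/meminfo, next to Committed_AS (how much is currently committed).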
Huge pages
4 KB pages are too small when a process works with gigabytes, because the TLB cannot hold translations for the whole working set. Huge pages (2 MB and 1 GB) reduce TLB pressure:
cat /proc/meminfo | grep Huge # how many huge pages the system has
echo 1024 | sudo tee /proc/sys/vm/nr_hugepages # reserve 1024 pages of 2MB each
Used in databases (Postgres huge_pages=on), the JVM (-XX:+UseLargePages),
DPDK / kernel-bypass networking.
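The TLB-pressure argument can be made concrete with "TLB reach": the amount of memory the TLB can cover at once. The sketch below is pure arithmetic; the entry count of 1536 is an assumed, illustrative figure and varies by CPU.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach_bytes(entries, page_size):
    """Memory covered by a TLB with `entries` slots when every slot maps one page."""
    return entries * page_size


def pages_needed(working_set, page_size):
    """Number of pages (and so translations) needed to cover a working set."""
    return -(-working_set // page_size)  # ceiling division


if __name__ == "__main__":
    entries = 1536  # assumed TLB size, for illustration only
    print(tlb_reach_bytes(entries, 4 * KIB) // MIB, "MiB reach with 4 KB pages")  # 6
    print(tlb_reach_bytes(entries, 2 * MIB) // GIB, "GiB reach with 2 MB pages")  # 3
    print(pages_needed(8 * GIB, 4 * KIB), "translations for 8 GiB in 4 KB pages")  # 2097152
    print(pages_needed(8 * GIB, 2 * MIB), "translations for 8 GiB in 2 MB pages")  # 4096
```

Under this assumption an 8 GiB working set needs about two million translations with 4 KB pages but only 4096 with 2 MB pages, which is why misses drop so sharply.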
Transparent huge pages (THP): the kernel automatically merges 4 KB pages into 2 MB ones. Good for some workloads, harmful for others (latency jitter in databases):
cat /sys/kernel/mm/transparent_hugepage/enabled
# always / madvise / never
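The file lists all three modes and marks the active one with square brackets, for example "always [madvise] never". A small sketch for extracting it follows; active_thp_mode is an illustrative helper.

```python
import re


def active_thp_mode(text):
    """Return the bracketed (active) mode from the THP 'enabled' file contents."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(active_thp_mode("always [madvise] never\n"))  # madvise
    try:
        with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
            print("this system:", active_thp_mode(f.read()))
    except OSError:
        print("THP control file not available on this system")
```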
Debugging
- pmap -x <pid> is a map with sizes
- smem shows RSS / PSS / USS per process (PSS = proportional, splits shared)
- cat /proc/<pid>/smaps gives detailed info for each region
- vmstat 1 shows pages in/out, swap, free
- slabtop shows the kernel slab allocator (kernel structures)
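The per-region figures in /proc/<pid>/smaps can be summed to get process-wide totals, which is roughly what tools that report PSS do. The sketch below assumes the "Key:   value kB" layout of smaps; sum_smaps is an illustrative helper and the sample text is invented.

```python
def sum_smaps(smaps_text, keys=("Rss", "Pss")):
    """Sum the selected per-region fields (in kB) over all regions of an smaps dump."""
    totals = {key: 0 for key in keys}
    for line in smaps_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in totals:
            totals[name] += int(rest.split()[0])
    return totals


if __name__ == "__main__":
    # Hypothetical two-region sample: a shared library page set and a private heap.
    sample = (
        "7f1abcde0000-7f1abce00000 r--p 00000000 fc:01 7654321 /lib/libc.so.6\n"
        "Rss:                  16 kB\n"
        "Pss:                   4 kB\n"
        "55ee88000000-55ee88021000 rw-p 00000000 00:00 0 [heap]\n"
        "Rss:                   8 kB\n"
        "Pss:                   8 kB\n"
    )
    print(sum_smaps(sample))  # {'Rss': 24, 'Pss': 12}
```

In the sample, the library's 16 kB is shared by four processes, so only 4 kB counts toward this process's PSS, while the private heap counts in full.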