Copyright © 2026 LinuxLab. All rights reserved.

kb/processes ── Processes & resources ── advanced

mmap: files and shared memory

`mmap()` maps a file (or an anonymous region) into a process's virtual address space. Reads and writes through the returned pointer then act on the file's pages directly, with no read() or write() calls. This is the basis of shared memory.

aka: memory-mapping, shared-memory, memory-mapped-files

The idea

The usual way to work with a file: open, then read(fd, buf, n) into your own buffer, then work with that buffer. Each read() is a copy: from the page cache into a user buffer.

With mmap: open, then mmap(fd), and you get a pointer that you read and write like an ordinary array. No read(). The kernel loads pages lazily on a page fault.

c
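int fd = open("data.bin", O_RDONLY);
// len is the number of bytes to map, e.g. st_size from fstat(fd, &st)
const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
if (p == MAP_FAILED) { perror("mmap"); exit(1); }
// p[42] is byte 42 of the file; its page is loaded on the first access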
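
A slightly fuller sketch of the same idea, with the file size taken from fstat() and the error paths filled in. The helper name `mapped_byte_at` is made up for this example; it is not a library function.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the byte at `offset` in the file at `path`, or -1 on any error.
 * Hypothetical helper for illustration only. */
int mapped_byte_at(const char *path, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }

    /* Map the whole file read-only; nothing is read from disk yet. */
    const unsigned char *p =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    int value = p[offset];          /* first touch of this page faults it in */
    munmap((void *)p, (size_t)st.st_size);
    return value;
}
```

Mapping and unmapping per lookup is only for demonstration; a real program would keep the mapping around and index into it many times.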

Kinds of mappings

File-backed:

  • MAP_PRIVATE is a private copy-on-write view. Writes stay in the process and do not reach the file. Used to load binaries and libraries.
  • MAP_SHARED makes changes visible to everyone who mapped the same file, and the kernel writes them back to disk later (msync() forces it). This is shared memory through a file.
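
The difference between the two file-backed kinds can be checked with a small experiment: write one byte through the mapping, then read the file back with pread(). This is a sketch; the helper name is hypothetical and it assumes the file exists and is at least one byte long.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite the first byte of `path` through a mapping created with
 * `visibility` (MAP_SHARED or MAP_PRIVATE), then return the first byte as
 * read back from the file with pread(). Returns -1 on error.
 * Hypothetical helper for illustration only. */
int first_byte_after_mapped_write(const char *path, int visibility,
                                  unsigned char byte)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    unsigned char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, visibility, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = byte;   /* MAP_PRIVATE: lands in a private copy; MAP_SHARED: in the page cache */
    munmap(p, 1);

    unsigned char on_file;
    ssize_t n = pread(fd, &on_file, 1, 0);
    close(fd);
    return n == 1 ? on_file : -1;
}
```

With a file that starts with 'A', a MAP_PRIVATE write of 'B' leaves pread() returning 'A', while a MAP_SHARED write makes it return 'B'.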

Anonymous (MAP_ANONYMOUS, no fd):

  • MAP_PRIVATE | MAP_ANONYMOUS is ordinary private, zero-filled memory. This is how malloc() serves large allocations (in glibc, by default from 128 KiB up).
  • MAP_SHARED | MAP_ANONYMOUS is shared between a process and the children it creates with fork().

Why use it

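  • Databases and search engines. Lucene (MMapDirectory) and LMDB map their data files, and SQLite can do so optionally (PRAGMA mmap_size). The page cache then does the caching for them. PostgreSQL is a counterexample: it reads data files with ordinary read calls into its own shared_buffers.
  • Loading binaries. Every ELF executable and .so is mapped; the mappings are listed in /proc/<pid>/maps.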
  • Large files with random access. Map the file, jump around by offset. The kernel loads only the pages you touch.
  • IPC between processes. MAP_SHARED on one file gives a fast shared region between independent processes, with no copies.
  • Memory-mapped I/O for devices (/dev/mem).

/dev/shm: POSIX shared memory

This is a tmpfs, made for shared mappings:

c
int fd = shm_open("/myseg", O_CREAT | O_RDWR, 0600);
ftruncate(fd, 1024 * 1024);
void *p = mmap(NULL, 1024*1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
// p points to 1 MB; another process calls shm_open("/myseg") and sees the same data

The file actually lives in /dev/shm/myseg, which you can see with ls /dev/shm.
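
A sketch of both sides of such a segment, with error handling. The function names and the SEG_SIZE constant are assumptions made for this example. On glibc older than 2.34, shm_open() needs linking with -lrt.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_SIZE 4096   /* illustrative segment size */

/* Map the named POSIX segment; create and size it first if `create` is set. */
static char *map_segment(const char *name, int create)
{
    int fd = shm_open(name, create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, SEG_SIZE) < 0) {
        close(fd);
        return NULL;
    }
    char *p = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}

/* Producer side: store a NUL-terminated message in the segment. */
int publish_message(const char *name, const char *msg)
{
    char *p = map_segment(name, 1);
    if (!p)
        return -1;
    strncpy(p, msg, SEG_SIZE - 1);
    p[SEG_SIZE - 1] = '\0';
    munmap(p, SEG_SIZE);
    return 0;
}

/* Consumer side: copy the message out of an existing segment. */
int fetch_message(const char *name, char *out, size_t outlen)
{
    if (outlen == 0)
        return -1;
    char *p = map_segment(name, 0);
    if (!p)
        return -1;
    strncpy(out, p, outlen - 1);
    out[outlen - 1] = '\0';
    munmap(p, SEG_SIZE);
    return 0;
}
```

The segment persists until someone calls shm_unlink(name) or the machine reboots, so the consumer can run after the producer has exited. Real IPC also needs synchronization (for example a semaphore or a futex in the segment), which this sketch leaves out.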

By default /dev/shm is limited to 50% of RAM (the tmpfs default). Change the limit with a remount:

bash
sudo mount -o remount,size=8G /dev/shm

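Does Postgres use /dev/shm for shared_buffers? No. Since version 9.3 it allocates its main shared memory, including shared_buffers, as an anonymous shared mapping (MAP_SHARED | MAP_ANONYMOUS) by default and keeps only a small System-V segment. On Linux it does use /dev/shm for dynamic shared memory, for example for parallel query workers. Other typical users of /dev/shm are scientific computing and video processing.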

When mmap hurts

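  • Network filesystems. NFS and CIFS give only weak consistency across the network: a write through a mapping on one client may not be visible on another client until it is flushed.
  • Files larger than the virtual address space (VAS). A 32-bit process has 3-4 GB of VAS at most, so a larger file has to be mapped one window at a time.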
  • Append-heavy workloads. Growing a mapped file over and over is expensive; an ordinary write() is better.
  • Fault storms. Random access to a cold file means thousands of major faults and can stall.
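
To see whether a fault storm is actually happening, a process can compare its own fault counters before and after an access pattern. A minimal sketch using getrusage(); the struct and function names are made up for this example, and the caller supplies the page size (for instance from sysconf(_SC_PAGESIZE)).

```c
#include <stddef.h>
#include <sys/resource.h>

struct fault_counts {
    long minor;   /* served from RAM, no disk I/O */
    long major;   /* needed disk I/O */
};

/* Read one byte per page in [base, base + len) and report how many page
 * faults the process took while doing so. Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration only. */
int faults_while_touching(const unsigned char *base, size_t len,
                          size_t page_size, struct fault_counts *delta)
{
    const volatile unsigned char *p = base;   /* volatile: keep the reads */
    struct rusage before, after;

    if (page_size == 0 || getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    for (size_t off = 0; off < len; off += page_size)
        (void)p[off];
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    delta->minor = after.ru_minflt - before.ru_minflt;
    delta->major = after.ru_majflt - before.ru_majflt;
    return 0;
}
```

Run over a freshly mapped cold file, a large `major` count points at the problem described above; a warm file shows mostly minor faults. The counters are process-wide, so faults taken by other threads during the measurement are included.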

madvise: hints to the kernel

c
madvise(addr, len, MADV_SEQUENTIAL);    // I will read sequentially: large readahead
madvise(addr, len, MADV_RANDOM);         // random access: disable readahead
madvise(addr, len, MADV_DONTNEED);       // pages no longer needed: drop them (private anonymous data is lost and reads back as zeros)
madvise(addr, len, MADV_HUGEPAGE);       // allow transparent huge pages for this range
madvise(addr, len, MADV_DONTFORK);       // do not duplicate in the child on fork

Debugging and observation

bash
cat /proc/<pid>/maps              # all mappings of the process
pmap -x <pid>                      # with sizes and RSS
cat /proc/<pid>/smaps | head -30   # per region block:
# Size:                4 kB    <- virtual size of the mapping
# Rss:                 4 kB    <- actually in RAM
# Pss:                 2 kB    <- proportional (divided across shared)
# Shared_Clean:        4 kB    <- shared, clean
# Private_Dirty:       0 kB
# Swap:                0 kB

PSS (Proportional Set Size) is the better metric for "actually in use": if a 100 MB shared library is fully resident and mapped by 10 processes, each process gets a PSS of 10 MB for it, while each RSS counts the full 100 MB.

How it ties into other parts

  • A shared file mapping maps the page cache pages themselves (page-cache). The same page is visible both through an ordinary read() and through mmap.
  • Anonymous mappings back the heap, and their pages go to swap when RAM runs short (swap).
  • All processes (process-and-pid) that link against libc share libc.so.6 through MAP_PRIVATE mappings: the read-only code pages exist once in RAM.

§ commands

bash
cat /proc/$$/maps | head

All mappings of the shell: binary, libraries, heap, stack

bash
pmap -x <pid> | tail -1

Summary line: total virtual size and total RSS of the process

bash
ls -l /dev/shm

POSIX shared memory segments: name, owner, and size of each

bash
grep -A 5 '\[heap\]' /proc/<pid>/smaps

RSS and PSS of the heap segment of a specific process

bash
ipcs -m

Old System-V shared memory (a separate subsystem from mmap)

§ see also

  • virtual-memory. Virtual memory: virtual addresses, page tables. Each process sees its own 64-bit virtual address space. The MMU translates virtual addresses to physical ones through page tables. This is the basis of isolation and mmap.
  • page-cache. Page cache: disk in memory. Page cache is RAM that holds file contents. Every filesystem read and write goes through it. In free it looks like used memory, but the cache is available.
  • file-descriptors. File descriptors (stdin, stdout, stderr). A file descriptor is an integer a process uses to reach an open file, socket, or pipe. Every process gets 0/1/2 = stdin/stdout/stderr.