Why OCI
In 2015 Docker agreed to move the container specs out of its own project so the ecosystem would not depend on a single vendor. The result was the Open Container Initiative (under the Linux Foundation), which maintains three separate specifications:
| Spec | What it describes |
|---|---|
| OCI Image | the on-disk image format: layers + manifest + config |
| OCI Runtime | how the runtime starts a container from rootfs + config.json |
| OCI Distribution | the registry HTTP API for push/pull |
Today "Docker container" is almost a synonym for "OCI container": a Dockerfile builds an OCI image, which is pushed to an OCI registry and started by an OCI runtime. Alternative runtimes ([[runc-and-runsc|runc/runsc]]), registries (Harbor, GHCR, Quay), and build tools (buildah, kaniko) all work with the same formats.
OCI Image: what it is on disk
In the OCI image layout, an image is a directory of content-addressed files on disk, not a single tarball. The structure:
myimage/
├── oci-layout               ← {"imageLayoutVersion": "1.0.0"}
├── index.json               ← root, points to the manifests
└── blobs/
    └── sha256/
        ├── <hash-config>    ← config (JSON)
        ├── <hash-layer1>    ← layer (tar or tar.gz)
        ├── <hash-layer2>
        └── <hash-manifest>  ← manifest (links config + layers)
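The layout above is content-addressed: a blob's file name is the SHA-256 of its bytes. A minimal Python sketch of that idea follows; the helper names (`write_blob`, `blob_path`, `init_layout`) and the sample config bytes are illustrative assumptions, not part of any OCI tooling.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_blob(layout: Path, data: bytes) -> str:
    """Store data under blobs/sha256/<hex> and return its digest string."""
    hex_digest = hashlib.sha256(data).hexdigest()
    blob_dir = layout / "blobs" / "sha256"
    blob_dir.mkdir(parents=True, exist_ok=True)
    (blob_dir / hex_digest).write_bytes(data)
    return f"sha256:{hex_digest}"


def blob_path(layout: Path, digest: str) -> Path:
    """Map a digest such as 'sha256:abc...' to its file inside the layout."""
    algorithm, hex_digest = digest.split(":", 1)
    return layout / "blobs" / algorithm / hex_digest


def init_layout(layout: Path) -> None:
    """Write the oci-layout marker file and an empty index.json."""
    layout.mkdir(parents=True, exist_ok=True)
    (layout / "oci-layout").write_text(json.dumps({"imageLayoutVersion": "1.0.0"}))
    (layout / "index.json").write_text(json.dumps({"schemaVersion": 2, "manifests": []}))


# Hypothetical sample data: a tiny config blob written into a temp directory.
layout_dir = Path(tempfile.mkdtemp()) / "myimage"
init_layout(layout_dir)
config_digest = write_blob(layout_dir, b'{"architecture": "amd64", "os": "linux"}')
print(blob_path(layout_dir, config_digest))
```

Because the name is derived from the content, writing the same blob twice lands on the same file, which is what makes deduplication automatic.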
index.json
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:abc...",
      "size": 1234,
      "platform": { "architecture": "amd64", "os": "linux" }
    },
    {
      "digest": "sha256:def...",
      "platform": { "architecture": "arm64", "os": "linux" }
    }
  ]
}
This is a multi-arch index. One manifest per platform.
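How a client picks "its" manifest out of such an index can be sketched in a few lines of Python. This is a simplified illustration: the digests are made-up placeholders, `select_manifest` is a hypothetical helper, and real clients also compare fields such as `variant`.

```python
import json

# Made-up index with placeholder digests, shaped like the example above.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:aaa", "size": 1234,
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:bbb", "size": 1240,
     "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}
"""


def select_manifest(index: dict, os_name: str, architecture: str) -> dict:
    """Return the first manifest descriptor whose platform matches."""
    for descriptor in index["manifests"]:
        platform = descriptor.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return descriptor
    raise LookupError(f"no manifest for {os_name}/{architecture}")


index = json.loads(INDEX_JSON)
chosen = select_manifest(index, "linux", "arm64")
print(chosen["digest"])  # sha256:bbb
```

The `LookupError` branch corresponds to a pull that fails because no platform in the index matches the host.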
Manifest
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:abc...",
    "size": 7000
  },
  "layers": [
    { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "digest": "sha256:111...", "size": 5000000 },
    { "digest": "sha256:222...", "size": 1000000 },
    { "digest": "sha256:333...", "size": 50000 }
  ]
}
A manifest contains:
- config: a reference to the config blob, which holds the runtime defaults (entrypoint, ENV, USER, WORKDIR)
- layers: an ordered list of layer tarballs
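Each entry in a manifest is a descriptor: media type, digest, and size. A minimal Python sketch of how a client might check a downloaded blob against its descriptor is shown below; `make_descriptor` and `verify_blob` are hypothetical helper names, and the layer bytes are a stand-in rather than a real tarball.

```python
import hashlib


def make_descriptor(media_type: str, data: bytes) -> dict:
    """Build a descriptor (mediaType, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def verify_blob(descriptor: dict, data: bytes) -> bool:
    """Check that data matches the size and digest recorded in the descriptor."""
    algorithm, expected_hex = descriptor["digest"].split(":", 1)
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    if len(data) != descriptor["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == expected_hex


# Stand-in bytes; a real layer would be a gzipped tar archive.
layer_bytes = b"pretend this is a gzipped tar layer"
descriptor = make_descriptor("application/vnd.oci.image.layer.v1.tar+gzip", layer_bytes)
print(verify_blob(descriptor, layer_bytes))         # True
print(verify_blob(descriptor, layer_bytes + b"x"))  # False
```

Checking the size first is a cheap way to reject a truncated download before hashing it.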
Config
{
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "User": "1000:1000",
    "Env": ["PATH=/usr/bin:/bin"],
    "Entrypoint": ["/app/server"],
    "Cmd": ["--port=8080"],
    "WorkingDir": "/app",
    "ExposedPorts": { "8080/tcp": {} }
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:layer1-uncompressed-hash",
      "sha256:layer2-uncompressed-hash"
    ]
  },
  "history": [ ... ]
}
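The `diff_ids` in the config hash the uncompressed tar, while the layer digests in the manifest hash the blob as stored (usually gzipped). The Python sketch below illustrates the difference with a tiny in-memory layer; the file names and contents are invented, and `build_layer_tar` is a hypothetical helper.

```python
import gzip
import hashlib
import io
import tarfile


def build_layer_tar(files: dict) -> bytes:
    """Pack a name->bytes mapping into an uncompressed tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Invented layer contents, for illustration only.
uncompressed = build_layer_tar({"app/server": b"binary", "etc/app.conf": b"port=8080\n"})
compressed = gzip.compress(uncompressed, mtime=0)

diff_id = sha256_digest(uncompressed)     # what the config lists under rootfs.diff_ids
layer_digest = sha256_digest(compressed)  # what the manifest lists under layers[].digest

# Recompressing the same tar with a different gzip header changes the blob
# digest but leaves the diff_id untouched.
repacked = gzip.compress(uncompressed, mtime=1)
print(diff_id != layer_digest)
print(sha256_digest(gzip.decompress(repacked)) == diff_id)
print(sha256_digest(repacked) != layer_digest)
```

This is also why recompressing a layer by hand invalidates the manifest even though the files inside are unchanged.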
Layers: the basis of image deduplication
Each layer is a diff against the previous one: added or changed files as a tar archive. Deleted files are recorded as `.wh.<filename>` whiteout markers.
On pull, the client downloads only the layers it does not already have. If 100 images are based on the same ubuntu:22.04, the Ubuntu layers are stored once. This saves a great deal of storage on both the client and the registry.
Layers are applied through [[tmpfs-overlayfs|overlayfs]]: a lower stack of read-only layers plus an upper layer for container writes.
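To make the diff-and-whiteout model concrete, here is a small Python sketch that flattens layers represented as plain path-to-content dictionaries. It is a toy model, not how overlayfs works internally, and it ignores opaque whiteouts (`.wh..wh..opq`); the sample paths are invented.

```python
from functools import reduce

WHITEOUT_PREFIX = ".wh."


def apply_layer(filesystem: dict, layer: dict) -> dict:
    """Return a new path->content mapping with one layer applied on top."""
    result = dict(filesystem)
    # Pass 1: whiteouts hide entries inherited from lower layers.
    for path in layer:
        directory, _, basename = path.rpartition("/")
        if not basename.startswith(WHITEOUT_PREFIX):
            continue
        hidden = basename[len(WHITEOUT_PREFIX):]
        target = f"{directory}/{hidden}" if directory else hidden
        for existing in list(result):
            if existing == target or existing.startswith(target + "/"):
                del result[existing]
    # Pass 2: files in this layer are added or replace older versions.
    for path, content in layer.items():
        if not path.rpartition("/")[2].startswith(WHITEOUT_PREFIX):
            result[path] = content
    return result


def flatten(layers: list) -> dict:
    """Apply layers bottom to top, starting from an empty filesystem."""
    return reduce(apply_layer, layers, {})


base = {"etc/motd": "hello", "usr/bin/tool": "v1", "tmp/cache/a": "x"}
change = {"usr/bin/tool": "v2", "etc/.wh.motd": "", ".wh.tmp": ""}
print(flatten([base, change]))  # {'usr/bin/tool': 'v2'}
```

Whiteouts are processed before additions so that a marker only ever hides content from lower layers, never a file added in the same layer.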
Build vs pull
# Build from a Dockerfile
docker build -t myimage:v1 .
# Tag for the target registry, then push
docker tag myimage:v1 registry.example.com/myimage:v1
docker push registry.example.com/myimage:v1
# Pull
docker pull registry.example.com/myimage:v1
# Without Docker: buildah / podman
buildah bud -t myimage:v1 .
buildah push myimage:v1 docker://registry.example.com/myimage:v1
buildah and podman need no daemon, run like ordinary CLI tools, and write OCI-compatible images.
OCI Runtime: config.json + rootfs
The runtime takes a bundle: a directory with
bundle/
├── config.json   ← everything about the container (mounts, namespaces, args)
└── rootfs/       ← extracted layers, the ready FS tree
    ├── bin/
    ├── etc/
    └── usr/
and starts the container. A bundle is not an image: it is the unpacked image plus the runtime config. The image must be unpacked into a bundle before the runtime can work with it; a higher-level runtime such as containerd or CRI-O does this.
config.json: what is inside
{
  "ociVersion": "1.2.0",
  "process": {
    "args": ["/app/server", "--port=8080"],
    "cwd": "/app",
    "env": ["PATH=/usr/bin:/bin"],
    "user": { "uid": 1000, "gid": 1000 },
    "capabilities": {
      "bounding": ["CAP_NET_BIND_SERVICE"],
      "effective": ["CAP_NET_BIND_SERVICE"],
      "permitted": ["CAP_NET_BIND_SERVICE"]
    },
    "noNewPrivileges": true,
    "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 65535, "soft": 65535 } ]
  },
  "root": { "path": "rootfs", "readonly": false },
  "mounts": [
    { "destination": "/proc", "type": "proc", "source": "proc" },
    { "destination": "/dev", "type": "tmpfs", "source": "tmpfs", "options": ["mode=755", "size=65536k"] },
    { "destination": "/data", "type": "bind", "source": "/var/lib/myapp/data", "options": ["bind", "ro"] }
  ],
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "mount" },
      { "type": "uts" },
      { "type": "ipc" },
      { "type": "user" }
    ],
    "cgroupsPath": "system.slice:myapp:abc123",
    "resources": {
      "memory": { "limit": 268435456 },
      "cpu": { "shares": 1024, "quota": 50000, "period": 100000 }
    },
    "seccomp": { "defaultAction": "SCMP_ACT_ALLOW", ... }
  }
}
This is the full description of the container: what to run, which namespaces, which cgroup limits, which capabilities, which seccomp profile.
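The `process` section of config.json is largely derived from the image config shown earlier. A rough Python sketch of that mapping follows. It is a simplification under stated assumptions: `process_from_image_config` is a hypothetical helper, it handles only numeric `User` values (real tools resolve names through the image's /etc/passwd), and it falls back to uid/gid 0 when no user is set.

```python
def process_from_image_config(image_config: dict) -> dict:
    """Derive the basic runtime 'process' fields from an OCI image config."""
    cfg = image_config.get("config", {})
    # Entrypoint and Cmd are concatenated to form the final argv.
    args = list(cfg.get("Entrypoint") or []) + list(cfg.get("Cmd") or [])
    if not args:
        raise ValueError("image defines neither Entrypoint nor Cmd")
    # Assumption: User is numeric "uid" or "uid:gid"; names are not resolved here.
    uid_text, _, gid_text = (cfg.get("User") or "0:0").partition(":")
    return {
        "args": args,
        "cwd": cfg.get("WorkingDir") or "/",
        "env": list(cfg.get("Env") or []),
        "user": {"uid": int(uid_text), "gid": int(gid_text or 0)},
    }


image_config = {
    "config": {
        "User": "1000:1000",
        "Env": ["PATH=/usr/bin:/bin"],
        "Entrypoint": ["/app/server"],
        "Cmd": ["--port=8080"],
        "WorkingDir": "/app",
    }
}
process = process_from_image_config(image_config)
print(process["args"])  # ['/app/server', '--port=8080']
```

Namespaces, cgroups, capabilities, and seccomp do not come from the image at all; the component that generates config.json supplies them.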
OCI Distribution: the registry API
The Distribution spec defines the HTTP API that registries expose. The main endpoints:
GET /v2/ ← ping
GET /v2/<name>/tags/list ← list of tags
GET /v2/<name>/manifests/<reference> ← manifest by tag/digest
GET /v2/<name>/blobs/<digest> ← download a layer/config
POST /v2/<name>/blobs/uploads/ ← start an upload
PUT /v2/<name>/manifests/<reference> ← upload a manifest
- `<name>` is the image name (`library/ubuntu`, `myorg/myapp`)
- `<reference>` is a tag (`v1.0`) or a digest (`sha256:...`)
Any OCI registry (Docker Hub, GHCR, Harbor, Quay, ECR, GCR, ACR, GitLab Registry) implements this API. Pull and push are cross-compatible.
Authentication is a Bearer token, usually through an OAuth2 token server.
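# Raw request to a registry (Docker Hub requires a token even for anonymous pulls; needs jq)
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/alpine:pull" | jq -r .token)
curl -H "Authorization: Bearer $TOKEN" \
     -H "Accept: application/vnd.oci.image.manifest.v1+json" \
     https://registry-1.docker.io/v2/library/alpine/manifests/latest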
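In the token flow used by many registries, an unauthenticated request gets a 401 with a `WWW-Authenticate: Bearer ...` header that tells the client where to fetch a token. A minimal Python sketch of parsing that challenge is below; the header value, `auth.example.com`, and the helper names are hypothetical, and the regex assumes simple quoted parameters.

```python
import re
from urllib.parse import urlencode


def parse_bearer_challenge(header: str) -> dict:
    """Split a 'Bearer key="value",...' challenge into a dict of parameters."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge: dict) -> str:
    """Build the token request URL from the realm, service and scope."""
    query = {key: challenge[key] for key in ("service", "scope") if key in challenge}
    return challenge["realm"] + "?" + urlencode(query)


# Hypothetical challenge header, as a registry might return with a 401.
header = ('Bearer realm="https://auth.example.com/token",'
          'service="registry.example.com",'
          'scope="repository:myorg/myapp:pull"')
challenge = parse_bearer_challenge(header)
print(token_url(challenge))
# https://auth.example.com/token?service=registry.example.com&scope=repository%3Amyorg%2Fmyapp%3Apull
```

The client then requests that URL, reads the token from the JSON response, and retries the original request with an `Authorization: Bearer` header.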
skopeo: low-level work with OCI
# Copy an image between registries without local unpacking
skopeo copy docker://registry.example.com/app:v1 \
docker://other-registry.com/app:v1
# Inspect a manifest without a pull
skopeo inspect docker://nginx:latest
# Save as an OCI layout
skopeo copy docker://nginx:latest oci:/tmp/nginx-oci:latest
ls /tmp/nginx-oci/ # the classic OCI structure
Tags vs digest: immutability
- A tag (`nginx:1.25`) is a mutable pointer; `latest` especially so
- A digest (`nginx@sha256:abc...`) is immutable, the hash of the manifest
In a production deploy, always pin by digest, not by tag. A tag can be rewritten in the registry; a digest cannot (change the content and the hash changes too).
# Get the digest of the current tag
docker inspect --format='{{index .RepoDigests 0}}' nginx:1.25
# → nginx@sha256:abcdef...
# Pin it in a Dockerfile / k8s manifest
FROM nginx@sha256:abcdef...
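A deploy pipeline can enforce digest pinning with a small check on image references. The Python sketch below is a simplified parser: `parse_reference` is a hypothetical helper, the digest values are placeholders, and it does not expand Docker Hub short names or validate the digest format.

```python
def parse_reference(reference: str) -> dict:
    """Split 'name[:tag][@digest]' into its parts and flag digest-pinned refs."""
    name, digest = reference, None
    if "@" in reference:
        name, digest = reference.split("@", 1)
    tag = None
    # A colon only marks a tag in the last path segment; earlier it is a registry port.
    if ":" in name.rsplit("/", 1)[-1]:
        name, tag = name.rsplit(":", 1)
    return {"name": name, "tag": tag, "digest": digest, "pinned": digest is not None}


for ref in ("nginx:1.25",
            "nginx@sha256:abcdef",
            "registry.example.com:5000/app:v1"):
    parsed = parse_reference(ref)
    print(ref, "->", "pinned" if parsed["pinned"] else "NOT pinned")
```

A check like this could fail a release when any image in a manifest is referenced by tag alone.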
When things go wrong
- `manifest unknown` on pull: the tag does not exist or was removed from the registry. Check with `skopeo list-tags docker://registry/repo`.
- A multi-arch image was not pulled: the container runtime found no matching platform manifest. Request one explicitly, for example `docker pull --platform=linux/arm64`.
- `unauthorized`: no token, or it expired. Run `docker login`, and check the credentials in `~/.docker/config.json` / `~/.config/containers/auth.json`.
- The image build is slow every time: no layer caching. A step's cache is invalidated when that step or any step above it changes. Put rarely changing steps such as `RUN apt-get install` before any `COPY` of frequently changing files, and copy only the dependency manifest (`COPY package*.json`) before the dependency install step so that layer is reused.
- OCI vs Docker manifest schema: old registries may return a Docker schema 1 manifest, modern ones return Docker schema 2 or OCI manifests. Most clients handle both, but some server-side validators may fail.
- Digest mismatch on air-gapped transfer: after a `gzip` repack of a layer its SHA256 changes and the manifest becomes invalid. Use `skopeo` or save to an OCI layout.
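The build-cache behaviour from the list above can be illustrated with a toy model in Python. It is an assumption-laden sketch, not Docker's actual cache implementation: each step's key is a hash of its parent key, its instruction text, and a hash of the files it copies. The instructions, base image, and version labels are invented for the example.

```python
import hashlib


def cache_keys(steps: list) -> list:
    """Chain hashes so each step's key depends on every step above it."""
    keys = []
    parent = ""
    for instruction, input_hash in steps:
        material = f"{parent}\n{instruction}\n{input_hash}".encode()
        parent = hashlib.sha256(material).hexdigest()
        keys.append(parent)
    return keys


def reusable_steps(old: list, new: list) -> int:
    """Count leading steps whose cache keys are unchanged between two builds."""
    count = 0
    for old_key, new_key in zip(cache_keys(old), cache_keys(new)):
        if old_key != new_key:
            break
        count += 1
    return count


def dependencies_first(source_version: str) -> list:
    return [("FROM node:20", ""),
            ("COPY package*.json ./", "deps-v1"),
            ("RUN npm install", ""),
            ("COPY . .", source_version)]


def source_first(source_version: str) -> list:
    return [("FROM node:20", ""),
            ("COPY . .", source_version),
            ("RUN npm install", "")]


# Only the application source changes between the two builds.
print(reusable_steps(dependencies_first("src-v1"), dependencies_first("src-v2")))  # 3
print(reusable_steps(source_first("src-v1"), source_first("src-v2")))              # 1
```

In the first ordering the install step is reused after a source change; in the second it reruns on every build because the changed `COPY` sits above it.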
Alternative formats (for the curious)
- AppImage / Snap / Flatpak: for the desktop, not containers in the OCI sense
- Singularity / Apptainer (.sif): scientific clusters, a single-file image
- WASM components: not yet containers in OCI terms, but moving that way (some runtimes run WASM through an OCI config)