OCI spec: the container standard

Why OCI

In 2015 Docker agreed to move the container specs out of its own project so the ecosystem would not depend on a single vendor. The result was the Open Container Initiative (under the Linux Foundation), which maintains three separate specifications:

Spec	What it describes
OCI Image	the on-disk image format: layers + manifest + config
OCI Runtime	how the runtime starts a container from rootfs + config.json
OCI Distribution	the registry HTTP API for push/pull

Today "Docker container" is almost a synonym for "OCI container": the image a Dockerfile builds, the registry it is pushed to, and the runtime that starts it each follow an OCI spec. Alternative runtimes ([[runc-and-runsc|runc/runsc]]), registries (Harbor, GHCR, Quay), and build tools (buildah, kaniko) all work with the same formats.

OCI Image: what it is on disk

In the OCI image layout an image is a directory of content-addressed files, not a single tarball. The structure:

myimage/

├── oci-layout                        ← {"imageLayoutVersion": "1.0.0"}

├── index.json                        ← root, points to the manifests

└── blobs/

    └── sha256/

        ├── <hash-config>             ← config (JSON)

        ├── <hash-layer1>             ← layer (tar or tar.gz)

        ├── <hash-layer2>

        └── <hash-manifest>           ← manifest (links config + layers)
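The layout above is content-addressed: each blob's file name is the hash of its bytes. A minimal sketch of how a digest maps to a path and how a blob can be checked against its own name; the helper names `blob_path` and `verify_blob` are hypothetical, and only sha256 is handled:

```python
import hashlib
import tempfile
from pathlib import Path


def blob_path(layout_root, digest):
    """Map a digest such as 'sha256:<hex>' to <root>/blobs/sha256/<hex>."""
    algorithm, sep, encoded = digest.partition(":")
    if not sep or not algorithm or not encoded:
        raise ValueError(f"not a digest: {digest!r}")
    return Path(layout_root) / "blobs" / algorithm / encoded


def verify_blob(layout_root, digest):
    """Return True if the blob's bytes hash to the digest it is stored under."""
    if not digest.startswith("sha256:"):
        raise ValueError("this sketch only handles sha256 digests")
    hasher = hashlib.sha256()
    with blob_path(layout_root, digest).open("rb") as blob:
        for chunk in iter(lambda: blob.read(1 << 20), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest() == digest


# Demo on a throwaway layout that holds a single blob.
with tempfile.TemporaryDirectory() as root:
    content = b'{"imageLayoutVersion": "1.0.0"}'
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    path = blob_path(root, digest)
    path.parent.mkdir(parents=True)
    path.write_bytes(content)
    intact = verify_blob(root, digest)      # bytes match the file name
    path.write_bytes(content + b"tampered")
    tampered = verify_blob(root, digest)    # bytes no longer match

print(intact, tampered)  # True False
```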

index.json

json

{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:abc...",
      "size": 1234,
      "platform": { "architecture": "amd64", "os": "linux" }
    },
    {
      "digest": "sha256:def...",
      "platform": { "architecture": "arm64", "os": "linux" }
    }
  ]
}

This is a multi-arch index. One manifest per platform.
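Picking the manifest for a platform can be sketched as a scan over the index. This is a simplified illustration: the function name `select_manifest` is hypothetical, the digests are placeholders, and real clients also consider fields such as the platform variant.

```python
import json

# Placeholder index; the digests are not real hashes.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:abc", "size": 1234,
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:def", "size": 1234,
     "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}
"""


def select_manifest(index, os_name, architecture):
    """Return the first descriptor whose platform matches, or None."""
    for descriptor in index.get("manifests", []):
        platform = descriptor.get("platform", {})
        if (platform.get("os") == os_name
                and platform.get("architecture") == architecture):
            return descriptor
    return None


index = json.loads(INDEX_JSON)
chosen = select_manifest(index, "linux", "arm64")
print(chosen["digest"])  # sha256:def
```

A `None` result corresponds to the "no matching platform manifest" failure a client reports on pull.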

Manifest

json

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:abc...",
    "size": 7000
  },
  "layers": [
    { "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:111...", "size": 5000000 },
    { "digest": "sha256:222...", "size": 1000000 },
    { "digest": "sha256:333...", "size":   50000 }
  ]
}

A manifest contains:

config: a descriptor pointing to the config blob (entrypoint, ENV, USER, WORKDIR)
layers: an ordered list of layer tarballs, base layer first

Config

json

{
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "User": "1000:1000",
    "Env": ["PATH=/usr/bin:/bin"],
    "Entrypoint": ["/app/server"],
    "Cmd": ["--port=8080"],
    "WorkingDir": "/app",
    "ExposedPorts": { "8080/tcp": {} }
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:layer1-uncompressed-hash",
      "sha256:layer2-uncompressed-hash"
    ]
  },
  "history": [ ... ]
}

Layers: the basis of image deduplication

Each layer is a diff against the previous one: added or changed files as a tar archive. Deleted files are .wh.<filename> whiteout markers.
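How a diff layer with whiteout markers changes the tree below it can be sketched with plain dictionaries standing in for file systems. This is a toy model, not how a real unpacker works: `apply_layer` is a hypothetical helper, and opaque whiteouts and file metadata are ignored.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."


def apply_layer(tree, layer):
    """Apply one diff layer (path -> bytes) on top of a lower tree."""
    result = dict(tree)
    # Pass 1: whiteouts remove paths that came from the lower layers.
    for path in layer:
        directory, name = posixpath.split(path)
        if name.startswith(WHITEOUT_PREFIX):
            target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
            for existing in list(result):
                if existing == target or existing.startswith(target + "/"):
                    del result[existing]
    # Pass 2: regular entries add new files or overwrite existing ones.
    for path, content in layer.items():
        if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
            result[path] = content
    return result


base = {
    "etc/motd": b"hello",
    "etc/hosts": b"127.0.0.1 localhost",
    "var/cache/a": b"x",
}
diff = {
    "etc/.wh.motd": b"",               # delete etc/motd
    "etc/hosts": b"127.0.0.1 app",     # overwrite etc/hosts
    "var/.wh.cache": b"",              # delete the var/cache directory
}
merged = apply_layer(base, diff)
print(sorted(merged))  # ['etc/hosts']
```

Whiteouts are processed first so that they only hide entries from lower layers, never files added by the same layer.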

On pull the client downloads only the layers it is missing. If 100 images are based on the same ubuntu:22.04, the Ubuntu layer is stored once, which saves a great deal of storage on both the client and the registry.
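The pull-time saving comes down to a membership check against the local store. A minimal sketch, with a hypothetical helper name and placeholder digests:

```python
def layers_to_download(manifest_layers, local_store):
    """Return the digests from the manifest that are not stored locally yet."""
    missing = []
    for digest in manifest_layers:
        if digest not in local_store and digest not in missing:
            missing.append(digest)
    return missing


# Placeholder digests: two layers are already present from an earlier pull.
local_store = {"sha256:ubuntu-base", "sha256:python-runtime"}
app_layers = ["sha256:ubuntu-base", "sha256:python-runtime", "sha256:app-code"]

print(layers_to_download(app_layers, local_store))  # ['sha256:app-code']
```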

Layers are applied through [[tmpfs-overlayfs|overlayfs]]: a lower stack of read-only layers plus an upper layer for container writes.

Build vs pull

bash

# Build from a Dockerfile

docker build -t myimage:v1 .

# Push to a registry

docker push registry.example.com/myimage:v1

# Pull

docker pull registry.example.com/myimage:v1

# Without Docker, buildah / podman

buildah bud -t myimage:v1 .

buildah push myimage:v1 docker://registry.example.com/myimage:v1

buildah and podman need no daemon, run like ordinary CLI tools, and write OCI-compatible images.

OCI Runtime: config.json + rootfs

The runtime takes a bundle: a directory with

bundle/

├── config.json           ← everything about the container (mounts, namespaces, args)

└── rootfs/               ← extracted layers, the ready FS tree

    ├── bin/

    ├── etc/

    └── usr/

and starts the container. A bundle is not an image: it is the unpacked image plus the runtime config. The image must be unpacked into a bundle before the runtime can work with it; a higher-level runtime such as containerd or CRI-O does this.
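Part of that unpacking step is turning the image config into the process section of config.json. A rough sketch of the mapping, under stated assumptions: `image_config_to_process` is a hypothetical name, only numeric `uid:gid` users are handled (named users would need a lookup in the rootfs), and the fallbacks to `/` and to gid 0 are choices made for this sketch.

```python
def image_config_to_process(image_config):
    """Derive runtime process fields from an OCI image config document."""
    cfg = image_config.get("config", {})
    # Entrypoint and Cmd are concatenated into the argument vector.
    args = list(cfg.get("Entrypoint") or []) + list(cfg.get("Cmd") or [])
    uid_text, _, gid_text = (cfg.get("User") or "0:0").partition(":")
    return {
        "args": args,
        "cwd": cfg.get("WorkingDir") or "/",
        "env": list(cfg.get("Env") or []),
        "user": {"uid": int(uid_text), "gid": int(gid_text or 0)},
    }


image_config = {
    "architecture": "amd64",
    "os": "linux",
    "config": {
        "User": "1000:1000",
        "Env": ["PATH=/usr/bin:/bin"],
        "Entrypoint": ["/app/server"],
        "Cmd": ["--port=8080"],
        "WorkingDir": "/app",
    },
}

process = image_config_to_process(image_config)
print(process["args"])  # ['/app/server', '--port=8080']
```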

config.json: what is inside

json

{
  "ociVersion": "1.2.0",
  "process": {
    "args": ["/app/server", "--port=8080"],
    "cwd": "/app",
    "env": ["PATH=/usr/bin:/bin"],
    "user": { "uid": 1000, "gid": 1000 },
    "capabilities": {
      "bounding": ["CAP_NET_BIND_SERVICE"],
      "effective": ["CAP_NET_BIND_SERVICE"],
      "permitted": ["CAP_NET_BIND_SERVICE"]
    },
    "noNewPrivileges": true,
    "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 65535, "soft": 65535 } ]
  },
  "root": { "path": "rootfs", "readonly": false },
  "mounts": [
    { "destination": "/proc", "type": "proc", "source": "proc" },
    { "destination": "/dev", "type": "tmpfs", "source": "tmpfs", "options": ["mode=755", "size=65536k"] },
    { "destination": "/data", "type": "bind", "source": "/var/lib/myapp/data", "options": ["bind", "ro"] }
  ],
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "mount" },
      { "type": "uts" },
      { "type": "ipc" },
      { "type": "user" }
    ],
    "cgroupsPath": "system.slice:myapp:abc123",
    "resources": {
      "memory": { "limit": 268435456 },
      "cpu":    { "shares": 1024, "quota": 50000, "period": 100000 }
    },
    "seccomp": { "defaultAction": "SCMP_ACT_ALLOW", ... }
  }
}

This is the full description of the container: what to run, which namespaces, which cgroups limits, which capabilities, which seccomp profile.
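The resource numbers in the example are easier to read once converted: the memory limit is in bytes, and the CPU ceiling is the ratio of quota to period. A small sketch (the helper name is hypothetical):

```python
def describe_resources(resources):
    """Return (memory limit in MiB, CPU ceiling in cores) for linux.resources."""
    memory_mib = resources["memory"]["limit"] / (1024 * 1024)
    cpu = resources["cpu"]
    cpu_cores = cpu["quota"] / cpu["period"]  # CPU time allowed per period
    return memory_mib, cpu_cores


resources = {
    "memory": {"limit": 268435456},
    "cpu": {"shares": 1024, "quota": 50000, "period": 100000},
}

print(describe_resources(resources))  # (256.0, 0.5)
```

So the example container is capped at 256 MiB of memory and half a CPU core.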

OCI Distribution: the registry API

The Distribution spec defines the HTTP API that registries expose. The main endpoints:

GET  /v2/                                   ← ping

GET  /v2/<name>/tags/list                   ← list of tags

GET  /v2/<name>/manifests/<reference>        ← manifest by tag/digest

GET  /v2/<name>/blobs/<digest>               ← download a layer/config

POST /v2/<name>/blobs/uploads/               ← start an upload

PUT  /v2/<name>/manifests/<reference>        ← upload a manifest

<name> is the image name (library/ubuntu, myorg/myapp)
<reference> is a tag (v1.0) or a digest (sha256:...)

Any OCI registry (Docker Hub, GHCR, Harbor, Quay, ECR, GCR, ACR, GitLab Registry) implements this API. Pull and push are cross-compatible.
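Building a manifest URL from an image reference can be sketched as below. The helper names are hypothetical, and falling back to the `latest` tag is a client convention assumed here, not something the endpoints themselves require.

```python
def split_reference(image):
    """Split 'name:tag' or 'name@digest' into (name, reference)."""
    if "@" in image:
        name, _, reference = image.partition("@")
        return name, reference
    name, sep, tag = image.rpartition(":")
    # A ':' followed by a '/' belongs to a host:port prefix, not to a tag.
    if sep and "/" not in tag:
        return name, tag
    return image, "latest"


def manifest_url(registry, name, reference):
    """Build the GET /v2/<name>/manifests/<reference> URL."""
    return f"https://{registry}/v2/{name}/manifests/{reference}"


name, reference = split_reference("myorg/myapp:v1.0")
print(manifest_url("registry.example.com", name, reference))
# https://registry.example.com/v2/myorg/myapp/manifests/v1.0
```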

Authentication is a Bearer token, usually through an OAuth2 token server.
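In the common token flow, an unauthenticated request is answered with 401 and a `WWW-Authenticate: Bearer ...` header that names the token server. A sketch of parsing that challenge, with no network access; the header value and host names below are made up for illustration, and the helper names are hypothetical:

```python
import re
from urllib.parse import urlencode


def parse_bearer_challenge(header):
    """Parse 'Bearer key="value",...' into a dict of its parameters."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_url(challenge):
    """Build the URL to request a token from the realm in the challenge."""
    query = {key: challenge[key] for key in ("service", "scope") if key in challenge}
    return challenge["realm"] + "?" + urlencode(query)


# Example challenge; the hosts are placeholders.
header = ('Bearer realm="https://auth.example.com/token",'
          'service="registry.example.com",'
          'scope="repository:library/alpine:pull"')

challenge = parse_bearer_challenge(header)
print(token_url(challenge))
# https://auth.example.com/token?service=registry.example.com&scope=repository%3Alibrary%2Falpine%3Apull
```

The token returned by that URL is then sent as `Authorization: Bearer <token>` on the retried request.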

bash

# Raw request to a registry; without a Bearer token Docker Hub answers 401 with a WWW-Authenticate challenge

curl -H "Accept: application/vnd.oci.image.manifest.v1+json" \

     https://registry-1.docker.io/v2/library/alpine/manifests/latest

skopeo: low-level work with OCI

bash

# Copy an image between registries without local unpacking

skopeo copy docker://registry.example.com/app:v1 \

            docker://other-registry.com/app:v1

# Inspect a manifest without a pull

skopeo inspect docker://nginx:latest

# Save as an OCI layout

skopeo copy docker://nginx:latest oci:/tmp/nginx-oci:latest

ls /tmp/nginx-oci/                          # the classic OCI structure

Tags vs digest: immutability

A tag (nginx:1.25) is a mutable pointer; latest especially so
A digest (nginx@sha256:abc...) is immutable, the hash of the manifest

In a production deploy, always pin by digest, not by tag. A tag can be rewritten in the registry; a digest cannot (change the content and the hash changes too).
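Why a digest cannot be rewritten can be shown in a few lines: the digest is the SHA-256 of the manifest bytes, so any change to the content produces a different digest. The manifest below is a shortened placeholder; a registry hashes the exact bytes it serves.

```python
import hashlib
import json


def manifest_digest(manifest_bytes):
    """Return the digest string for the given manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


manifest = {
    "schemaVersion": 2,
    "layers": [{"digest": "sha256:111", "size": 5000000}],
}
original_bytes = json.dumps(manifest).encode()

# Change one field, as a retagged or rebuilt image would.
manifest["layers"][0]["size"] = 5000001
changed_bytes = json.dumps(manifest).encode()

pinned = manifest_digest(original_bytes)
print(pinned == manifest_digest(original_bytes))  # True
print(pinned == manifest_digest(changed_bytes))   # False
```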

bash

# Get the digest of the current tag (the image must already be pulled locally)
docker inspect --format='{{index .RepoDigests 0}}' nginx:1.25
# prints: nginx@sha256:abcdef...

# Pin it in a Dockerfile (Dockerfile syntax, not shell) or in a k8s manifest:
#   FROM nginx@sha256:abcdef...

When things go wrong

manifest unknown on pull: the tag does not exist or was removed from the registry. List what is there with skopeo list-tags docker://registry/repo.
A multi-arch image was not pulled: the container runtime found no matching platform manifest. Request a platform explicitly with docker pull --platform=linux/arm64.
unauthorized: no token, or it expired. Run docker login and check the credentials in ~/.docker/config.json / ~/.config/containers/auth.json.
The image build is slow every time: the layer cache is being invalidated. A changed instruction, or changed files in a COPY, invalidates the cache for every instruction after it. Put rarely changing steps such as RUN apt-get install before the COPY of application source, and COPY package*.json and install dependencies before copying the rest of the code.
OCI vs Docker manifest schema: older registries and tools return Docker manifest schema 2 (or the deprecated schema 1), modern ones return OCI manifests. Most clients handle both, but some server-side validators may reject one of the formats.
Digest mismatch on air-gapped transfer: after a gzip repack of a layer its SHA256 changes and the manifest becomes invalid. Copy the image with skopeo, or save it to an OCI layout, so the blobs are transferred byte for byte.
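The last failure mode is easy to reproduce: recompressing a layer keeps its content but changes the compressed bytes, and with them the digest the manifest refers to. A small sketch with stand-in data instead of a real layer tar; the different gzip timestamp and level imitate a repack by another tool:

```python
import gzip
import hashlib


def sha256_digest(data):
    """Return 'sha256:<hex>' for the given bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Stand-in for an uncompressed layer tar.
layer_tar = b"stand-in for an uncompressed layer tar " * 1000

original_blob = gzip.compress(layer_tar, compresslevel=9, mtime=0)
# Repack: decompress, then compress again with other gzip settings.
repacked_blob = gzip.compress(gzip.decompress(original_blob),
                              compresslevel=1, mtime=1)

same_content = (sha256_digest(gzip.decompress(repacked_blob))
                == sha256_digest(layer_tar))
same_digest = sha256_digest(repacked_blob) == sha256_digest(original_blob)
print(same_content, same_digest)  # True False
```

The hash of the uncompressed data (what the config lists under diff_ids) survives the repack; the hash of the compressed blob (what the manifest lists under layers) does not.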

Alternative formats (for the curious)

AppImage / Snap / Flatpak: for the desktop, not containers in the OCI sense
Singularity / Apptainer (.sif): scientific clusters, a single-file image
WASM components: not yet containers in OCI terms, but moving that way (some runtimes run WASM through an OCI config)

