What an OCI runtime is
An OCI runtime is the component that takes an OCI bundle ([[oci-spec|spec]]:
config.json + rootfs/) and starts the container. How it does that is
up to the implementation; what matters is that it conforms to the
OCI runtime spec.
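To make the bundle layout concrete, here is a minimal sketch that checks whether a directory looks like a usable bundle. It is a plausibility check under simple assumptions, not validation against the OCI runtime spec; the function name check_bundle is made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def check_bundle(bundle_dir):
    """Return a list of problems found in an OCI bundle directory (empty if none)."""
    bundle = Path(bundle_dir)
    config_path = bundle / "config.json"
    if not config_path.is_file():
        return ["config.json is missing"]
    try:
        config = json.loads(config_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"config.json is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["config.json must contain a JSON object"]

    problems = []
    if "ociVersion" not in config:
        problems.append("ociVersion is missing")
    # root.path is resolved relative to the bundle directory unless absolute.
    root_path = config.get("root", {}).get("path")
    if root_path is None:
        problems.append("root.path is missing")
    elif not (bundle / root_path).is_dir():
        problems.append(f"rootfs directory {root_path!r} does not exist")
    if not config.get("process", {}).get("args"):
        problems.append("process.args is empty")
    return problems


if __name__ == "__main__":
    # Build a throwaway bundle so the check can run without runc or Docker.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "rootfs").mkdir()
        config = {"ociVersion": "1.0.2", "root": {"path": "rootfs"},
                  "process": {"args": ["sh"]}}
        (Path(tmp) / "config.json").write_text(json.dumps(config))
        print(check_bundle(tmp) or "bundle looks usable")
```

Any conforming runtime (runc, crun, runsc, kata) should accept the same directory, which is the point of the bundle format.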
Five common options in 2026, covering three approaches:
| Runtime | Approach | Trade-off |
|---|---|---|
| runc | namespaces + cgroups + seccomp in the host kernel | maximum performance, minimum isolation |
| runsc (gVisor) | userspace kernel intercepts syscalls | ~10-50% slower depending on the workload, much more isolation |
| kata-containers | each container in a lightweight VM | ~5% overhead, VM-grade isolation |
| crun | a runc alternative written in C, faster startup | same isolation as runc |
| youki | runc-compatible, written in Rust | same as runc |
runc, the reference
Originally built by Docker and donated to the OCI as a minimal reference implementation. The code is open source and packaged by the major distros. It sits under all the common container stacks (Docker, containerd, CRI-O, podman), either as runc itself or as a drop-in replacement such as crun.
What runc does on runc run myctr:
- Reads config.json
- Creates [[namespaces|namespaces]] (PID, NET, MNT, IPC, UTS, USER)
- Sets up [[cgroups|cgroups]] (memory, cpu)
- Drops capabilities, keeping only the sets listed in the config
- Applies a seccomp profile
- Applies an AppArmor/SELinux profile if one is set
- chroot into rootfs (runc uses pivot_root by default)
- exec the command specified in the config
All of this happens in the host kernel. The container sees the host kernel and uses the same VFS and the same scheduler. The isolation comes from namespaces; cgroups, capabilities and seccomp limit what the process can consume and call.
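Each step in the list above is driven by a field in config.json. The sketch below reads a trimmed, hypothetical config and reports what the runtime would set up, in the order of the list (not necessarily runc's internal order). The field names come from the OCI runtime spec; the values and the helper name startup_plan are illustrative assumptions.

```python
import json

# Trimmed, hypothetical config.json; a real one from `runc spec` has more fields.
EXAMPLE_CONFIG = json.loads("""
{
  "process": {
    "args": ["sh"],
    "capabilities": {"bounding": ["CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE"]},
    "apparmorProfile": ""
  },
  "root": {"path": "rootfs"},
  "linux": {
    "namespaces": [{"type": "pid"}, {"type": "network"}, {"type": "ipc"},
                   {"type": "uts"}, {"type": "mount"}],
    "resources": {"memory": {"limit": 268435456}}
  }
}
""")


def startup_plan(config):
    """List (step, value) pairs describing what the runtime would set up."""
    linux = config.get("linux", {})
    process = config.get("process", {})
    lsm = process.get("apparmorProfile") or process.get("selinuxLabel") or None
    return [
        ("namespaces", [ns["type"] for ns in linux.get("namespaces", [])]),
        ("cgroup limits", sorted(linux.get("resources", {}))),
        ("capabilities kept", process.get("capabilities", {}).get("bounding", [])),
        ("seccomp", "seccomp" in linux),  # True only if a profile is present
        ("lsm profile", lsm),
        ("rootfs", config.get("root", {}).get("path")),
        ("exec", process.get("args", [])),
    ]


for step, value in startup_plan(EXAMPLE_CONFIG):
    print(f"{step}: {value}")
```

Note that this example config has no seccomp section and no user namespace, so the corresponding protections would simply be absent.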
Running it directly without Docker:
# Prepare the bundle
mkdir -p mycontainer/rootfs
cd mycontainer
docker export $(docker create alpine) | tar -C rootfs -xf -
runc spec # creates config.json
# edit config.json to suit your needs
# Run
sudo runc run mycontainer-id
# Management
# (same user as `runc run`, otherwise runc looks in a different state directory)
sudo runc list
sudo runc kill mycontainer-id KILL
sudo runc delete mycontainer-id
This is the layer "below Docker". Use it when you want to understand exactly what happens, or in embedded scenarios.
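The "edit config.json" step above is usually a matter of changing a few fields. As a minimal sketch, the helpers below replace the command and the terminal flag; process.args and process.terminal are real spec fields, while the function names and the choice of terminal=False for non-interactive commands are assumptions made for this example.

```python
import copy
import json


def with_command(config, args, terminal=False):
    """Return a copy of an OCI config with process.args and process.terminal replaced."""
    updated = copy.deepcopy(config)
    process = updated.setdefault("process", {})
    process["args"] = list(args)
    process["terminal"] = terminal
    return updated


def rewrite_config(path, args, terminal=False):
    """Apply with_command to a config.json on disk."""
    with open(path) as fh:
        config = json.load(fh)
    with open(path, "w") as fh:
        json.dump(with_command(config, args, terminal), fh, indent=2)
        fh.write("\n")
```

For example, `rewrite_config("config.json", ["sleep", "60"])` run inside the bundle directory would make the container run sleep instead of the default shell.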
runc, where it sits in the Docker stack
docker (dockerd) / kubelet
│
▼
containerd (or CRI-O)
│
▼
containerd-shim (one per container, survives a containerd restart)
│
▼
runc (starts the init process, then exits)
│
▼
the container's init process (PID 1 in the pid namespace)
The shim exists so that containers survive a restart of the higher-level managers. runc is short-lived: it does its job and exits.
crun, the C alternative
Same contract as runc, but:
- Written in C (runc is Go), so startup is faster
- Smaller memory footprint
- Default for podman on Fedora and RHEL 9+
A full drop-in replacement: a containerd config can switch from runc to crun and everything works.
Use it when you start many short-lived containers (CI, k8s jobs, function-as-a-service).
runsc / gVisor, a userspace kernel
The concept: place a userspace kernel (gVisor's "Sentry") between the application syscall and the host kernel, where it intercepts most syscalls and implements them itself.
app (inside the container)
│ syscall
▼
Sentry (gVisor userspace kernel)
│ a limited subset of host syscalls
▼
host kernel
Pros:
- Not tied to the host kernel for most syscalls, so exploiting a kernel CVE is harder
- Smaller attack surface: ~50 host syscalls instead of ~400
- No VM, so startup is fast (a fraction of a second)
Cons:
- Performance hit, 10-50% depending on the load
- Not all syscalls work; edge networking/file features may not be supported (io_uring, for example, only partially)
- Not every workload fits; a database that relies on io_uring or AIO will suffer
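The interception idea, and the compatibility cost that comes with it, can be shown with a toy model. This is only an illustration of the control flow: real gVisor traps syscalls at the platform level, and every name below (ToySentry, HOST_ALLOWLIST, the sys_ methods) is hypothetical.

```python
# Toy model only: not how gVisor is implemented, just the shape of the idea.
HOST_ALLOWLIST = {"write"}  # stand-in for the small set of host syscalls the Sentry may use


class ToySentry:
    def __init__(self):
        self.host_calls = []  # syscalls that actually reached the "host kernel"

    def _host(self, name):
        if name not in HOST_ALLOWLIST:
            raise PermissionError(f"host syscall {name!r} is not on the allowlist")
        self.host_calls.append(name)

    def syscall(self, name, *args):
        handler = getattr(self, f"sys_{name}", None)
        if handler is None:
            # Mirrors the compatibility gap: unimplemented syscalls fail.
            raise NotImplementedError(f"unsupported syscall: {name}")
        return handler(*args)

    def sys_getpid(self):
        return 1  # answered from Sentry state, the host is never asked

    def sys_write(self, fd, data):
        self._host("write")  # forwarded through the narrow host interface
        return len(data)


sentry = ToySentry()
print(sentry.syscall("getpid"))            # handled in userspace
print(sentry.syscall("write", 1, b"hi"))   # forwarded to the host
print(sentry.host_calls)
```

Calling `sentry.syscall("io_uring_setup")` on this toy raises NotImplementedError, which is the same kind of failure an application sees when it depends on a syscall the Sentry does not implement.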
Running it:
# Installation
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor ...
sudo apt install runsc
# Register it with Docker
cat /etc/docker/daemon.json
{
  "runtimes": {
    "runsc": { "path": "/usr/bin/runsc" }
  }
}
sudo systemctl restart docker
# Use it
docker run --runtime=runsc -it alpine
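If /etc/docker/daemon.json already has content, the runsc entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows; the existing settings in the example (the log-driver value and the kata path) are made up, and register_runtime is a hypothetical helper, not part of Docker.

```python
import json


def register_runtime(daemon_config, name, path):
    """Return a copy of a daemon.json dict with one runtime added or updated."""
    merged = dict(daemon_config)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes[name] = {"path": path}
    merged["runtimes"] = runtimes
    return merged


# Hypothetical existing configuration; other keys and runtimes must survive.
existing = {
    "log-driver": "json-file",
    "runtimes": {"kata": {"path": "/usr/bin/kata-runtime"}},
}
print(json.dumps(register_runtime(existing, "runsc", "/usr/bin/runsc"), indent=2))
```

The printed JSON is what would go back into daemon.json before restarting Docker.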
Where it is used:
- Google App Engine / Cloud Run, internally
- Untrusted code execution (online code playgrounds)
- Multi-tenant CI, where a shared cluster runs other people's code
kata-containers, VM-based
Each container runs in a lightweight VM (via qemu/cloud-hypervisor/firecracker).
Pros:
- Hardware-grade isolation: a VM boundary, not a namespace boundary
- Compatibility close to 100%: there is a real Linux kernel inside the VM
- Support for GPU passthrough and custom kernels
Cons:
- Overhead in RAM (~50-200 MB per container for the VM)
- Slower startup, 1-2 sec instead of < 100ms
- Needs hardware virtualization on the node; on cloud VMs that means nested virtualization, which is sometimes forbidden
# k8s through CRI-O, RuntimeClass
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: myapp:v1
Used in:
- AWS Lambda + Firecracker, not Kata itself, but the same idea
- Kata on AKS / Azure Container Instances
- Confidential containers (CoCo), Kata + AMD SEV / Intel TDX for encrypted-memory protection
Comparison
| Property | runc | runsc / gVisor | kata-containers |
|---|---|---|---|
| Isolation | namespaces | userspace kernel | VM |
| Performance | 100% (baseline) | ~70-90% | ~95% |
| Memory overhead | a few MB | ~30 MB per Sentry | ~50-200 MB per VM |
| Startup | ~100 ms | ~150 ms | ~1-2 sec |
| Compatibility | 100% | ~85% | ~99% |
| Use case | default everywhere | untrusted code | multi-tenant secure |
| Where default | Docker, containerd, CRI-O, k8s | Google Cloud Run | Confidential Containers (CoCo) |
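The memory row matters most at scale, so a quick calculation helps. The sketch below multiplies the per-container figures from the table by a container count and adds a deliberately crude selection rule; both helper names and the rule itself are assumptions for illustration, not a recommendation engine.

```python
# Per-container memory overhead in MB, copied from the comparison table.
# runc is left out because the table gives no number for it.
OVERHEAD_MB = {"runsc": (30, 30), "kata-containers": (50, 200)}


def memory_overhead_mb(runtime, containers):
    """Return the (low, high) total overhead in MB for a number of containers."""
    low, high = OVERHEAD_MB[runtime]
    return (low * containers, high * containers)


def pick_runtime(untrusted_code, needs_full_compat=False):
    """Crude rule of thumb based on the 'Use case' and 'Compatibility' rows."""
    if not untrusted_code:
        return "runc"
    return "kata-containers" if needs_full_compat else "runsc"


for runtime in OVERHEAD_MB:
    low, high = memory_overhead_mb(runtime, 100)
    print(f"{runtime}: {low}-{high} MB for 100 containers")
```

For 100 containers this gives about 3 GB for gVisor and 5 to 20 GB for Kata, on top of what the workloads themselves use.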
RuntimeClass in k8s
k8s allows multiple runtimes side-by-side:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
spec:
  runtimeClassName: gvisor # this pod runs through gVisor
  containers: [...]
When runtimeClassName is unset, the pod uses the node's default runtime (usually runc). Optionally, you can restrict the untrusted runtime to dedicated namespaces or labeled nodes.
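When manifests are generated by tooling, the runtime choice is just one optional field in the pod spec. A minimal sketch, assuming a single-container pod named by the caller; pod_manifest is a hypothetical helper, and the output is JSON, which kubectl accepts in place of YAML.

```python
import json


def pod_manifest(name, image, runtime_class=None):
    """Build a single-container Pod manifest, optionally pinned to a RuntimeClass."""
    spec = {"containers": [{"name": "app", "image": image}]}
    if runtime_class is not None:
        # Leaving the field out entirely selects the node's default runtime.
        spec["runtimeClassName"] = runtime_class
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


print(json.dumps(pod_manifest("secure-pod", "myapp:v1", "gvisor"), indent=2))
```

The RuntimeClass object named here must already exist in the cluster, and its handler must be configured on the node.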
When things go wrong
- "exec format error": with a multi-arch image, the runtime starts a binary for the wrong architecture. Pull the correct platform.
- "OCI runtime exec failed: exec failed": the entrypoint does not exist or is not executable in the rootfs. chmod +x or check the path.
- A runsc workload fails with unsupported syscall: runsc --strace or gVisor's dmesg will show which one; sometimes --platform=ptrace is a fallback (slower, broader compatibility).
- Kata starts slowly: usually a cold start of cloud-hypervisor. Set enable_template = true in configuration.toml for a prebooted VM.
- runc update does not work on cgroups: cgroup v1 and v2 have different paths. Modern runc handles both, but containerd may not pass the new format.
- "Unknown runtime" in Docker: it is not registered in /etc/docker/daemon.json, or systemctl restart docker was not run.
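The list above is essentially a lookup table from error text to first action, and it can be kept as one. A small sketch, assuming simple case-insensitive substring matching; the substrings and hints are taken from the list, and real messages vary between versions, so treat the table as a starting point.

```python
# Substrings and hints mirror the troubleshooting list; they are not exhaustive.
HINTS = [
    ("exec format error",
     "binary built for another architecture: pull the image for the correct platform"),
    ("oci runtime exec failed",
     "entrypoint missing or not executable in the rootfs: check the path or chmod +x"),
    ("unsupported syscall",
     "gVisor does not implement a syscall: find out which one with runsc --strace"),
    ("unknown runtime",
     "runtime not registered in /etc/docker/daemon.json, or Docker was not restarted"),
]


def diagnose(message):
    """Return the first matching hint for an error message, or None."""
    lowered = message.lower()
    for needle, hint in HINTS:
        if needle in lowered:
            return hint
    return None


print(diagnose("exec /app/server: exec format error"))
```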
Alternatives and related
- firecracker, a VMM, not a runtime, but Kata can use it
- bubblewrap (bwrap), like runc for Flatpak; not OCI-compatible
- lxc/lxd, older, not OCI; more "system containers" than "application containers"
- systemd-nspawn, containerization built into systemd; also not OCI