runc, runsc, kata: container runtimes

What an OCI runtime is

An OCI runtime is the component that takes an OCI bundle ([[oci-spec|spec]]: config.json + rootfs/) and starts the container from it. How it does that is an implementation choice; what matters is that it conforms to the OCI runtime spec.
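To make the bundle layout concrete, here is a minimal Python sketch that sanity-checks a bundle directory. The field names (ociVersion, process.args, root.path) come from the OCI runtime spec; the function names, the demo bundle, and the choice of checks are illustrative assumptions, not a full validator.

```python
import json
import tempfile
from pathlib import Path


def check_bundle(bundle_dir: str) -> list[str]:
    """Return a list of problems found in an OCI bundle directory (empty if none)."""
    bundle = Path(bundle_dir)
    config_path = bundle / "config.json"
    if not config_path.is_file():
        return ["config.json is missing"]

    config = json.loads(config_path.read_text())
    problems = []
    if "ociVersion" not in config:
        problems.append("ociVersion is missing")
    # A sanity check chosen for this sketch: there should be something to exec.
    if not config.get("process", {}).get("args"):
        problems.append("process.args is empty")

    # A relative root.path is resolved against the bundle directory.
    root_path = config.get("root", {}).get("path")
    if root_path is None:
        problems.append("root.path is missing")
    elif not (bundle / root_path).is_dir():
        problems.append(f"rootfs directory {root_path!r} does not exist")
    return problems


def make_demo_bundle(parent: str) -> Path:
    """Create a tiny bundle with an empty rootfs (hypothetical, for illustration only)."""
    bundle = Path(parent) / "demo-bundle"
    (bundle / "rootfs").mkdir(parents=True)
    config = {
        "ociVersion": "1.0.2",
        "process": {"args": ["/bin/sh"], "cwd": "/"},
        "root": {"path": "rootfs"},
    }
    (bundle / "config.json").write_text(json.dumps(config, indent=2))
    return bundle


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_bundle(str(make_demo_bundle(tmp))))
```

A bundle that passes these checks is not guaranteed to run; the runtime itself performs the real validation.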

Common options in 2026 (three isolation approaches, plus two runc-compatible alternatives):

Runtime	Approach	Trade-off
runc	namespaces + cgroups + seccomp in the host kernel	maximum performance, minimum isolation
runsc (gVisor)	userspace kernel intercepts syscalls	~30% slower, much more isolation
kata-containers	each container in a lightweight VM	~5% overhead, VM-grade isolation
crun	a runc alternative written in C, faster startup	same isolation as runc
youki	runc-compatible, written in Rust	same as runc

runc, the reference

runc started as Docker's libcontainer and was donated to the OCI as a minimal reference implementation. It is open source and packaged by all major distros. It sits under all the common container stacks (Docker, containerd, CRI-O, podman), either as runc itself or as a drop-in replacement such as crun.

What runc does on runc run myctr:

Reads config.json
Creates [[namespaces|namespaces]] (PID, NET, MNT, IPC, UTS, USER)
Sets up [[cgroups|cgroups]] (memory, cpu)
Drops every capability that is not listed in the config
Applies a seccomp profile
Applies an AppArmor/SELinux profile if one is set
Switches the root filesystem to rootfs (pivot_root by default, chroot with --no-pivot)
exec the command specified in the config
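The steps above map fairly directly onto sections of config.json. The following Python sketch walks a parsed config and reports which steps would apply. The keys it reads (linux.namespaces, linux.resources, process.capabilities, linux.seccomp, process.apparmorProfile, process.selinuxLabel) are OCI spec fields; the function itself is hypothetical and follows the order of the list, not runc's actual internal ordering.

```python
def setup_steps(config: dict) -> list[str]:
    """Describe what a runtime would set up for this config, in the order of the list above."""
    linux = config.get("linux", {})
    process = config.get("process", {})
    steps = ["read config.json"]

    namespaces = [ns["type"] for ns in linux.get("namespaces", [])]
    if namespaces:
        steps.append("create namespaces: " + ", ".join(namespaces))

    resources = sorted(linux.get("resources", {}))
    if resources:
        steps.append("configure cgroups: " + ", ".join(resources))

    if "capabilities" in process:
        steps.append("restrict capabilities to the sets in process.capabilities")
    if "seccomp" in linux:
        steps.append("load seccomp filter")
    if process.get("apparmorProfile"):
        steps.append("apply AppArmor profile " + process["apparmorProfile"])
    if process.get("selinuxLabel"):
        steps.append("apply SELinux label " + process["selinuxLabel"])

    steps.append("switch root filesystem to " + config.get("root", {}).get("path", "?"))
    steps.append("exec " + " ".join(process.get("args", [])))
    return steps


if __name__ == "__main__":
    example = {
        "process": {"args": ["sleep", "60"], "capabilities": {"bounding": []}},
        "root": {"path": "rootfs"},
        "linux": {
            "namespaces": [{"type": "pid"}, {"type": "mount"}],
            "resources": {"memory": {"limit": 64 * 1024 * 1024}},
        },
    }
    for step in setup_steps(example):
        print(step)
```

Sections that are absent from the config simply produce no step, which is also how an unconfined container ends up with, for example, no seccomp filter.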

All of this happens in the host kernel. The container sees the host kernel, uses the same VFS, the same scheduler. The isolation comes from namespaces, cgroups, and the capability/seccomp/LSM filters listed above, not from a separate kernel.

Running it directly without Docker:

bash

# Prepare the bundle

mkdir -p mycontainer/rootfs

cd mycontainer

docker export $(docker create alpine) | tar -C rootfs -xf -

runc spec                                  # creates config.json

# edit config.json to suit your needs

# Run

sudo runc run mycontainer-id

# Management

sudo runc list                             # same user as "runc run", otherwise the container is not listed

sudo runc kill mycontainer-id KILL

sudo runc delete mycontainer-id

This is the layer "below Docker". You use it when you want to understand what exactly happens, or for embedded scenarios.

runc, where it sits in the Docker stack

docker / podman

│

▼

containerd (or CRI-O)

│

▼

containerd-shim (one per container, survives a containerd restart)

│

▼

runc (starts the init process, then exits)

│

▼

the container's init process (PID 1 in the pid namespace)

The shim is needed to survive a restart of the higher-level managers. runc is short-lived: it does its job and dies.

crun, the C alternative

Same contract as runc, but:

Written in C (runc is Go), so startup is faster
Smaller memory footprint
Default for podman on Fedora and RHEL 9+

A full drop-in replacement: a containerd config can switch from runc to crun and everything works.

Use it when you start many short-lived containers (CI, k8s jobs, function-as-a-service).
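Whether the faster startup matters for your workload is easy to measure. Below is a small Python timing harness, offered as a sketch: time_command is a hypothetical helper, and the podman invocations in the comment assume both runtimes and the alpine image are already present on the machine.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv: list[str], runs: int = 5) -> float:
    """Return the median wall-clock time in seconds over several runs of argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder command so the script runs anywhere. To compare runtimes,
    # swap in real invocations (assumed to be installed), for example:
    #   ["podman", "--runtime", "runc", "run", "--rm", "alpine", "true"]
    #   ["podman", "--runtime", "crun", "run", "--rm", "alpine", "true"]
    placeholder = [sys.executable, "-c", "pass"]
    print(f"median: {time_command(placeholder) * 1000:.1f} ms")
```

The median is used rather than the mean so that one slow run (a cold page cache, for instance) does not skew the comparison.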

runsc / gVisor, a userspace kernel

The concept: place a userspace kernel (gVisor's "Sentry") between the application syscall and the host kernel, where it intercepts most syscalls and implements them itself.

app (inside the container)

      │ syscall

▼

Sentry (gVisor userspace kernel)

      │ a limited subset of host syscalls

▼

host kernel

Pros:

Not tied to the host kernel for most syscalls, so exploiting a kernel CVE is harder
Smaller attack surface: ~50 host syscalls instead of ~400
No VM, so startup is fast (a fraction of a second)

Cons:

Performance hit, 10-50% depending on the load
Not all syscalls work: edge networking/file features may be unsupported (io_uring, for example, only partially)
Not every workload fits: a database that relies on io_uring or AIO will suffer
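The trade-offs above follow from the interception model. The toy Python model below illustrates it: the sandboxed application only ever talks to the Sentry, the Sentry answers some calls from its own state, forwards a small allow-listed subset to the host, and rejects what it does not implement. Every name here (ToySentry, HostKernel, the allow-list contents) is invented for illustration; it is not gVisor code, and real gVisor reports unsupported syscalls with an error code rather than a Python exception.

```python
class HostKernel:
    """Stand-in for the host kernel; records which calls actually reach it."""

    def __init__(self):
        self.calls = []

    def syscall(self, name, *args):
        self.calls.append(name)
        return 0


class ToySentry:
    """Toy userspace kernel: implements calls itself, forwards only a few."""

    # Illustrative allow-list, not gVisor's real one.
    HOST_ALLOWED = {"read", "write", "futex"}

    def __init__(self, host: HostKernel):
        self.host = host
        self.handlers = {"getpid": self._getpid, "write": self._write}

    def syscall(self, name, *args):
        handler = self.handlers.get(name)
        if handler is None:
            # The compatibility gap: this call was never implemented.
            raise NotImplementedError(f"unsupported syscall: {name}")
        return handler(*args)

    def _forward(self, name, *args):
        if name not in self.HOST_ALLOWED:
            raise PermissionError(f"host syscall not allowed: {name}")
        return self.host.syscall(name, *args)

    def _getpid(self):
        # Answered entirely from Sentry state; the host never sees it.
        return 1

    def _write(self, fd, data):
        # Needs real I/O, so it goes to the host through the allow-list.
        self._forward("write", fd, data)
        return len(data)


if __name__ == "__main__":
    host = HostKernel()
    sentry = ToySentry(host)
    sentry.syscall("getpid")
    sentry.syscall("write", 1, b"hello\n")
    print("calls that reached the host:", host.calls)
```

The extra hop on every call is where the performance cost comes from, and the handler table is where the compatibility gaps come from.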

Running it:

bash

# Installation

curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor ...

apt install runsc

# Register it with Docker

# /etc/docker/daemon.json must contain a "runtimes" entry like this:

#   {

#     "runtimes": {

#       "runsc": { "path": "/usr/bin/runsc" }

#     }

#   }

cat /etc/docker/daemon.json

sudo systemctl restart docker

# Use it

docker run --runtime=runsc -it alpine
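Editing daemon.json by hand risks dropping settings that are already there. A minimal Python sketch of merging the runtimes entry into an existing configuration follows; register_runtime is a hypothetical helper, and the log-driver key in the example only stands in for whatever the file already contains.

```python
import json


def register_runtime(daemon_config: dict, name: str, path: str) -> dict:
    """Return a copy of the daemon config with one extra entry under "runtimes"."""
    updated = dict(daemon_config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes[name] = {"path": path}
    updated["runtimes"] = runtimes
    return updated


if __name__ == "__main__":
    # In practice this dict would come from json.load() on /etc/docker/daemon.json.
    existing = {"log-driver": "json-file"}
    merged = register_runtime(existing, "runsc", "/usr/bin/runsc")
    print(json.dumps(merged, indent=2))
```

The function returns a new dict instead of mutating its input, so the original can be kept as a backup before the file is written back and Docker is restarted.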

Where it is used:

Google App Engine / Cloud Run, internally
Untrusted code execution (online code playgrounds)
Multi-tenant CI, where a shared cluster runs other people's code

kata-containers, VM-based

Each container (in Kubernetes, each pod) runs in a lightweight VM (via qemu/cloud-hypervisor/firecracker). Pros:

Hardware-grade isolation, a VM boundary, not a namespace boundary
Compatibility close to 100%, there is a real Linux kernel inside the VM
Support for GPU passthrough and custom kernels

Cons:

Overhead in RAM (~50-200 MB per container for the VM)
Slower startup, 1-2 sec instead of ~100 ms
Needs hardware virtualization on the node; on cloud VMs that means nested virtualization, which is sometimes forbidden

yaml

# k8s through CRI-O, RuntimeClass

apiVersion: node.k8s.io/v1

kind: RuntimeClass

metadata:

  name: kata

handler: kata

---

apiVersion: v1

kind: Pod

metadata:

  name: secure-pod

spec:

  runtimeClassName: kata

  containers:

  - name: app

    image: myapp:v1

Used in:

AWS Lambda + Firecracker, not Kata itself, but the same idea
Kata on AKS / Azure Container Instances
Confidential containers (CoCo), Kata + AMD SEV / Intel TDX for encrypted-memory protection

Comparison

Property	runc	runsc / gVisor	kata-containers
Isolation	namespaces	userspace kernel	VM
Performance	100% (baseline)	~70-90%	~95%
Memory overhead	~few MB	~30 MB per Sentry	~50-200 MB per VM
Startup	~100 ms	~150 ms	~1-2 sec
Compatibility	100%	~85%	~99%
Use case	default everywhere	untrusted code	multi-tenant secure
Where default	Docker, containerd, CRI-O, k8s	Google Cloud Run	Confidential Containers (CoCo)

RuntimeClass in k8s

k8s allows multiple runtimes side-by-side:

yaml

apiVersion: node.k8s.io/v1

kind: RuntimeClass

metadata:

  name: gvisor

handler: runsc

---

apiVersion: v1

kind: Pod

spec:

  runtimeClassName: gvisor             # this pod runs through gVisor

  containers: [...]

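If runtimeClassName is not set, the pod uses the node's default runtime (usually runc). Optionally you can confine the untrusted runtime to dedicated namespaces or labelled nodes (RuntimeClass has a scheduling.nodeSelector field for the latter).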
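When manifests are generated rather than hand-written, the same two objects can be built programmatically. The Python sketch below emits a RuntimeClass and a Pod as one JSON List, which kubectl apply -f accepts alongside YAML. The helper names and the example pod and image names are hypothetical; the apiVersion, kind, handler, and runtimeClassName fields are the ones used in the manifests above.

```python
import json
from typing import Optional


def runtime_class(name: str, handler: str) -> dict:
    """Build a RuntimeClass that maps a cluster-visible name to a node-level handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def pod(name: str, image: str, runtime_class_name: Optional[str] = None) -> dict:
    """Build a single-container Pod; omit runtimeClassName to use the node default."""
    spec = {"containers": [{"name": "app", "image": image}]}
    if runtime_class_name is not None:
        spec["runtimeClassName"] = runtime_class_name
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            runtime_class("gvisor", "runsc"),
            pod("sandboxed-pod", "myapp:v1", runtime_class_name="gvisor"),
        ],
    }
    print(json.dumps(manifest, indent=2))
```

The handler value must match a runtime configured in containerd or CRI-O on the node; the RuntimeClass name is only the label that pods refer to.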

When things go wrong

exec format error, a multi-arch image, the runtime starts a binary for the wrong architecture. Pull the correct platform.
OCI runtime exec failed: exec failed, the entrypoint does not exist or is not executable in the rootfs. chmod +x or check the path.
A runsc workload fails with unsupported syscall, runsc --strace or gVisor's dmesg will show which one; sometimes --platform=ptrace is a fallback (slower, broader compatibility).
Kata starts slowly, usually a cold-start of cloud-hypervisor. Set enable_template = true in configuration.toml for a prebooted VM.
runc update does not change cgroup limits, cgroup v1 and v2 use different paths and file names. Modern runc handles both, but containerd may not pass the new format.
Unknown runtime in Docker, it is not registered in /etc/docker/daemon.json, or systemctl restart docker was not run.
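For the exec format error case, the architecture of the entrypoint binary can be read straight from its ELF header. The Python sketch below does that. The header layout (magic bytes, byte-order flag at offset 5, e_machine at offset 18) and the machine codes come from the ELF specification; the function names and the short architecture labels are choices made for this example.

```python
import struct

# e_machine values from the ELF specification, mapped to informal labels.
ELF_MACHINES = {3: "386", 40: "arm", 62: "amd64", 183: "arm64", 243: "riscv"}


def elf_machine(header: bytes) -> str:
    """Return the architecture label encoded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        # Scripts with a shebang line, for example, end up here.
        raise ValueError("not an ELF file")
    # EI_DATA (offset 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def binary_arch(path: str) -> str:
    """Read the ELF header of the file at path and return its architecture label."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))


if __name__ == "__main__":
    # Synthetic 64-bit little-endian header with e_machine = 183, so the demo
    # runs without needing a real binary on disk.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    print(elf_machine(ident + struct.pack("<HH", 2, 183)))
```

Pointing binary_arch at the entrypoint inside the extracted rootfs and comparing the result with the host architecture shows quickly whether the wrong platform variant of the image was pulled.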

Alternatives and related

firecracker, a VMM, not a runtime, but Kata can use it
bubblewrap (bwrap), like runc for Flatpak; not OCI-compatible
lxc/lxd, older, not OCI; more "system containers" than "application containers"
systemd-nspawn, containerization built into systemd; also not OCI
