Kubernetes pod lifecycle: from Pending to Terminated

Why understand the lifecycle

In Kubernetes a pod is the smallest deployable unit: one or more containers that share a network/IPC namespace and a set of volumes. Between kubectl apply and a Running, Ready pod there is a chain of states, and the pod can get stuck or fail at any of them. Once you know the lifecycle, you can quickly diagnose "why it won't start" / "why it keeps restarting" / "why it can't shut down gracefully".

Phases, the `.status.phase` field

A pod has exactly one phase:

Phase	What it means
Pending	accepted by the API, not Running yet. Could be scheduling, image pull, or init containers.
Running	bound to a node and all containers created; at least one is running, starting, or restarting. Does not mean they all pass readiness.
Succeeded	all containers exited with code 0 and will not be restarted (typical for Job/CronJob pods)
Failed	all containers have terminated, and at least one exited non-zero or was killed by the system, with no restart pending
Unknown	kubelet could not report the status (node crashed, network split)

The phase is too coarse. The real state lives in .status.conditions.

Conditions, the detailed picture

yaml

status:
  phase: Running
  conditions:
  - type: PodScheduled
    status: "True"
    lastTransitionTime: "..."
  - type: Initialized
    status: "True"           # all init containers completed successfully
  - type: ContainersReady
    status: "True"           # every container passes its readiness probe
  - type: Ready
    status: "True"           # = ContainersReady && all readiness gates pass

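PodScheduled means the scheduler has bound the pod to a node
Initialized means all init containers finished with exit 0
ContainersReady means every container in the pod is ready (a container with no readiness probe counts as ready once it is running)
Ready means the pod can receive traffic: ContainersReady is true, any readiness gates pass, and the pod is added to Service endpoints

A pod can be Running but not Ready. That is normal during startup, while a readiness probe is failing, or while a container restarts after a liveness probe failure.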
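To see the conditions of a live pod at a glance, a small wrapper around kubectl can help. This is a minimal sketch: the function name pod_conditions and the default namespace are illustrative assumptions, and it needs kubectl configured against a cluster.

```shell
# Print each condition of a pod as TYPE=STATUS, one per line.
# pod_conditions is a hypothetical helper name, not a kubectl subcommand.
# Usage: pod_conditions <pod-name> [namespace]
pod_conditions() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}
```

For example, pod_conditions mypod would list lines such as Ready=True or ContainersReady=False, which makes a Running-but-not-Ready pod easy to spot.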

Init containers

Containers in spec.initContainers run sequentially BEFORE the main ones, and each must exit 0:

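yaml

apiVersion: v1
kind: Pod
metadata:
  name: myapp                # placeholder name; a Pod manifest needs metadata.name
spec:
  initContainers:
  - name: wait-for-db        # blocks until the db Service accepts TCP connections
    image: busybox
    command: ['sh', '-c', 'until nc -z db 5432; do sleep 1; done']
  - name: migrate            # starts only after wait-for-db exits 0
    image: myapp:v1
    command: ['./migrate']
  containers:
  - name: app
    image: myapp:v1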

Semantics:

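If an init container fails, the kubelet restarts that init container until it succeeds, unless the pod's restartPolicy is Never, in which case the pod is marked Failed
If the whole pod is restarted, all init containers run again from the first one, so they should be idempotent
For init containers, a pod restartPolicy of Always is treated as OnFailure: they are restarted on failure, never after a successful exit
Init containers can have their own resources and securityContext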

Use cases:

Wait for dependencies (DB, configmap)
DB migrations
Permissions fix on a mount (chown /data before the main process)
Generate config from templates with envsubst

Probes: startup, readiness, liveness

There are three types of health check, run by the kubelet on the node:

startup probe

yaml

startupProbe:
  httpGet: { path: /healthz, port: 8080 }
  failureThreshold: 30
  periodSeconds: 10

Runs first; the liveness and readiness probes are disabled until it succeeds.
Protects slow-starting applications from a false liveness failure.
Once it succeeds it never runs again for that container, and the readiness and liveness probes take over.
If it reaches failureThreshold, the kubelet kills the container, which is then restarted according to restartPolicy.

Without a startup probe, a slow-starting application can be killed by the liveness probe before it finishes booting, which produces an endless restart loop.
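The time a startup probe allows can be worked out from its settings. The sketch below assumes the usual approximation initialDelaySeconds + failureThreshold × periodSeconds and ignores timeoutSeconds; the function name is illustrative.

```shell
# Approximate upper bound, in seconds, on how long a container may take
# to start before the startup probe gives up and the kubelet kills it.
# Arguments: failureThreshold periodSeconds [initialDelaySeconds]
startup_budget_seconds() {
  failure_threshold="$1"
  period_seconds="$2"
  initial_delay="${3:-0}"
  echo $(( initial_delay + failure_threshold * period_seconds ))
}

startup_budget_seconds 30 10   # prints 300
```

With the values from the example above (failureThreshold: 30, periodSeconds: 10) the application gets about five minutes to come up.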

readiness probe

yaml

readinessProbe:
  httpGet: { path: /ready, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

Does not kill anything: on failure the pod's address is removed from the Service endpoints.
The pod returns to the Service once the probe passes again.
Ideal for warmup (cache not yet warm, DB connection not yet established).

liveness probe

yaml

livenessProbe:
  httpGet: { path: /healthz, port: 8080 }
  periodSeconds: 10
  failureThreshold: 3

Kills the container on failure; it is then restarted according to restartPolicy.
Use it ONLY to self-heal a "deadlocked" application. Not for checking upstream dependencies.
Anti-pattern: checking the DB in liveness. If the DB lags, every pod fails its liveness probe at once and the kubelet restarts them all, causing a cascading failure.

Probe types

Type	What
`httpGet`	HTTP 200-399 = OK; on any path/port
`tcpSocket`	TCP connect possible = OK
`exec`	command exit 0 = OK
`grpc` (stable since 1.27)	gRPC Health Checking Protocol reports SERVING = OK
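The httpGet success rule from the table can be written as a one-line check. This is only an illustration of the 200-399 range; the function name is hypothetical and it does not perform any HTTP request itself.

```shell
# Return success (exit 0) if an HTTP status code falls in the 200-399
# range that the table above lists as passing for an httpGet probe.
http_probe_passes() {
  status="$1"
  [ "$status" -ge 200 ] && [ "$status" -lt 400 ]
}

http_probe_passes 204 && echo "204 passes"
http_probe_passes 503 || echo "503 fails"
```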

Restart Policy

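Always (the Pod default, and the only value allowed in Deployment/ReplicaSet/StatefulSet pod templates), restarts the container after any exit
OnFailure (common for Job/CronJob, which accept only OnFailure or Never), restarts only after a non-zero exit
Never, does not restart at all

restartPolicy applies to the containers of a single pod: the kubelet restarts them in place on the same node. Controllers such as Deployment work one level up and replace missing pods to keep the replica count.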
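The three policies reduce to a small decision table. The sketch below models it in shell for illustration; restart_decision is a hypothetical name and this is not kubelet code.

```shell
# Decide whether a container would be restarted, given the pod's
# restartPolicy and the container's exit code.
# Prints "restart" or "stay-down".
restart_decision() {
  policy="$1"
  exit_code="$2"
  case "$policy" in
    Always)
      echo restart ;;
    OnFailure)
      if [ "$exit_code" -ne 0 ]; then echo restart; else echo stay-down; fi ;;
    Never)
      echo stay-down ;;
    *)
      echo "unknown restartPolicy: $policy" >&2
      return 1 ;;
  esac
}

restart_decision OnFailure 0   # prints stay-down
restart_decision OnFailure 1   # prints restart
```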

Termination, graceful shutdown

On kubectl delete pod or a scale-down:

1. API: pod.metadata.deletionTimestamp = now  -> grace period starts
2. In parallel: pod removed from Service endpoints
3. preStop hook (if any)                       -> synchronous, counts against the grace period
4. SIGTERM to PID 1 of each container
5. ... wait for the rest of terminationGracePeriodSeconds (default 30s)
6. SIGKILL to anything still running
7. Pod removed from the API

yaml

spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          # pause so endpoint removal can propagate; the kubelet sends SIGTERM itself once the hook returns
          command: ['sh', '-c', 'sleep 5']

preStop:

Runs before SIGTERM, and its run time is deducted from the grace period
Useful for drain mode (tell the LB "stop sending me traffic")
sleep 5 is a typical pause that gives the endpoint removal time to propagate to kube-proxy and load balancers

Common mistakes:

PID 1 does not handle SIGTERM. A shell wrapper script does not forward the signal to its child, and a process running as PID 1 gets no default signal handling from the kernel. Start the application with exec, or use tini or dumb-init as PID 1.
A graceful operation runs longer than terminationGracePeriodSeconds. k8s sends SIGKILL. Increase the grace period, or make the work non-blocking.
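Because the preStop hook runs inside the grace period, the time left for the application's own SIGTERM handling is smaller than terminationGracePeriodSeconds. A rough sketch of that arithmetic, with a hypothetical function name and ignoring the short extension the kubelet may grant when a hook overruns:

```shell
# Seconds left for the application to handle SIGTERM after the preStop
# hook has finished. Never returns a negative number.
# Arguments: terminationGracePeriodSeconds preStopDurationSeconds
shutdown_budget_seconds() {
  grace_period="$1"
  prestop_seconds="$2"
  remaining=$(( grace_period - prestop_seconds ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}

shutdown_budget_seconds 60 5   # prints 55
```

If the application needs 40 seconds to drain and the hook sleeps for 5, a 30-second grace period leaves only 25 seconds, so the container would be SIGKILLed mid-drain.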

OOM in a pod

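k8s enforces resources.limits.memory as a cgroup memory limit. When a container exceeds it:

The kernel OOM killer kills a process in the container's cgroup, usually the main one, and the container is reported as OOMKilled
In status: lastState.terminated.reason: OOMKilled
The container restarts if restartPolicy is Always or OnFailure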

Check:

bash

kubectl describe pod mypod | grep -A2 'Last State'
# Last State:     Terminated
# Reason:        OOMKilled
# Exit Code:     137                    ← 128 + 9 (SIGKILL)

Fix: increase resources.limits.memory or fix the leak in the application.
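The 128 + signal convention applies to other exit codes as well, for example 143 for SIGTERM. A small helper, with a name chosen here for illustration, can decode them:

```shell
# Map a container exit code to the number of the signal that killed it.
# Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
# Prints the signal number, or "none" for ordinary exit codes.
signal_from_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo $(( code - 128 ))
  else
    echo none
  fi
}

signal_from_exit_code 137   # prints 9  (SIGKILL, typical for OOMKilled)
signal_from_exit_code 143   # prints 15 (SIGTERM)
```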

ImagePullBackOff and CrashLoopBackOff

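ImagePullBackOff means the kubelet cannot pull the image: the registry is unreachable, the image name or tag is wrong, or an imagePullSecret is missing for a private registry. kubectl describe pod shows the exact error in events.
CrashLoopBackOff means the container keeps exiting shortly after start, and the kubelet waits exponentially longer between restarts (10s, 20s, 40s, up to 5 min). The delay resets after the container has run for 10 minutes without problems.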
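The restart delays quoted above follow a simple doubling rule. A sketch, assuming exactly that schedule (start at 10s, double each time, cap at 300s) and using a hypothetical function name:

```shell
# Delay in seconds before restart number n (1-based) under the schedule
# described above: 10s, 20s, 40s, ... capped at 300s (5 min).
crashloop_delay_seconds() {
  n="$1"
  delay=10
  i=1
  while [ "$i" -lt "$n" ] && [ "$delay" -lt 300 ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

crashloop_delay_seconds 3   # prints 40
crashloop_delay_seconds 6   # prints 300
```

This explains why a pod that has been crash-looping for a while appears to "do nothing" for minutes at a time: it is waiting out the capped delay.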

Debug:

bash

kubectl logs mypod -c container1                # current instance
kubectl logs mypod -c container1 --previous     # previous (crashed) instance
kubectl describe pod mypod                      # events at the bottom
kubectl get events --sort-by='.lastTimestamp'   # all events in the namespace, oldest first

When things go wrong

Pod stuck in Pending means no node satisfies the pod's resource requests, node selector/affinity, or tolerations. kubectl describe pod shows a FailedScheduling event. Check kubectl describe node (Allocatable, taints).
Pod stuck in ContainerCreating means the image pull is slow or a volume, Secret, or ConfigMap cannot be mounted. Check the events in kubectl describe pod.
OOMKilled with no obvious cause means the limit is too low, or the JVM does not see the cgroup limit. -XX:+UseContainerSupport is on by default in Java 10+ and 8u191+; older JVMs size the heap from the node's memory.
The liveness probe kills the container after a deploy means initialDelaySeconds is too small and the application is still starting. Use a startup probe.
kubectl delete pod hangs means a finalizer is blocking deletion or terminationGracePeriodSeconds is very large. Check metadata.finalizers first. As a last resort, kubectl delete pod mypod --grace-period=0 --force removes the pod from the API without waiting for the kubelet to confirm termination, so the container may keep running on the node; it does not clear finalizers.
The container is alive but gets no traffic means the readiness probe is failing. kubectl describe pod shows the conditions Ready=False and ContainersReady=False.
An init container finishes but the pod does not progress means it exited non-zero. Check kubectl logs mypod -c <init-name>.

Useful kubectl commands

bash

kubectl get pod -o wide                       # find the node + IP
kubectl get pod mypod -o jsonpath='{.status.containerStatuses[*].state}'   # per-container state of one pod
kubectl exec -it mypod -- sh
kubectl debug -it mypod --image=busybox       # ephemeral container, stable since 1.25
kubectl port-forward pod/mypod 8080:8080      # local test without a Service
kubectl rollout status deployment/myapp
kubectl rollout restart deployment/myapp      # rolling restart without editing the spec

