TCP keepalive

Why keepalive

An idle TCP connection gives neither side any way to tell whether the peer is still alive. If a client is silently unplugged or a NAT device drops the mapping, the server finds out only when it next tries to send data. On long-lived connections (DB pools, WebSockets, queues) this is a problem: a thread sits blocked on a zombie connection.

Keepalive is a kernel mechanism: after N seconds of silence the kernel sends a probe, an empty ACK carrying a stale sequence number. A live peer answers with an ACK; a dead one sends nothing back, and after M unanswered probes the kernel closes the socket with ETIMEDOUT.

Three host-level tuning knobs

sysctl	default	meaning
`net.ipv4.tcp_keepalive_time`	7200 (2 h)	seconds of silence before the first probe
`net.ipv4.tcp_keepalive_intvl`	75	seconds between successive probes
`net.ipv4.tcp_keepalive_probes`	9	number of probes before declaring the peer dead

With the defaults: 7200 s idle + 9 x 75 s = 7875 s, roughly 2 hours 11 minutes before the connection is closed. For most workloads that is far too long.
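The arithmetic above can be written as a small helper. This is a minimal sketch: the function name `worst_case_detection_seconds` is made up for illustration, and it models only the simple "idle time plus all probes unanswered" case described in this section.

```python
def worst_case_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Seconds from the last traffic until the kernel gives up on a silent peer."""
    return idle + probes * interval


# Linux defaults from the table above.
default_total = worst_case_detection_seconds(idle=7200, interval=75, probes=9)
hours, rem = divmod(default_total, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{default_total} s = {hours} h {minutes} min {seconds} s")  # 7875 s = 2 h 11 min 15 s
```

Plugging in any candidate values shows how long a dead peer can go unnoticed before they are rolled out.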

Enabling keepalive in an application

The socket option SO_KEEPALIVE is off by default. You must set it explicitly:

python

import socket

s = socket.socket()
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Per-socket override (Linux):
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # 60s idle before first probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # 10s between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # 3 probes

Result: 60 + 3x10 = 90 seconds from silence to close.
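To confirm that the per-socket values actually took effect, the options can be read back with `getsockopt`. The sketch below wraps the calls shown above in a hypothetical helper, `enable_keepalive` (not part of any library), and skips any constant the platform's `socket` module does not expose, since these options are platform-specific.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Turn on keepalive, apply per-socket timings where available, and read them back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    wanted = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count))
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # constant not exposed on this platform
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(enable_keepalive(s))
```

The returned dictionary contains only the options that were set, so an empty result means the socket is running on the host-level sysctl values.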

When to tune

DB pools (PostgreSQL/MySQL): the default 2-hour window means that after a DB restart the pool holds dead sockets until the first SQL query hits them. Set the keepalive idle time to 30-60 seconds.
WebSocket / gRPC with a proxy in between: NAT/LB devices typically close idle connections after 5-10 minutes. Keepalive every 30-60 seconds prevents that.
VPN / SSH tunnels: the same idle-timeout issue applies.
API behind cloud LB: AWS NLB closes idle connections after 350s by default.
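The rule behind all of these cases is that the first probe must go out before the middlebox gives up on the idle flow. A minimal sketch of that check, using the idle timeouts quoted above (the NAT figure takes the low end of the 5-10 minute range; the function and dictionary names are illustrative only):

```python
# Idle timeouts taken from the cases above, in seconds.
MIDDLEBOX_IDLE_TIMEOUT_S = {
    "NAT/LB (5 min)": 300,
    "AWS NLB default": 350,
}


def survives_idle_timeout(keepidle_s: int, idle_timeout_s: int) -> bool:
    """True if the first probe is sent before the middlebox would drop the idle flow."""
    return keepidle_s < idle_timeout_s


for name, timeout in MIDDLEBOX_IDLE_TIMEOUT_S.items():
    for keepidle in (60, 7200):
        verdict = "ok" if survives_idle_timeout(keepidle, timeout) else "dropped before first probe"
        print(f"{name}: keepidle={keepidle}s -> {verdict}")
```

With the 7200-second default, every device in the table drops the mapping long before the first probe is sent.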

Keepalive vs application ping

Approach	Pros	Cons
TCP keepalive	free, handled by the kernel	verifies only that the socket is alive, not the application; does not cross L7 proxies, which terminate TCP on each side
Application ping	checks the full chain to the handler	requires implementation, adds traffic

For [[websocket|WebSocket]] the right answer is both: keepalive catches a broken TCP connection, and an application-level ping (frame opcode 0x9) catches a hung server.
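For reference, a WebSocket ping is a very small frame. The sketch below builds one by hand in the server-to-client direction (unmasked); frames sent by a client must additionally be masked, which is omitted here. The helper name `build_ws_ping` is hypothetical, and in practice a WebSocket library's own ping call would normally be used instead.

```python
def build_ws_ping(payload: bytes = b"") -> bytes:
    """Build an unmasked WebSocket ping frame (server-to-client direction)."""
    if len(payload) > 125:
        raise ValueError("control frame payload is limited to 125 bytes")
    first_byte = 0x80 | 0x9      # FIN bit set, opcode 0x9 (ping)
    second_byte = len(payload)   # MASK bit clear, 7-bit payload length
    return bytes([first_byte, second_byte]) + payload


frame = build_ws_ping(b"hb")
print(frame.hex())  # 89026862
```

The peer is expected to answer with a pong (opcode 0xA) echoing the payload, which is what proves that the handler, not just the socket, is alive.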

What you see in tcpdump

A keepalive probe is a packet with seq = current_seq - 1, no payload, and the ACK flag set. In tcpdump look for an empty ACK that arrives tcp_keepalive_time seconds after the last traffic; if it goes unanswered, further probes follow at tcp_keepalive_intvl intervals.

IP 10.0.0.1.443 > 10.0.0.5.34521: Flags [.], ack 100, win 1024, length 0

Notes

Keepalive keeps the NAT mapping alive precisely because it sends packets. Set tcp_keepalive_time lower than the NAT timeout and NAT will not close the mapping.
On modern Linux there is TCP_USER_TIMEOUT as an alternative: close the connection if sent data is not acknowledged within N milliseconds. It is often more useful than keepalive because it also applies while data is in flight, whereas keepalive probes are sent only on an idle connection.
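Setting TCP_USER_TIMEOUT from Python looks much like the keepalive options. This is a sketch assuming a Linux host and a Python build that exposes `socket.TCP_USER_TIMEOUT`; the helper name `set_user_timeout` and the 30-second value are illustrative choices, not recommendations from this page.

```python
import socket


def set_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (milliseconds); return the value read back, or None if unsupported."""
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)
    if opt is None:
        return None  # constant not exposed on this platform
    sock.setsockopt(socket.IPPROTO_TCP, opt, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, opt)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(set_user_timeout(s, 30_000))  # give up if data stays unacknowledged for 30 s
```

Note the unit: this option takes milliseconds, while the keepalive options take seconds.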

Troubleshooting

Connection hangs after 5 minutes idle without closing: keepalive is off, or tcp_keepalive_time exceeds the NAT timeout.
Probes are too frequent and noisy: tcp_keepalive_intvl is too small, or TCP_KEEPIDLE is set very low (for example 5 seconds, which is excessive).
Connection closed after 5 minutes despite active traffic: not keepalive; check the NAT/LB config (idle timeout is separate from keepalive).
ETIMEDOUT on send(): the keepalive probes went unanswered and the kernel has declared the peer dead.
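In application code the last case surfaces as an `OSError` whose `errno` is `ETIMEDOUT` (Python raises it as `TimeoutError`). A minimal sketch of telling it apart from other failures; the function name and the messages are made up for illustration, and the reset/closed branch is an addition not covered in the list above:

```python
import errno


def describe_socket_error(exc: OSError) -> str:
    """Map a socket error to a short diagnosis for logging."""
    if exc.errno == errno.ETIMEDOUT:
        return "peer unreachable: probes or retransmissions went unanswered"
    if exc.errno in (errno.ECONNRESET, errno.EPIPE):
        return "peer reset or closed the connection"
    return f"other socket error: {exc}"


# Simulated error, as send() would raise it on a connection the kernel has given up on.
print(describe_socket_error(OSError(errno.ETIMEDOUT, "Connection timed out")))
```

A connection pool would typically discard the socket and reconnect in the first two cases rather than retry on the same connection.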

