Why keepalive
TCP without traffic has no way to know whether the peer is alive. If a client silently unplugs or NAT closes the mapping, the server finds out only when it tries to send data. On long-lived connections (DB pools, WebSockets, queues) this is a problem: a thread sits blocked on a zombie connection.
Keepalive is a kernel mechanism: after N seconds of silence the kernel sends a probe, an empty ACK with a stale sequence number. A live peer answers with an ACK; a dead peer answers nothing, and after M unanswered probes the kernel drops the connection, so the next call on the socket fails with ETIMEDOUT.
Three host-level tuning knobs
| sysctl | default | meaning |
|---|---|---|
| net.ipv4.tcp_keepalive_time | 7200 (2 h) | seconds of silence before the first probe |
| net.ipv4.tcp_keepalive_intvl | 75 | seconds between probes |
| net.ipv4.tcp_keepalive_probes | 9 | number of unanswered probes before declaring the peer dead |
With the defaults, detection takes 7200 s idle + 9 x 75 s of probing = 7875 s, roughly 2 hours 11 minutes before the connection is closed. For most workloads that is far too long.
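The arithmetic above generalizes to any combination of the three knobs. A minimal sketch, assuming the sysctls are exposed under /proc/sys/net/ipv4 as on a typical Linux host; the helper names are illustrative:

```python
from pathlib import Path

# Assumed location of the sysctls on Linux; helper names are illustrative.
PROC_DIR = Path("/proc/sys/net/ipv4")


def detection_time(idle: int, intvl: int, probes: int) -> int:
    """Worst-case seconds from the last traffic until the peer is declared dead."""
    return idle + probes * intvl


def host_detection_time(proc_dir: Path = PROC_DIR) -> int:
    """Read the three host-level knobs and apply the same formula."""
    def read(name: str) -> int:
        return int((proc_dir / name).read_text().strip())

    return detection_time(
        read("tcp_keepalive_time"),
        read("tcp_keepalive_intvl"),
        read("tcp_keepalive_probes"),
    )


print(detection_time(7200, 75, 9))  # 7875 seconds, about 2 h 11 min
```

Calling host_detection_time() on a Linux machine reports the value for the current sysctl settings.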
Enabling keepalive in an application
The socket option SO_KEEPALIVE is off by default. You must set it explicitly:
import socket
s = socket.socket()
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Per-socket override (Linux):
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60) # 60s idle before first probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10) # 10s between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3) # 3 probes
Result: 60 + 3x10 = 90 seconds from silence to close.
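The TCP_KEEP* constants are not available on every platform, so a portable wrapper may be useful. A sketch, assuming it is acceptable to skip the per-socket timings silently when a constant is missing; enable_keepalive is a hypothetical helper name:

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60, intvl: int = 10, count: int = 3) -> dict:
    """Turn on SO_KEEPALIVE and apply per-socket timings where the platform exposes them.

    Returns the options that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    timings = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", intvl), ("TCP_KEEPCNT", count))
    for name, value in timings:
        opt = getattr(socket, name, None)  # constant is missing on some platforms
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(enable_keepalive(s))
```

Where a constant is missing, the socket falls back to the host-level sysctl value for that knob.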
When to tune
- DB pools (PostgreSQL/MySQL): the default 2-hour window means that after a DB restart the pool holds dead sockets until the first SQL query. Set keepalive to 30-60 seconds.
- WebSocket / gRPC with a proxy in between: NAT/LB devices typically close idle connections after 5-10 minutes. Keepalive every 30-60 seconds prevents that.
- VPN / SSH tunnels: same issue.
- API behind cloud LB: AWS NLB closes idle connections after 350s by default.
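For the DB-pool case, PostgreSQL clients built on libpq accept keepalive settings as connection parameters (keepalives, keepalives_idle, keepalives_interval, keepalives_count). A sketch that assembles such a connection string; the host and database names are placeholders, and pg_dsn_with_keepalive is an illustrative helper, not part of any driver:

```python
def pg_dsn_with_keepalive(host: str, dbname: str, idle: int = 30, intvl: int = 10, count: int = 3) -> str:
    """Build a libpq keyword/value connection string with keepalive enabled.

    Assumes values contain no spaces or quotes, so no escaping is applied.
    """
    params = {
        "host": host,
        "dbname": dbname,
        "keepalives": 1,
        "keepalives_idle": idle,          # seconds idle before the first probe
        "keepalives_interval": intvl,     # seconds between probes
        "keepalives_count": count,        # unanswered probes before giving up
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Placeholder host and database names.
print(pg_dsn_with_keepalive("db.example.internal", "app"))
```

With these values a dead server is detected after about 30 + 3 x 10 = 60 seconds of silence.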
Keepalive vs application ping
| Approach | Pros | Cons |
|---|---|---|
| TCP keepalive | free, handled by the kernel | verifies only the socket, not that the application is alive; does not pass through L7 proxies, which terminate TCP at each hop |
| Application ping | checks the full chain to the handler | requires implementation, adds traffic |
For [[websocket|WebSocket]] the right answer is both: keepalive catches a broken TCP connection, and an application-level ping (frame opcode 0x9) catches a hung server.
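To make the application-level ping concrete, here is a sketch of how a ping frame with opcode 0x9 is laid out on the wire under RFC 6455. In practice a WebSocket library does this for you; build_ping_frame is an illustrative helper, not a library API:

```python
import os

PING_OPCODE = 0x9


def build_ping_frame(payload: bytes = b"", mask: bool = False) -> bytes:
    """Build a WebSocket ping frame. Clients must send masked frames (mask=True)."""
    if len(payload) > 125:
        raise ValueError("control frame payload is limited to 125 bytes")
    first = 0x80 | PING_OPCODE  # FIN bit plus opcode 0x9
    if not mask:
        return bytes([first, len(payload)]) + payload
    key = os.urandom(4)  # random 4-byte masking key
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes([first, 0x80 | len(payload)]) + key + masked


print(build_ping_frame(b"hi").hex())  # 89026869
```

The peer is expected to answer with a pong frame (opcode 0xA) carrying the same payload; a missing pong within a deadline chosen by the application indicates a hung peer even when TCP keepalive still succeeds.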
What you see in tcpdump
A keepalive probe is a packet with seq = current_seq - 1, no payload, ACK flag.
In tcpdump look for an empty ACK that arrives tcp_keepalive_time seconds after the last traffic and then repeats every tcp_keepalive_intvl seconds for as long as the peer stays silent.
IP 10.0.0.1.443 > 10.0.0.5.34521: Flags [.], ack 100, win 1024, length 0
Notes
- Keepalive keeps the NAT mapping alive precisely because it sends packets. Set tcp_keepalive_time lower than the NAT timeout and NAT will not close the mapping.
- On modern Linux there is TCP_USER_TIMEOUT as an alternative: close the connection if no ACK is received for sent data within N milliseconds. It is often more useful than keepalive because it works under load too.
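TCP_USER_TIMEOUT is set per socket, in milliseconds. A minimal sketch, assuming a Linux host where Python exposes socket.TCP_USER_TIMEOUT; set_user_timeout is a hypothetical helper that reports whether the option was applied:

```python
import socket


def set_user_timeout(sock: socket.socket, timeout_ms: int) -> bool:
    """Apply TCP_USER_TIMEOUT if the platform exposes it; return whether it was set."""
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)  # Linux-only constant
    if opt is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, opt, timeout_ms)
    return True


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(set_user_timeout(s, 30_000))  # give up after 30 s without an ACK for sent data
```

The 30-second value is only an example; pick it relative to how long the application can tolerate unacknowledged writes.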
Troubleshooting
- Connection hangs after 5 minutes idle without closing: keepalive is off, or tcp_keepalive_time exceeds the NAT timeout.
- Probes are too frequent and noisy: tcp_keepalive_intvl is too small, or TCP_KEEPIDLE is set to 5 seconds (excessive).
- Connection closed after 5 minutes despite active traffic: not keepalive; check the NAT/LB config (idle timeout is separate from keepalive).
- error: ETIMEDOUT on send(): keepalive fired and the peer is dead.
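When triaging these cases in application code, it can help to separate ETIMEDOUT from other socket errors by errno. A rough sketch; describe_send_error and its messages are illustrative, and the mapping is a starting point rather than a complete diagnosis:

```python
import errno


def describe_send_error(exc: OSError) -> str:
    """Map a socket error raised by send() to a likely cause."""
    if exc.errno == errno.ETIMEDOUT:
        # Keepalive probes (or retransmissions) went unanswered.
        return "peer unreachable: keepalive or retransmission timed out"
    if exc.errno in (errno.ECONNRESET, errno.EPIPE):
        # The peer, or a middlebox acting for it, tore the connection down.
        return "connection was reset or already closed"
    return f"other socket error: {exc}"


print(describe_send_error(OSError(errno.ETIMEDOUT, "Connection timed out")))
```

A typical use is inside an `except OSError as exc:` block around send(), logging the description before discarding the socket and reconnecting.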