Copyright © 2026 LinuxLab. All rights reserved.

kb/network-l2-l3 ── Networking: L2 / L3 ── advanced

Policy Routing: Rule-Based Routing

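Policy routing selects a routing table based on the source IP, fwmark, incoming interface (iif), TOS, or other packet attributes, not only the destination. It is configured with ip rule (which table to consult) plus ip route add ... table N (the routes inside that table). Typical uses: multiple uplinks, source-based routing, VRF, split-tunnel VPN. The rules live in the RPDB, the Routing Policy Database.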

aka: policy-routing, ip-rule, rule-based-routing, source-routing, policy-based-routing

Why policy routing

The standard [[routing-table|routing table]] selects the next hop by destination IP using longest-prefix-match. That is not enough when you have:

  • Two uplink providers -- traffic from source A goes to Telecom, source B goes to Cogent
  • Source-based routing -- "traffic from 10.0.1.0/24 exits via provider X"
  • Per-tenant routing in a multitenant system ([[cni-plugins|CNI]], OpenStack)
  • NAT split -- a packet with fwmark=0x100 goes through a NAT machine, others go directly
  • A VPN for one namespace only -- its traffic transits the tunnel, everything else takes the local path
  • VRF-like isolation on a router

The solution is multiple routing tables plus selection rules. That is policy routing.

RPDB: Routing Policy Database

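Linux maintains a set of rules (ip rule list) that are evaluated in ascending priority order. Each rule says "if the packet matches, look up this routing table." The first matching rule whose table has a route for the destination wins; if that table has no such route, evaluation continues with the next rule.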

Default state:

$ ip rule list
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Three predefined tables:

  • local (255) -- local and broadcast addresses, maintained automatically by the kernel
  • main (254) -- the table ip route add uses when no table is given; it also holds the kernel's connected routes
  • default (253) -- empty by default, intended as a fallback

You can add your own tables. Table names and numbers live in /etc/iproute2/rt_tables:

# echo "100 isp_a" >> /etc/iproute2/rt_tables
# echo "200 isp_b" >> /etc/iproute2/rt_tables

Names are a convenience layer for iproute2; the kernel only stores numeric table IDs. IDs are 32-bit, so values above 255 also work; 0 and 253-255 are reserved (unspec and the three predefined tables).
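Appending with echo >> adds a duplicate line every time a setup script re-runs. A minimal sketch of guarded helpers, assuming a POSIX shell with awk and that only the single file /etc/iproute2/rt_tables is consulted. The function names are hypothetical, and RT_TABLES can point at a copy for testing:

```bash
# Sketch: idempotent helpers for an rt_tables-style file ("ID NAME" per line).
# Assumption: only the single file named by RT_TABLES is consulted.
RT_TABLES=${RT_TABLES:-/etc/iproute2/rt_tables}

# Print the numeric ID for a table name; numeric arguments are echoed as-is.
rt_table_id() {   # rt_table_id NAME|ID
    case $1 in
        ''|*[!0-9]*) ;;                 # not a plain number: look the name up
        *) echo "$1"; return 0 ;;
    esac
    awk -v name="$1" '
        /^[ \t]*#/ { next }             # skip comment lines
        NF >= 2 && $2 == name { print $1; found = 1; exit }
        END { exit !found }
    ' "$RT_TABLES"
}

# Append "ID NAME" unless it is already present; refuse clashing entries.
rt_table_register() {   # rt_table_register ID NAME
    local existing
    if existing=$(rt_table_id "$2"); then
        [ "$existing" = "$1" ] && return 0      # already registered as-is
        echo "rt_tables: $2 already maps to $existing" >&2
        return 1
    fi
    if awk -v id="$1" '!/^[ \t]*#/ && $1 == id { f = 1 } END { exit !f }' "$RT_TABLES"; then
        echo "rt_tables: ID $1 is already taken" >&2
        return 1
    fi
    printf '%s %s\n' "$1" "$2" >> "$RT_TABLES"
}
```

For example, rt_table_register 100 isp_a can run on every boot, and ip route add default via 10.0.1.254 dev eth0 table "$(rt_table_id isp_a)" passes the numeric ID the kernel actually stores.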

Multiple tables: two-uplink example

Two external links: eth0 (address 10.0.1.1, towards ISP A) and eth1 (address 10.0.2.1, towards ISP B). Goal: traffic from 192.168.10.0/24 exits via A, traffic from 192.168.20.0/24 exits via B.

Step 1: populate the tables:

ip route add default via 10.0.1.254 dev eth0 table isp_a
ip route add 10.0.1.0/24 dev eth0 src 10.0.1.1 table isp_a
ip route add default via 10.0.2.254 dev eth1 table isp_b
ip route add 10.0.2.0/24 dev eth1 src 10.0.2.1 table isp_b

Step 2: add rules:

ip rule add from 192.168.10.0/24 table isp_a priority 1000
ip rule add from 192.168.20.0/24 table isp_b priority 1001

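Now a forwarded packet with source 192.168.10.5 finds nothing for a remote destination in the local table (priority 0), matches the rule at priority 1000, and is routed by isp_a: default via 10.0.1.254 on eth0. Traffic from any other source still falls through to main at priority 32766.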
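The lookup order in this example can be modelled in a few lines of shell. The sketch below is a simplified simulation, not the kernel's implementation: it handles only from <prefix> selectors with lookup actions, does longest-prefix match inside each table, and falls through to the next rule when a table has no route for the destination. The main-table contents and the LAN interface names lan0/lan1 are hypothetical:

```bash
# Simplified RPDB simulation (IPv4, "from <prefix>" selectors, lookup actions only).
# Rules: "PRIORITY FROM-PREFIX TABLE". Routes: "TABLE PREFIX ...".
RULES='0 0.0.0.0/0 local
1000 192.168.10.0/24 isp_a
1001 192.168.20.0/24 isp_b
32766 0.0.0.0/0 main
32767 0.0.0.0/0 default'

# Hypothetical table contents; "local" and "default" are left empty here.
ROUTES='isp_a 10.0.1.0/24 dev eth0 src 10.0.1.1
isp_a 0.0.0.0/0 via 10.0.1.254 dev eth0
isp_b 10.0.2.0/24 dev eth1 src 10.0.2.1
isp_b 0.0.0.0/0 via 10.0.2.254 dev eth1
main 192.168.10.0/24 dev lan0
main 192.168.20.0/24 dev lan1
main 0.0.0.0/0 via 10.0.1.254 dev eth0'

ip2int() {   # ip2int A.B.C.D: print the address as a 32-bit integer
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

in_prefix() {   # in_prefix ADDR NET/LEN: succeed if ADDR lies inside NET/LEN
    local net=${2%/*} len=${2#*/} mask
    if [ "$len" -eq 0 ]; then return 0; fi
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "$net") & mask )) ]
}

lookup_table() {   # lookup_table TABLE DST: longest-prefix match inside TABLE
    local table=$1 dst=$2 best_len=-1 best= t pfx rest
    while read -r t pfx rest; do
        [ "$t" = "$table" ] || continue
        in_prefix "$dst" "$pfx" || continue
        if [ "${pfx#*/}" -gt "$best_len" ]; then
            best_len=${pfx#*/}
            best="$pfx $rest"
        fi
    done <<EOF
$ROUTES
EOF
    [ -n "$best" ] && echo "$best"
}

rpdb_route() {   # rpdb_route SRC DST: walk the rules in ascending priority order
    local src=$1 dst=$2 prio from table route
    while read -r prio from table; do
        in_prefix "$src" "$from" || continue
        # The rule matches; if its table has no route, fall through to the next rule.
        if route=$(lookup_table "$table" "$dst"); then
            echo "rule $prio table $table: $route"
            return 0
        fi
    done <<EOF
$(printf '%s\n' "$RULES" | sort -n)
EOF
    echo "no route"
    return 1
}
```

With these tables, rpdb_route 192.168.10.5 8.8.8.8 picks rule 1000 and isp_a's default route, and a source outside both subnets falls through to main. The model also exposes a common pitfall: rpdb_route 192.168.10.5 192.168.20.7 still resolves through isp_a, because rule 1000 is evaluated before main and isp_a has a default route but no LAN routes. Adding the LAN prefixes to isp_a and isp_b, or a higher-priority rule such as ip rule add to 192.168.20.0/24 lookup main priority 900, keeps inter-LAN traffic local.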

Rule selectors

ip rule add accepts many selectors:

  • from <prefix> -- source IP falls inside the prefix
  • to <prefix> -- destination IP falls inside the prefix
  • iif <name> -- packet arrived on this interface
  • oif <name> -- packet exits through this interface
  • tos <value> -- DSCP/TOS byte
  • fwmark <mark> -- netfilter mark on the packet
  • uidrange <a-b> -- UID of the process (for locally generated traffic)
  • l3mdev -- L3 master device (for VRF)
  • ipproto <proto> -- IP protocol (TCP/UDP/...)

Actions:

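  • lookup <table> (or table <table>) -- use the given routing table
  • goto <priority> -- jump to the rule with that priority
  • nop -- do nothing, continue with the next rule
  • blackhole, unreachable, prohibit -- stop and reject the packet: blackhole drops it silently, unreachable and prohibit also report an error (network unreachable or administratively prohibited)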
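The fwmark selector also accepts a mask (fwmark VALUE/MASK), which lets several tools share the 32-bit packet mark. A small sketch of the comparison in shell arithmetic, assuming marks written in hex or decimal; fwmark_matches is a hypothetical helper, not part of iproute2:

```bash
# Sketch of the comparison behind "ip rule ... fwmark VALUE/MASK": only the
# bits set in MASK are compared. Without a mask, a non-zero VALUE is compared
# on all 32 bits (this sketch does not model VALUE 0 without a mask).
fwmark_matches() {   # fwmark_matches PACKET_MARK VALUE[/MASK]
    local mark=$1 value=${2%%/*} mask=0xffffffff
    case $2 in
        */*) mask=${2#*/} ;;
    esac
    # Match when PACKET_MARK and VALUE agree on every masked bit.
    [ $(( ($mark ^ $value) & $mask )) -eq 0 ]
}
```

So ip rule add fwmark 0x100/0xff00 table 100 would match any mark whose bits 8-15 equal 0x01 (for example 0x100 through 0x1ff), whatever another component writes into the low byte.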

fwmark and policy routing: a common pattern

Suppose you want HTTP and HTTPS traffic to go through a VPN and everything else to go directly. Combine iptables/[[nat|netfilter]] marking with policy routing:

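# mark outgoing HTTP and HTTPS traffic
iptables -t mangle -A OUTPUT -p tcp --dport 80 -j MARK --set-mark 0x100
iptables -t mangle -A OUTPUT -p tcp --dport 443 -j MARK --set-mark 0x100
# routing for marked traffic
ip route add default dev wg0 table 100
ip rule add fwmark 0x100 table 100 priority 500
# re-routed packets keep the source address chosen for the original route
# (e.g. eth0's), so rewrite it to wg0's address
iptables -t nat -A POSTROUTING -o wg0 -j MASQUERADE
# marked HTTP/HTTPS now goes into wg0 (WireGuard tunnel)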

This pattern appears in:

  • Split-tunnel VPN (only specific traffic goes through the tunnel)
  • Transparent proxy (mark a packet, route it to a local proxy)
  • DDoS scrubbing (mark suspicious traffic, send it down a separate path)
  • Container networking ([[cni-plugins|CNI]]: mark per namespace, separate table per namespace)
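Re-running the split-tunnel commands appends duplicate iptables rules and makes ip route add fail. A sketch of an idempotent variant, assuming iptables with the -C (check) option and iproute2's ip route replace; the function names, the DRY_RUN switch, and the MARK/TABLE/DEV variables are hypothetical. It also adds a MASQUERADE rule on the tunnel, because locally generated packets that are re-routed by their mark keep the source address chosen for the original route:

```bash
# Sketch: idempotent split-tunnel setup. With DRY_RUN=1 (the default here)
# commands are printed for review; DRY_RUN=0 executes them (needs root).
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Append an iptables rule unless an identical rule already exists (-C checks).
ipt_ensure() {   # ipt_ensure TABLE CHAIN RULE-SPEC...
    local t=$1 c=$2
    shift 2
    if [ "$DRY_RUN" != 1 ] && iptables -t "$t" -C "$c" "$@" 2>/dev/null; then
        return 0
    fi
    run iptables -t "$t" -A "$c" "$@"
}

split_tunnel_apply() {   # split_tunnel_apply PORT...
    local mark=${MARK:-0x100} table=${TABLE:-100} dev=${DEV:-wg0} port
    for port in "$@"; do
        ipt_ensure mangle OUTPUT -p tcp --dport "$port" -j MARK --set-mark "$mark"
    done
    # "replace" creates the route or updates an existing one.
    run ip route replace default dev "$dev" table "$table"
    # ip rule has no check mode: drop a previous copy (if any), then add.
    run ip rule del fwmark "$mark" table "$table" priority 500 2>/dev/null || true
    run ip rule add fwmark "$mark" table "$table" priority 500
    # Re-routed local packets keep their original source address; rewrite it.
    ipt_ensure nat POSTROUTING -o "$dev" -j MASQUERADE
}
```

For example, split_tunnel_apply 80 443 prints the commands for review, and DRY_RUN=0 split_tunnel_apply 80 443 applies them as root.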

Reverse path filter and policy routing

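Policy routing often produces asymmetric routing: a packet arrives on eth0 but the reply leaves on eth1. With rp_filter=1 (strict mode, the default on many distributions; the kernel's own default is 0, no check), the kernel looks up the route back to the packet's source address and drops the packet if that route does not leave through the interface the packet arrived on.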

The fix is loose mode:

sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.eth0.rp_filter=2
sysctl -w net.ipv4.conf.eth1.rp_filter=2

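In loose mode the kernel only checks that the source address is reachable through some interface. The kernel applies the larger of conf.all.rp_filter and conf.<iface>.rp_filter, so conf.all=2 on its own already puts every interface in loose mode. Without this, strict mode drops asymmetrically routed packets.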
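When tuning rp_filter it helps to see the mode the kernel actually applies. The kernel's ip-sysctl documentation states that the larger of conf/all/rp_filter and conf/<iface>/rp_filter is used for source validation on an interface. A minimal sketch that reads both values; the function names are hypothetical, and PROC_ROOT is a hypothetical override so the function can run against a fake tree:

```bash
# Sketch: report the rp_filter value the kernel applies to one interface,
# i.e. the maximum of conf/all/rp_filter and conf/<iface>/rp_filter.
effective_rp_filter() {   # effective_rp_filter IFACE
    local root=${PROC_ROOT:-/proc/sys} all iface
    all=$(cat "$root/net/ipv4/conf/all/rp_filter") || return 1
    iface=$(cat "$root/net/ipv4/conf/$1/rp_filter") || return 1
    if [ "$all" -gt "$iface" ]; then echo "$all"; else echo "$iface"; fi
}

rp_filter_name() {   # rp_filter_name VALUE: human-readable mode
    case $1 in
        0) echo off ;;
        1) echo strict ;;
        2) echo loose ;;
        *) echo "unknown ($1)" ;;
    esac
}
```

This is also why setting conf.<iface>.rp_filter=0 alone does not disable the check while conf.all.rp_filter is 1: the effective value is still 1.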

VRF: Virtual Routing and Forwarding

Linux 4.3 and later provide VRF-lite: virtual routing instances. Each VRF has its own set of interfaces and its own routing table, isolated from the others.

# create VRF device vrf-a bound to routing table 100
ip link add vrf-a type vrf table 100
ip link set vrf-a up
# enslave an interface; its connected routes move into table 100
ip link set eth1 master vrf-a
# further routes for this VRF go into table 100
ip route add default via 10.0.1.1 dev eth1 table 100

A process can be bound to the VRF with ip vrf exec vrf-a curl .... VRF is used in:

  • Multitenant routing on a single Linux router
  • Management plane separation (management traffic through a mgmt-vrf)
  • Cumulus Linux, SONiC, and FRR-based routers
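When several interfaces belong to the same VRF, the setup commands can be generated from one call. A sketch that only prints the commands; vrf_cmds is a hypothetical helper, and it assumes the default gateway is reached through the first listed interface:

```bash
# Sketch: print the VRF setup commands for one VRF and its member interfaces.
# Review the output, then pipe it to "sh" as root to apply it.
vrf_cmds() {   # vrf_cmds VRF TABLE GATEWAY IFACE...
    if [ "$#" -lt 4 ]; then
        echo "usage: vrf_cmds VRF TABLE GATEWAY IFACE..." >&2
        return 2
    fi
    local vrf=$1 table=$2 gw=$3 dev
    shift 3
    echo "ip link add $vrf type vrf table $table"
    echo "ip link set $vrf up"
    for dev in "$@"; do
        echo "ip link set $dev master $vrf"
    done
    # Assumption: the gateway is reachable through the first member interface.
    echo "ip route add default via $gw dev $1 table $table"
}
```

For instance, vrf_cmds vrf-a 100 10.0.1.1 eth1 | sh (as root) reproduces the VRF example above.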

Source address for outgoing traffic

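When a host has multiple IP addresses, the kernel takes the source address from the matching route: the route's src attribute if it has one, otherwise an address on the outgoing interface (normally its primary address). To force a specific source: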

ip route add 8.8.8.8 via 10.0.1.254 src 10.0.1.99

Or specify it explicitly at the application level:

curl --interface 10.0.1.99 https://example.com
ping -I 10.0.1.99 8.8.8.8

This matters for multi-IP hosts where different services bind to different addresses (mail on one address, web on another).
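ip route get is the quickest way to confirm which source the kernel will choose. A sketch that extracts the src field from its output and compares it with an expected address; route_src and check_src are hypothetical helpers, and the parsing assumes the usual plain-text output format:

```bash
# Sketch: print the "src" field from "ip route get" output read on stdin.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Fail (with a message) if the kernel would not use EXPECTED_SRC for DST.
check_src() {   # check_src DST EXPECTED_SRC
    local got
    got=$(ip route get "$1" | route_src)
    if [ "$got" != "$2" ]; then
        echo "$1: kernel picks source ${got:-none}, expected $2" >&2
        return 1
    fi
}
```

For example, check_src 8.8.8.8 10.0.1.99 returns non-zero if the src route above is missing or another address wins.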

Packet processing order (simplified)

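For an outgoing (locally generated) packet:

  1. Routing decision: the socket's route lookup (ip rule list, then ip route show table N) picks the output interface and source address
  2. OUTPUT (raw, mangle, nat, filter chains); if a mangle rule changes the mark (or nat rewrites the destination), the kernel re-routes the packet
  3. POSTROUTING (mangle, nat)
  4. Transmission on the link

For a forwarded packet:

  1. PREROUTING (raw, conntrack, mangle, nat)
  2. Routing decision (forwarding)
  3. FORWARD chain (mangle, filter)
  4. POSTROUTING (mangle, nat)

A fwmark set in mangle PREROUTING is therefore visible to the forwarding decision, and a fwmark set in mangle OUTPUT takes effect through the re-route. In the OUTPUT case the packet keeps the source address picked by the first lookup, which is why the split-tunnel pattern usually needs SNAT or MASQUERADE on the new output interface.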

Troubleshooting

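  • Rule added but not working -- check the priority. Rules are evaluated from the lowest number up, and from all lookup main (32766) normally finds at least a default route, so a rule numbered above 32766 is never reached for most traffic. Give your rule a priority below 32766 and make sure no lower-numbered rule matches first.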
  • Route added to a table but traffic bypasses it -- confirm your rule points to that table: ip rule list | grep <table>.
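  • Asymmetric routing causes drops -- strict rp_filter (1) is the usual culprit. Set it to 2 (loose mode).
  • ip route get <dst> shows unexpected results -- use ip route get <dst> from <src> mark <mark> to run your specific case through the RPDB. from <src> alone only works for a local source address; to simulate forwarded traffic, add iif <in-dev>.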
  • Rules disappear after reboot -- they are not saved automatically. Persist them via a NetworkManager dispatcher script, systemd-networkd, or a custom script in /etc/network/if-up.d/.
  • VRF and services -- a process uses the default VRF unless it is started with ip vrf exec <name> or binds its socket to the VRF device with the SO_BINDTODEVICE socket option.
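The priority mistake in the first troubleshooting item is easy to check mechanically. A sketch that reads ip rule list output on stdin and flags rules numbered after the from all lookup main rule; rules_after_main is a hypothetical helper, it assumes the stock main rule is present, and it reports such rules only as "likely" shadowed, since main does fall through when it has no route for a destination:

```bash
# Sketch: flag rules numbered after "from all lookup main" in "ip rule list"
# output read on stdin. Exit status 1 means at least one rule was flagged.
rules_after_main() {
    awk '
        { prio = $1; sub(/:$/, "", prio); prio += 0 }
        /from all lookup main[ \t]*$/ { main = prio; seen = 1; next }
        /from all lookup default[ \t]*$/ { next }    # the stock 32767 rule
        seen && prio > main { print "likely shadowed by main: " $0; n++ }
        END { exit (n > 0) }
    '
}
```

Typical use: ip rule list | rules_after_main.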

Useful commands

  • ip route flush cache -- flush the routing cache (a legacy practice; the cache was removed in Linux 3.6, but the command still exists)
  • ip rule list table <name> -- show which rules point to a given table
  • ip route show table all -- show all routes in all tables
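  • suppress_prefixlength <N> -- a rule option that discards the result of the rule's table lookup when the matching route's prefix length is N or less, so evaluation continues with the next rule; ip rule add table main suppress_prefixlength 0 consults main for everything except its default route, which VPN setups (wg-quick, for example) use to keep more-specific routes while sending default traffic into the tunnel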
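The suppress_prefixlength rule is usually paired with a fwmark rule. A sketch modeled on the rules wg-quick installs when a WireGuard interface carries the default route (IPv4 only; wg0 and 51820 are assumed values, 51820 being the table/fwmark wg-quick starts with; commands are printed rather than executed). The tunnel's own encrypted packets carry the fwmark, so they skip the VPN table and leave through main instead of looping back into wg0:

```bash
# Sketch modeled on wg-quick's default-route handling; prints, does not execute.
vpn_default_cmds() {   # vpn_default_cmds [DEV] [TABLE]
    local dev=${1:-wg0} table=${2:-51820}
    # Mark the tunnel's own encrypted UDP packets.
    echo "wg set $dev fwmark $table"
    # Everything without that mark consults the VPN table.
    echo "ip rule add not fwmark $table table $table"
    # Added later without a priority, this rule gets a lower number and runs
    # first: it uses main for every route except main's default (prefix length 0).
    echo "ip rule add table main suppress_prefixlength 0"
    echo "ip route add 0.0.0.0/0 dev $dev table $table"
}
```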

§ Commands

bash
ip rule list

List all RPDB rules in priority order, showing which table each rule looks up

bash
ip route show table 100

Show routes in a specific table, identified by number or name from rt_tables

bash
ip rule add from 192.168.10.0/24 table isp_a priority 1000

Source-based rule: packets from this subnet use table isp_a

bash
ip rule add fwmark 0x100 table 100

Packets carrying fwmark 0x100 (set by iptables mangle) are routed via table 100

bash
ip route add default via 10.0.2.254 dev eth1 table isp_b

Add a default route inside the custom table isp_b

bash
ip route get 8.8.8.8 from 10.0.1.99

Simulate the routing decision for a specific (src, dst) pair

bash
sysctl -w net.ipv4.conf.all.rp_filter=2

Enable loose RPF mode, required for policy routing with asymmetric traffic paths

bash
ip vrf exec mgmt-vrf curl https://api.internal

Run a process in the context of a VRF so its traffic uses the VRF routing table

§ See also

  • routing-table -- Routing table. The routing table lists where to send packets for each destination. The longest matching prefix wins.
  • default-gateway -- Default gateway: leaving your own network. The router IP in your subnet where the stack sends packets for every address that **is not local**. One gateway per host, but in multi-homed setups there can be several.
  • cmd-ip -- ip: Swiss army knife for network configuration. `ip` is the iproute2 frontend that replaces the old ifconfig, route, and arp tools. Subcommands: `ip addr` (addresses), `ip link` (interfaces), `ip route` (routing table), `ip neigh` (ARP).
  • nat -- NAT: Network Address Translation. NAT rewrites the src or dst address of a packet at a router. Masquerade is the common case: the src IP is replaced with the router's outbound address so hosts on a private network can reach the public internet.
  • ip-forwarding -- IP Forwarding: Turn a Host into a Router. Linux does not forward packets between interfaces by default. Enable it with `sysctl net.ipv4.ip_forward=1`. Without this, NAT, VPN routing, and any forwarding will not work.