Fast u32 hashing filter generator

Download

prefixtree.c

Description

A small C program that generates very fast u32 hashing filters given a random set of prefixes and destination class ids as input. The resulting filters can be appended to a file containing the htb class definitions to create a complete tc input file.

The program generates a tree of u32 hash tables. The root table contains entries for the first byte in the IPv4 address. Each entry contains hash tables for the second byte, whose entries in turn contain hash tables for the third and so on. Only entries used in the prefixes supplied are generated.

This way, regardless of the IP address, classifying a packet into a class needs a fixed number of operations (the maximum tree depth is 4). At constant traffic levels, increasing the number of prefixes will only increase the used memory. CPU usage should remain fairly the same.

Compile/run instructions


root@br1 [/home/vcalinus]# gcc -o prefixtree prefixtree.c

root@br1 [/home/vcalinus]# ./prefixtree
IPv4 u32 hash filter generator - (C) 2006 Calin Velea
Syntax: prefixtree {prefix.in} {u32filters.out} {interface} {src/dst} [batch]

Arguments

  • {prefix.in} – input file. Contains the list of prefixes and the associated class ids in the following format:

<prefix> <classid>

….


  • {u32filters.out} – output file. Contains the generated u32 hash filters

  • {interface} – interface where the shaping takes place (tc filter add dev {interface} )

  • {src/dst} – indicates whether we are shaping upload or download (affects src/dst and hash offsets in tc statements)
  • [batch] – if specified, will generate an output file acceptable for tc -b

Typical usage example


tc.main.src – base tc file

qdisc add dev eth0 root handle 1: htb default 99 r2q 1
class add dev eth0 parent 1: classid 1:1 htb rate 400000kbit ceil 400000kbit quantum 1536

#default
class add dev eth0 parent 1:1 classid 1:99 htb rate 100kbit ceil 100kbit
qdisc add dev eth0 parent 1:99 handle 99: sfq quantum 1520 perturb 10
# cust1 – class 1:101
class add dev eth0 parent 1:1 classid 1:101 htb rate 10120kbit ceil 10120kbit quantum 1536
qdisc add dev eth0 parent 1:101 handle 101: sfq quantum 1520 perturb 10

# cust2 – class 1:102
class add dev eth0 parent 1:1 classid 1:102 htb rate 20096kbit ceil 20096kbit quantum 1536
qdisc add dev eth0 parent 1:102 handle 102: sfq quantum 1520 perturb 10

# cust3 – class 1:1af
class add dev eth0 parent 1:1 classid 1:1af htb rate 20096kbit ceil 20096kbit quantum 1536
qdisc add dev eth0 parent 1:1af handle 1af: sfq quantum 1520 perturb 10


prefixes.in – input file for prefixtree

68.91.0.0/20     1:101
195.28.184.0/29  1:102
89.165.145.5/32  1:1af


Run it:

root@br1 [/home/vcalinus]# ./prefixtree prefixes.in u32filters.out eth0 src batch
lines parsed: 3
total hashtables: 8

Output file:

root@br1 [/home/vcalinus]# cat u32filters.out

##### Generated with prefixtree v1.0 #####

filter add dev eth0 parent 1:0 prio 5 protocol ip u32
filter add dev eth0 parent 1:0 prio 5 handle 10: protocol ip u32 divisor 256
filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip src 0.0.0.0/0 hashkey mask 0xff000000 at 12 link 10:

## entries for 68.0.0.0/8

filter add dev eth0 parent 1:0 prio 5 handle 11: protocol ip u32 divisor 256
filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 10:44:  match ip src 0.0.0.0/0 hashkey mask 0xff0000 at 12 link 11:

## entries for 68.91.0.0/16

     filter add dev eth0 parent 1:0 prio 5 handle 12: protocol ip u32 divisor 256
     filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 11:5b:  match ip src 0.0.0.0/0 hashkey mask 0xff00 at 12 link 12:

         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:0:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:1:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:2:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:3:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:4:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:5:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:6:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:7:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:8:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:9:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:a:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:b:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:c:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:d:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:e:  match ip src 0.0.0.0/0 flowid 1:101
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 12:f:  match ip src 0.0.0.0/0 flowid 1:101

## entries for 89.0.0.0/8

filter add dev eth0 parent 1:0 prio 5 handle 16: protocol ip u32 divisor 256
filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 10:59:  match ip src 0.0.0.0/0 hashkey mask 0xff0000 at 12 link 16:

## entries for 89.165.0.0/16

     filter add dev eth0 parent 1:0 prio 5 handle 17: protocol ip u32 divisor 256
     filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 16:a5:  match ip src 0.0.0.0/0 hashkey mask 0xff00 at 12 link 17:

## entries for 89.165.145.0/24

         filter add dev eth0 parent 1:0 prio 5 handle 18: protocol ip u32 divisor 256
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 17:91:  match ip src 0.0.0.0/0 hashkey mask 0xff at 12 link 18:

               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 18:5:  match ip src 0.0.0.0/0 flowid 1:1af

## entries for 195.0.0.0/8

filter add dev eth0 parent 1:0 prio 5 handle 13: protocol ip u32 divisor 256
filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 10:c3:  match ip src 0.0.0.0/0 hashkey mask 0xff0000 at 12 link 13:

## entries for 195.28.0.0/16

     filter add dev eth0 parent 1:0 prio 5 handle 14: protocol ip u32 divisor 256
     filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 13:1c:  match ip src 0.0.0.0/0 hashkey mask 0xff00 at 12 link 14:

## entries for 195.28.184.0/24

         filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
         filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 14:b8:  match ip src 0.0.0.0/0 hashkey mask 0xff at 12 link 15:

               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:0:  match ip src 0.0.0.0/0 flowid 1:102
               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:1:  match ip src 0.0.0.0/0 flowid 1:102
               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:2:  match ip src 0.0.0.0/0 flowid 1:102
               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:3:  match ip src 0.0.0.0/0 flowid 1:102
               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:4:  match ip src 0.0.0.0/0 flowid 1:102
               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:5:  match ip src 0.0.0.0/0 flowid 1:102
               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:6:  match ip src 0.0.0.0/0 flowid 1:102
               filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 15:7:  match ip src 0.0.0.0/0 flowid 1:102

root@br1 [/home/vcalinus]#

Finishing up

Put classes and filters together:

cat u32filters.out >> tc.main.src

Apply everything:

tc –b < tc.main.src

Repeat the steps above using dst when running prefixtree to shape the download.


Optimizing existing tc files

Suppose you have a tc file which is not using hashing. Classification is done only based on src/dst prefix match (no port or other specific u32 matches are used). You could easily optimize it by separating the classes and filters part, constructing an input file for prefixtree then joining it back with the classes part. You need to write a script to parse the files and extract the prefixes and class ids (from the tc statements).

grep –v "tc filter add" unoptimized_tc.in > classes.out

grep "tc filter add" unoptimized_tc.in > unoptimized_filters.out

./generate_prefixtree_input classes.out unoptimized_filters.out prefixes.in

./prefixtree prefixes.in optimized_filters.out $iface src batch

cat optimized_filters_out >> classes.out

mv classes.out optimized_tc.in

tc –b < optimized_tc.in

Test Results

Real-world testing has shown troughputs of 1300Mbps / 250.000 pps (aggregated in+out)  for a 2.6.20 linux shaping bridge on a quad-core Xeon X3210 (2.13GHz, 8M  L2 cache), 2GBs of RAM using Intel PCI Express Gigabit NICs.  At this traffic level, CPU utilization averages varied between 25 – 50 % for every core.  Almost 8.000 prefixes of various lengths were being shaped, with a roughly equal number of htb classes.

5000 series Xeons with 12MB of L2 cache, running at 3GHz or more should come close to 6-700.000 pps peak performance, saturating several gigabit interfaces.

Traffic classification using BGP (a quagga+realms approach)

Realms patch – Quagga 0.98.6

Stable: quagga-0.98.6-realms.diff
Development: quagga-0.99.5-realms.diff
Updated versions (>0.99.5) http://linux.mantech.ro/quagga+realm_en.html

This patch enables Linux route realms support in quagga 0.98.6
I started with Arcady Stepanov’s patch for zebra 0.93b http://win.mol.ru/penguin/zebra-hacks/, adapted it to quagga 0.98.4 interface and added some useful features.
The following commands are supported:

  • Route-map
    • bgpd(config-route-map)# set realm
        <1-255>    Realm id for Linux FIB routes
        WORD       Realm name for Linux FIB routes
        origin-as  Use route origin AS as realm id
        peer-as    Use route peer AS as realm id

    • bgpd(config-route-map)# no set realm
        <0-255>    Realm value
        WORD       Realm name
        origin-as  Origin AS - realm
        peer-as    Peer AS - realm
        <cr>

  • Neighbor
    • bgpd(config-router)# neighbor x.x.x.x realm
        <0-255>    default realm id
        WORD       default realm name
        origin-as  Set default realm to received route origin AS
        peer-as    Set default realm to peer AS

    • bgpd(config-router)# no neighbor x.x.x.x realm
        <0-255>    default realm id
        WORD       default realm name
        origin-as  Set default realm to received route origin AS
        peer-as    Set default realm to peer AS
        <cr>

Note:

’set realm origin-as’ was added with inter-AS traffic accounting in mind. For now, this is possible only with the iptables realm match which can match on the full 16bit realm value. The current realm accounting code in the kernel (rtacct – /proc/net/rt_acct) supports only 256 values for realms, and displays incorrect statistics.

Bugs/suggestions should go to: vcalinusATgemenii.ro

Brief usage guide…

0. kernel support (if you want to classify traffic into htb classes using tc)

CONFIG_NET_CLS_ROUTE=y

1. /eetc/iproute2/rt_realms

Assign meaningful names to realm numbers...

user@router:/# cat /eetc/iproute2/rt_realms

10 localnet
20 metro-isp
22 metro-other
30 international

2. compile/install quagga

Stable Quagga 0.98.6
quagga 0.98.6 - official release
+
quagga 0.98.6 realms patch
Big thanks to Alin Nastac for updating the patch to 0.98.6!

Patch for development Quagga 0.99.5
quagga-0.99.5-realms.diff
Older patches
quagga-0.98.5-realms.diff quagga-0.98.4-realms.diff quagga-0.98.3-realms.diff Remember to use ./configure --enable-realms 3. BGP CONFIGURATION a possible bgp setup: (if you hold the full routing table - replace defgw with a match on the desired community) AS-regexp match is also possible neighbor xxx.xxx.xxx.xxx remote-as XXXXX neighbor xxx.xxx.xxx.xxx soft-reconfiguration inbound neighbor xxx.xxx.xxx.xxx route-map isp_in in ip prefix-list defgw seq 5 permit 0.0.0.0/0 ip community-list standard metro-isp permit XXXXX:comm1 ip community-list standard metro-other permit XXXXX:comm2 route-map isp_in permit 10 match ip address prefix-list defgw set realm 30 ! route-map isp_in permit 20 match community metro-isp set realm 20 ! route-map isp_in permit 30 match community metro-other set realm 22 ! route-map isp_in permit 40 3.1 'ip route sh' will show kernel routes - they should have the realms specified in the route-map something like.... 62.217.192.0/18 via 193.19.192.65 dev eth1 proto zebra equalize realm 20 82.137.0.0/18 via 172.16.100.1 dev eth2 proto zebra equalize realm 22 84.243.64.0/18 via 172.16.100.1 dev eth2 proto zebra equalize realm 20 82.208.128.0/18 via 193.19.192.65 dev eth1 proto zebra equalize realm 22 4. iptables Can be used in FORWARD or POSTROUTING (remember that realms are valid only after the forwarding decision) Download: match default route, community 1, and community 2 sets -A FORWARD -i eth3 -m realm --realm 0x1e0000/0xffff0000 -j sometarget... -A FORWARD -i eth3 -m realm --realm 0x140000/0xffff0000 -j sometarget... -A FORWARD -i eth3 -m realm --realm 0x160000/0xffff0000 -j sometarget... Upload: match default route, community 1, and community 2 sets -A FORWARD -o eth3 -m realm --realm 0x1e/0xffff -j sometarget... -A FORWARD -o eth3 -m realm --realm 0x14/0xffff -j sometarget... -A FORWARD -o eth3 -m realm --realm 0x16/0xffff -j sometarget... (realms 30,20 and 22 are specified in hexadecimal) 5. tc Excerpt from LARTC # ip route add 192.168.2.0/24 dev eth2 realm 2 # tc filter add dev eth1 parent 1:0 protocol ip prio 100 route from 2 classid 1:2 Here the filter specifies that packets from the subnetwork 192.168.2.0 (realm 2) will match class id 1:2. You can also find useful QoS stuff at: http://kernel.umbrella.ro/net/ 6. what are realms after all? Realms are 16bit integer values used to group routes into sets, according to some defined policy. Each route in the set will have the same realm. Each packet routed will have a 32bit integer value specifying a source and a destination realm. (they may be 0 - or unknown) On the leftmost 16bits you will find the source realm, on the rightmost 16bits the destination realm. More info: http://www.policyrouting.org/iproute2.doc.html#ss9.9