Writing a Module For netfilter

With Linux 2.4 right around the corner, now is a very good time to discuss netfilter, the new packet observation and filtering mechanism that was introduced during 2.3 kernel development. I discussed the netfilter architecture briefly back in my Best Defense column in October 1999 (http://www.linux-mag.com/1999-10/bestdefense_01.html), and more thoroughly in the January 2000 issue of Linux Magazine.

Figure One: The netfilter architecture allows you to hook into a protocol stack at several points.

netfilter is a framework inside the kernel that allows a module to observe and modify packets as they pass through the IP stack. Since I wrote that article in January, netfilter hooks similar to those described here for IPv4 have been added to the IPv6 (the next generation of IP) and DECnet (a more obscure protocol) layers.

Inside the kernel you will see calls such as the following throughout the protocol code (this is from ip_local_deliver() in net/ipv4/ip_input.c):

return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
               ip_local_deliver_finish);

NF_HOOK is a macro that calls any registered netfilter hooks for the given protocol (PF_INET) and hook (NF_IP_LOCAL_IN), with the given packet (skb). It also handles information on the incoming and outgoing devices (skb->dev and NULL, respectively). Once everyone registered to listen on that hook has returned NF_ACCEPT, the function specified by the last argument is called to continue packet traversal (ip_local_deliver_finish). If a hook returns NF_DROP, the packet is freed, and the function is never called.

If CONFIG_NETFILTER is set to n when the kernel is compiled, then the above macro simply calls the final argument, which is declared inline (a gcc extension taken from C++) so there is no overhead for that case.
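To make that dispatch behavior concrete, here is a small userspace sketch. It is not the kernel's macro: every name in it (toy_nf_hook, TOY_NF_HOOK, TOY_CONFIG_NETFILTER, the toy_packet structure and the example hooks) is made up for illustration, and the -1 returned on a drop is an arbitrary choice for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_NETFILTER. */
#define TOY_CONFIG_NETFILTER 1

/* Hypothetical verdicts mirroring NF_DROP and NF_ACCEPT. */
enum toy_verdict { TOY_DROP, TOY_ACCEPT };

struct toy_packet {
    unsigned int len;
    int freed;          /* set when the framework "frees" the packet */
};

typedef enum toy_verdict (*toy_hook_fn)(struct toy_packet *pkt);
typedef int (*toy_okfn)(struct toy_packet *pkt);

/*
 * Call each registered hook in turn. If every hook accepts, continue
 * traversal through okfn. If one drops, mark the packet freed and return
 * -1 (an arbitrary error value for this sketch) without calling okfn.
 */
int toy_nf_hook(toy_hook_fn *hooks, size_t nhooks,
                struct toy_packet *pkt, toy_okfn okfn)
{
    size_t i;

    assert(pkt != NULL && okfn != NULL);
    for (i = 0; i < nhooks; i++) {
        if (hooks[i](pkt) == TOY_DROP) {
            pkt->freed = 1;
            return -1;
        }
    }
    return okfn(pkt);
}

#ifdef TOY_CONFIG_NETFILTER
#define TOY_NF_HOOK(hooks, n, pkt, okfn) toy_nf_hook((hooks), (n), (pkt), (okfn))
#else
/* With the framework compiled out, the macro is just the final call. */
#define TOY_NF_HOOK(hooks, n, pkt, okfn) (okfn)(pkt)
#endif

/* Two example hooks and a continuation function. */
enum toy_verdict accept_all(struct toy_packet *pkt)
{
    (void)pkt;
    return TOY_ACCEPT;
}

enum toy_verdict drop_len_200(struct toy_packet *pkt)
{
    return pkt->len == 200 ? TOY_DROP : TOY_ACCEPT;
}

int deliver_finish(struct toy_packet *pkt)
{
    return (int)pkt->len;
}
```

Passing an array holding accept_all and drop_len_200 to TOY_NF_HOOK lets an 84-byte packet reach deliver_finish, while a 200-byte packet is marked freed and deliver_finish is never called.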

Where to put these NF_HOOK calls in your protocol stack is of fairly limited interest (there are only about a dozen protocols in the Linux kernel), but of more interest is the other side of the framework: How do you register to listen for packets at a certain point? Many people have specialized packet watching or mangling needs, so I’ll explain what they can expect.

First, you have to decide which protocol you wish to hook into. netfilter divides up hooks on a per-protocol basis, so there is no way, for example, to hook into all packets at once. Usually the protocol will be IP (PF_INET inside the kernel).

Each protocol defines a number of points you can hook into. IPv4 defines five points, and the other protocols have so far followed the model shown in Figure One (although DECnet added some new ones).

As you can see in the figure, a hook can observe all valid incoming packets by registering at NF_IP_PRE_ROUTING. If you only want to observe packets destined for this machine, hook into NF_IP_LOCAL_IN; locally generated packets can be observed at NF_IP_LOCAL_OUT. Packets being forwarded through the machine will hit the NF_IP_FORWARD hook, and immediately before IP packets are transmitted they will pass through the NF_IP_POST_ROUTING hook.

Since many hooks can be registered at the same point, each hook must be assigned a priority to determine the order in which they are executed. Hooks with a lower priority number are called first.

For IPv4, linux/netfilter_ipv4.h has an enumerated type that offers some standard values. Traditionally, 0 is for packet filtering, so negative numbers are used for executing hooks before filtering, and positive numbers for after filtering.
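The ordering rule can be sketched in userspace as a sorted insert into a linked list. This is only an illustration of "lower number runs first": the toy_hook_ops structure and toy_register_hook() are hypothetical names, and placing a new hook after existing hooks of equal priority is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a hook registration record. */
struct toy_hook_ops {
    struct toy_hook_ops *next;
    const char *name;
    int priority;       /* lower runs first; 0 is the filtering slot */
};

/*
 * Insert ops so the list stays sorted by ascending priority. A hook whose
 * priority equals an existing one is placed after it.
 */
void toy_register_hook(struct toy_hook_ops **head, struct toy_hook_ops *ops)
{
    struct toy_hook_ops **pp = head;

    assert(head != NULL && ops != NULL);
    while (*pp != NULL && (*pp)->priority <= ops->priority)
        pp = &(*pp)->next;
    ops->next = *pp;
    *pp = ops;
}
```

Registering hooks with priorities 0, 100 and -1, in that order, leaves the list ordered -1, 0, 100, so the hook that wants to run before filtering is called first regardless of when it was registered.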

To register a hook, you fill in an nf_hook_ops structure with the priority, hook point, and a pointer to your hook function, and call nf_register_hook(). In keeping with kernel tradition, this function returns 0 for success, and a negative error number for failure. A good example to look at is Jamal Salim’s ingress filtering in net/sched/sch_ingress.c, which uses a single netfilter hook, or the more complex examples in the net/ipv4/netfilter/ directory.

A Silly Example

For the purposes of this article we’re going to work a little bit on the demonstration-only linuxmag.o kernel module. This tiny module will corrupt locally generated IP packets that are of length 100, and drop packets that are of length 200. First, we define the nf_hook_ops structure:

static struct nf_hook_ops linuxmag_ops
= { { NULL, NULL }, linuxmag_hook,
    PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_FILTER-1 };

The first element in the structure ({ NULL, NULL }) is a doubly-linked-list element, which is used internally. The second is the function to call (which in this case is the linuxmag_hook function). Following that is the protocol (PF_INET), the hook point (NF_IP_LOCAL_OUT) for locally generated packets, and the priority (just before packet filtering).

All we need to do now is write the function that does the actual work (see Listing One).

Listing One: The linuxmag_hook Function

static unsigned int
linuxmag_hook(unsigned int hook, struct sk_buff **pskb,
              const struct net_device *indev,
              const struct net_device *outdev,
              int (*okfn)(struct sk_buff *))
{
  /* Get a handle to the packet data */
  unsigned char *data = (void *)(*pskb)->nh.iph +

  (*pskb)->nfcache |= NFC_UNKNOWN;

  switch ((*pskb)->len) {
  case 100:
    printk("linuxmag: corrupting packet\n");
    (*pskb)->nfcache |= NFC_ALTERED;
    return NF_ACCEPT;

  case 200:
    printk("linuxmag: dropping packet\n");
    return NF_DROP;
  }

  return NF_ACCEPT;
}

We can see that the hook function takes five arguments:

1. The Hook. This will always be NF_IP_LOCAL_OUT in this module, as that is the only place we register this function.

2. A Pointer to a Pointer to the skbuff. This represents the packet. The double pointer lets a hook replace the entire packet with another one if that becomes necessary.

3. A Pointer to the Input Device. This is set to NULL for the NF_IP_LOCAL_OUT hook.

4. A Pointer to the Output Device. For the NF_IP_LOCAL_OUT hook, this is set to the interface the packet is heading out of.

5. A Pointer to the Function that Will be Called if All the Hooks are Successful. This should never be called directly, except for special effects (it is a hack for modules that need to fragment packets).

In this function, we only care about the packet itself, so we use only the pskb parameter. The first thing we do is obtain a pointer to the packet’s IP header. We know this field (nh.iph) is valid, because we registered this as a PF_INET hook, so we will only ever be passed IP packets.

The second thing we do is a little tricky. Each skbuff has a field that should identify which skbuff fields were examined by a hook. Values for this are given in include/linux/netfilter_ipv4.h. For example, if a module examined the source IP address, it would set the NFC_IP_SRC bit in the nfcache field. In the future this field could be used to cache the decisions made by modules. There is no bit for packet length, so we set the NFC_UNKNOWN bit, which means “I looked at something that the framework doesn’t understand, so make sure I get every packet.”

Next, we decide what to do based on packet length. If the packet length is 100, we increment the last byte. Because we altered the packet, we must mark it as altered by setting the NFC_ALTERED bit. This is particularly important for the NF_IP_LOCAL_OUT hook, which needs to look up the packet's route again in case our change affects how it should be routed. We then return NF_ACCEPT, which means to let the packet through.

If the length is 200, we simply return NF_DROP, which means the packet should be dropped. Otherwise, the packet passes unscathed, by returning NF_ACCEPT.
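The decision logic just described can be tried out in userspace with a stand-in packet type. This is a sketch only, not the kernel listing: toy_packet, TOY_ALTERED and toy_linuxmag_hook() are hypothetical names, and indexing the last byte as len - 1 is my reading of "the last byte" for a flat buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts mirroring NF_DROP and NF_ACCEPT. */
enum toy_verdict { TOY_DROP, TOY_ACCEPT };

#define TOY_ALTERED 0x1u        /* stand-in for the NFC_ALTERED bit */

struct toy_packet {
    unsigned char *data;        /* flat buffer holding the whole packet */
    size_t len;
    unsigned int cache;         /* stand-in for the nfcache field */
};

/* Corrupt packets of length 100, drop those of length 200, pass the rest. */
enum toy_verdict toy_linuxmag_hook(struct toy_packet *pkt)
{
    assert(pkt != NULL && pkt->data != NULL);

    switch (pkt->len) {
    case 100:
        pkt->data[pkt->len - 1]++;      /* bump the last byte */
        pkt->cache |= TOY_ALTERED;      /* record that we changed it */
        return TOY_ACCEPT;
    case 200:
        return TOY_DROP;
    default:
        return TOY_ACCEPT;
    }
}
```

Only the 100-byte case touches the buffer or the cache field; every other length leaves the packet exactly as it arrived.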

Polishing Our Example

We need very little else to turn these two code fragments into a complete kernel module. At the top of the code, we need the headers and a comment:

/* Example kernel module for Linux Magazine. */
#include <linux/config.h>
#include <linux/module.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>

Following this comes the linuxmag_hook function, then the linuxmag_ops structure, then finally the glue needed to turn it into a module:

static int __init init(void)
{
  return nf_register_hook(&linuxmag_ops);
}

static void __exit fini(void)
{
  nf_unregister_hook(&linuxmag_ops);
}

module_init(init);
module_exit(fini);

So now we have a complete kernel module: the init function registers our hook function (returning a negative error code if it fails) and the fini function unregisters it. We then only need the module_init and module_exit macros to tell the kernel that these are our module's initialization and cleanup functions. The __init and __exit keywords matter if the code is built into the kernel: they mean that the init function will be discarded after boot, freeing memory, and that the fini function will never be needed at all, and hence should not be included in the kernel image.

Testing Our Example

Let’s look at what happens when we install our module and test it using the ping program:

# insmod ./linuxmag.o
# ping -c1 linuxcare.com.au
PING linuxcare.com.au ( 56
data bytes
64 bytes from icmp_seq=0
ttl=249 time=204.0 ms

--- linuxcare.com.au ping statistics ---
1 packets transmitted, 1 packets received,
0% packet loss

Now let’s send a packet of length 200 (which means we must use the ping option -s172, since there are 20 bytes for the IP header, and 8 for the ICMP header):

# ping -c1 -s172 linuxcare.com.au
PING linuxcare.com.au ( 172
data bytes
ping: sendto: Operation not permitted
ping: wrote linuxcare.com.au 180 chars, ret=-1

--- linuxcare.com.au ping statistics ---
1 packets transmitted, 0 packets received,
100% packet loss

And from dmesg we can see:

# dmesg -c
linuxmag: dropping packet

A packet of length 100 is corrupted (the ICMP checksum will be incorrect after we’ve modified it), and so we will receive no reply:

# ping -c1 -s72 linuxcare.com.au
PING linuxcare.com.au (
72 data bytes

--- linuxcare.com.au ping statistics ---
1 packets transmitted, 0 packets received,
100% packet loss

And once again dmesg shows our little message:

# dmesg -c
linuxmag: corrupting packet

If you were to do a tcpdump on a remote machine, you would see the modified packet on the wire.
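The -s values used in these tests come from subtracting the two header sizes from the packet length we want the hook to see. A small helper makes the arithmetic explicit; ping_payload_for_total() is a hypothetical name, and the sizes assume an IPv4 header without options plus the 8-byte ICMP echo header.

```c
#include <assert.h>

/* Header sizes assumed in the text: IPv4 without options, ICMP echo. */
enum { IPV4_HEADER_LEN = 20, ICMP_HEADER_LEN = 8 };

/*
 * Return the payload size to pass to ping -s so that the whole IP packet
 * is total_len bytes long, or -1 if total_len cannot hold both headers.
 */
int ping_payload_for_total(int total_len)
{
    int payload;

    assert(total_len <= 65535);   /* IPv4 total length is a 16-bit field */
    payload = total_len - IPV4_HEADER_LEN - ICMP_HEADER_LEN;
    return payload < 0 ? -1 : payload;
}
```

For a 200-byte packet this gives 172 and for a 100-byte packet 72, matching the options used above. The default 56 data bytes correspond to an 84-byte packet, which is why the first ping passed through untouched.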

Beyond Our Example

Hook functions can return things other than NF_ACCEPT and NF_DROP. You can return NF_STOLEN, which means “I’ve taken control of the packet, so don’t refer to it again.” This is different from NF_DROP, which tries to free the packet using kfree_skb(). You can also return NF_REPEAT, which is like NF_ACCEPT, but calls this hook function again, rather than moving on to the next one.

Finally, you can also return NF_QUEUE, which allows the packet to be queued for asynchronous packet handling. If a handler is registered (for IP, this is in net/ipv4/netfilter/ip_queue.c) then it will be handed the packet, and then processing will finish. At some later time, the packet will be reinjected, and processing will continue.

This is a very useful technique for dealing with packets in userspace, where the kernel cannot wait while the processing is going on. In fact, if ultra-high speed is not a requirement, you can do everything you would do in the kernel in a simple userspace program, using James Morris’ libipq.
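How a framework might act on each of these return values can be sketched in userspace as a loop over the registered hooks. This is an illustration under assumptions, not the kernel's code: all names here (toy_run_hooks, the TOY_ and OUT_ values, the example hooks) are hypothetical, and the queue case simply reports that the packet was handed off instead of modelling reinjection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts: drop, accept, stolen, queue, call-me-again. */
enum toy_verdict { TOY_DROP, TOY_ACCEPT, TOY_STOLEN, TOY_QUEUE, TOY_REPEAT };

/* What finally happened to the packet in this sketch. */
enum toy_outcome { OUT_DELIVERED, OUT_FREED, OUT_TAKEN, OUT_QUEUED };

struct toy_packet {
    int repeats_left;   /* used by the example hook below */
    int freed;          /* set only when the framework frees the packet */
};

typedef enum toy_verdict (*toy_hook_fn)(struct toy_packet *pkt);

enum toy_outcome toy_run_hooks(toy_hook_fn *hooks, size_t nhooks,
                               struct toy_packet *pkt)
{
    size_t i = 0;

    assert(hooks != NULL && pkt != NULL);
    while (i < nhooks) {
        switch (hooks[i](pkt)) {
        case TOY_ACCEPT:
            i++;                /* move on to the next hook */
            break;
        case TOY_REPEAT:
            break;              /* leave i alone: same hook runs again */
        case TOY_DROP:
            pkt->freed = 1;     /* the framework frees the packet */
            return OUT_FREED;
        case TOY_STOLEN:
            return OUT_TAKEN;   /* the hook owns it now; do not touch it */
        case TOY_QUEUE:
            return OUT_QUEUED;  /* handed to a queue handler for later */
        }
    }
    return OUT_DELIVERED;
}

/* Ask to be called again until the counter runs out, then accept. */
enum toy_verdict repeat_then_accept(struct toy_packet *pkt)
{
    if (pkt->repeats_left > 0) {
        pkt->repeats_left--;
        return TOY_REPEAT;
    }
    return TOY_ACCEPT;
}

/* Take ownership of the packet. */
enum toy_verdict steal(struct toy_packet *pkt)
{
    (void)pkt;
    return TOY_STOLEN;
}
```

The point to notice is that only the drop case frees the packet. A stolen or queued packet is left alone because something else is now responsible for it.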

Where to Find Out More

As well as building on top of the netfilter framework directly, you can use existing components that provide higher-level functionality for IP (especially packet filtering). You can find details on all of these in the netfilter-hacking-HOWTO, which is available in my Unreliable Guides collection at http://netfilter.kernelnotes.org/unreliable-guides.

The mailing list for serious kernel network development under Linux is called netdev, and is hosted by SGI: netdev@oss.sgi.com. There is also a netfilter mailing list, which is hosted by the SAMBA team and can be found on one of the three netfilter mirrors:

* http://antarctica.penguincomputing.com/~netfilter/

* http://www.samba.org/netfilter/

* http://netfilter.kernelnotes.org

The netfilter core team generally does not answer netfilter help requests that are sent to them directly, so these resources are your best starting point.

Happy hacking!

Paul “Rusty” Russell is the Linux kernel IP packet filter maintainer, and gets to develop cool networky stuff for the Linux kernel. He can be reached at paul.russell@rustcorp.com.au.
