Seems, I never wrote nothing more stupid.
I hope Gods will forgive me, but I cannot forgive myself 8)
We’ve seen in previous articles that sometimes large IP packets are broken up along the way into “fragments.” Ordinarily, these fragments aren’t reassembled into a packet until they reach their final destination. This causes difficulties for masquerading and transparent proxying, which need to see the whole packet to figure out what to do with it.
Hence the kernel option CONFIG_IP_ALWAYS_DEFRAG, which defragments all packets, even the ones just passing through. This only alters two pieces of code in a single file, by moving eight lines, and both Red Hat and Debian ship kernels with this enabled for the sake of masquerading.
But consider a network where you have two separate connections to the outside world. One is the router of your choice while the other is your Linux box, set to perform packet defragmenting for you. Now imagine a fragmented packet where half the fragments pass through the Linux box, and half come in the other way. You’ll never get a complete packet, since the Linux box is holding back its half of the fragments while it waits for a complete packet to pass through. This is a fairly unusual scenario, but it can and does happen. As a workaround, kernel guru David Miller hacked a special setting into the Red Hat 6.0 kernel to correct this problem for people who happen to be using transparent proxying or masquerading. David’s effort is appreciated, but this hack was too ugly to make it into Linus’ kernel.
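The two-path scenario can be sketched as a toy simulation. This is only an illustration of the reasoning above, not kernel code: `DefragRouter`, `plain_router`, and `deliver` are hypothetical names, and the model assumes the defragmenting box releases nothing until it has seen every fragment of a packet.

```python
from dataclasses import dataclass, field


@dataclass
class DefragRouter:
    """Hypothetical model of a box that reassembles before forwarding."""
    # packet id -> set of fragment indexes seen so far
    pending: dict = field(default_factory=dict)

    def receive(self, packet_id, index, total):
        """Return the fragment indexes forwarded because of this arrival."""
        seen = self.pending.setdefault(packet_id, set())
        seen.add(index)
        if len(seen) == total:
            # Every fragment is here: reassemble and pass the packet on.
            del self.pending[packet_id]
            return sorted(seen)
        return []  # still waiting, so nothing leaves the box


def plain_router(index):
    """A router that forwards each fragment untouched."""
    return [index]


def deliver(total, via_linux):
    """Send fragments 0..total-1; indexes in via_linux take the Linux path.

    Returns (fragments that reached the destination, whether it is complete).
    """
    linux = DefragRouter()
    arrived = set()
    for index in range(total):
        if index in via_linux:
            arrived.update(linux.receive("pkt-1", index, total))
        else:
            arrived.update(plain_router(index))
    return arrived, len(arrived) == total


print(deliver(4, {0, 1, 2, 3}))  # all fragments through the Linux box
print(deliver(4, {0, 1}))        # fragments split across both paths
```

When the fragments are split, the destination only ever receives the half that took the plain router, which is the stall described above.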
Linux 2.2: A User Perspective
One of the problems with the ipchains HOWTO and the mailing list is that we get a number of questions on transparent proxying and masquerading, even though they really have nothing to do with packet filtering.
For want of a better place to put it, the ipfwadm packet filtering infrastructure for Linux 2.0 was used to specify what packets were to be masqueraded or transparently proxied. ipchains, for Linux 2.2, inherited this role. The distinction I usually make between packet filtering (dropping unwanted packets) and packet mangling (altering packets as they pass) is blurred for people who have only experienced them in ipchains or ipfwadm, where you have to use the packet filtering rules to do IP masquerading. This masquerading/packet filtering crossover leads to a significant amount of confusion for users.
In a perfect world, your packet filtering rules should be unaffected by whether or not you’re masquerading. By making both functions independent of each other, we can have separate HOWTOs, separate manpages, separate mailing lists, and even separate maintainers.
The other problem that confounds users is that there are various forms of Network Address Translation, and many tie directly in with others. For example, masquerading, transparent proxying, and port forwarding belong together. Giving them separate names, and using different tools to control them only adds to the confusion.
All of these concerns made it obvious that it was time for an overhaul. I wanted to rewrite packet filtering for 2.4, to make it faster and more flexible.
Since the early days of ipchains, many people have reported problems to the mailing list, some of which don’t have easy answers, and some of which are the same issues week after week.
Confusion leads to lack of security. Requiring that people master masquerading to do packet filtering increases the learning curve: it becomes more likely that people will skip an important step and make a mistake. Making things simpler for the users is a security issue. My most important goal: make packet filtering easier to configure, harder to misconfigure.
Providing a Solution: netfilter
The key to solving these problems is to create an organized infrastructure inside the kernel for writing extensions such as masquerading or packet filtering. This project is called netfilter, and on top of netfilter goes a new masquerading and Network Address Translation module, a new state-tracking module, and a new packet filtering module. Each of these is extensible as well, so new features can be added simply by inserting a new kernel module (there’s no need for a reboot).
There’s a HOWTO that covers using the built-in netfilter modules, as well as writing new modules. One of the benefits of having a documented framework is that it’s easier for people to make additions. As an example, Ronald Kuetemeier posted recently on the development list:
“[netfilter's] design enables me to do just that with a minimum of kernel hacking but with the flexibility I need. I just thought it was cool implementing Samba load-balancing with fail-over in a few hours.”
By the time you read this, the backwards-compatibility layer of netfilter will be fully in place. This allows users to be blissfully ignorant of the kernel changes, and with a simple modprobe ipchains.o command or modprobe ipfwadm.o command, use the same commands to control packet filtering that they used for 2.2 or 2.0 kernels.
Better Packet Filtering
There are many user-visible improvements in the netfilter world, but since this is a security column, I’ll concentrate on the advances in packet filtering. The new packet filtering control program is called iptables, and is the third generation of Linux packet filtering, descending directly from ipchains and ipfwadm.
Figure 1: Linux 2.2: Packet filtering with ipchains.
In Linux 2.2, a packet passes through the packet filtering code as shown in Figure 1. The circles represent the packet filter checks (in Linux 2.2 parlance, a series of packet filter checks is called a “chain”). First the packet gets checked against the “input” chain. If it passes, it goes on to be routed: this decides if it’s a local packet, or if it should pass through the box. If it’s a local packet, it’s delivered to the target machine. Otherwise, it’s checked against the “forward” chain. If that checks out, it passes through the “output” chain before being sent on. Similarly, if a program sends a packet out, it goes through the “output” chain before leaving.
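As a rough sketch, the Linux 2.2 flow just described can be written as a function that lists the chains a packet meets, in order. The function name and its arguments are hypothetical, and the model assumes every chain accepts the packet; it covers only the paths named in the text.

```python
def chains_2_2(sent_by_local_program, for_this_box):
    """Chains consulted, in order, under the Linux 2.2 layout.

    sent_by_local_program: True for a packet a program on this box sends out.
    for_this_box: True if an arriving packet is addressed to this box
                  (ignored for packets sent by a local program).
    """
    if sent_by_local_program:
        return ["output"]
    path = ["input"]                   # every arriving packet starts here
    if not for_this_box:
        path += ["forward", "output"]  # transit traffic meets all three chains
    return path


print(chains_2_2(False, True))   # arriving packet addressed to this box
print(chains_2_2(False, False))  # packet merely passing through
print(chains_2_2(True, False))   # packet sent by a local program
```

A packet that is only passing through meets three chains, so a rule intended for local traffic in “input” or “output” also applies to forwarded traffic.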
Compare this with Figure 2. The input and output chains have moved: the input chain is only examined if the packet is really destined for us, not for packets that are merely passing through. The output chain is only used for locally-generated packets.
Figure 2: Linux 2.4: Packet filtering with iptables.
This change means that every packet passes through one, and only one, chain. This is obviously more efficient, but it also turns out to be much simpler to use. If you want to control packets passing through your box, that means you want to put more checks in the “forward” chain. If you want to control packets coming into your box itself, you use the input chain. To emphasize this change, the chain names in iptables are upper case, i.e., INPUT, FORWARD and OUTPUT.
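The one-chain-per-packet rule can be illustrated with a small model. It follows the simplified description in the text rather than the kernel's actual code; `select_chain`, `filter_packet`, the packet dictionaries, and the sample rule are all hypothetical, and first-match evaluation with a default policy is an assumption of the sketch.

```python
def select_chain(locally_generated, for_this_box):
    """Pick the single chain a packet traverses in the Linux 2.4 layout."""
    if locally_generated:
        return "OUTPUT"
    return "INPUT" if for_this_box else "FORWARD"


def filter_packet(packet, chains, policy="ACCEPT"):
    """Run the packet through its one chain; the first matching rule decides."""
    name = select_chain(packet["locally_generated"], packet["for_this_box"])
    for matches, verdict in chains.get(name, []):
        if matches(packet):
            return name, verdict
    return name, policy  # no rule matched, so fall back to the policy


# Hypothetical rule set: drop forwarded telnet, leave everything else alone.
chains = {"FORWARD": [(lambda p: p["dport"] == 23, "DROP")]}

transit = {"locally_generated": False, "for_this_box": False, "dport": 23}
inbound = {"locally_generated": False, "for_this_box": True, "dport": 23}

print(filter_packet(transit, chains))
print(filter_packet(inbound, chains))
```

The rule placed in FORWARD affects only the transit packet; the inbound packet never sees it, which is the property that makes the new layout easier to reason about.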
You’ll quickly notice when you use iptables that transparent proxying and masquerading are not mentioned anywhere. They are now treated as they should be — as completely separate issues, controlled by the ipnatctl program. This makes iptables much easier to use than ipchains.
The third change is that iptables is extensible: you can write new modules to give it new options. Many examples are included, some of which are direct replacements for options available in ipchains. In fact, iptables itself is rather feature-poor; its power comes from these additional modules. For example, a logging module (LOG) logs the packet when called, in a much more readable form than ipchains’s logging. LOG allows users to specify the logging level, choose whether logging is rate-limited to avoid log flooding, and supply an optional 14-character prefix that identifies the message in the system logs.
Finally, a rule in the FORWARD chain can examine the name of the interface the packet came from, as well as the interface it is heading to. With ipchains, only the output interface name is available, so if you wanted to know where a packet came from, you had to examine its source address and use your knowledge of the network layout to figure out what interface it must have entered through. This is the kind of evil confusion that we’re trying to stamp out.
With iptables’ extensible interface, nobody pays a penalty if they choose not to use a particular module, so some features that were too specific to add directly into the kernel can now be made available in module form to whoever wants them.
This project has been my most ambitious Linux Kernel project to date. I thank WatchGuard for sponsoring me. I also have to thank key Linux network developers, especially David S. Miller. And of course, those who tested, debugged, and worked on the networking code, and those who are working on it now. Receiving a bug report or a patch from someone I’ve never met still gives me a huge thrill. Keep them coming.
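A minimal sketch of what matching on both interfaces buys you, under assumed names: `ForwardRule` and `forward_verdict` are hypothetical, as is the layout with eth0 on the internal side and eth1 on the external side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ForwardRule:
    """Hypothetical FORWARD rule keyed on both interfaces."""
    in_iface: Optional[str]   # None means "any interface"
    out_iface: Optional[str]  # None means "any interface"
    verdict: str

    def matches(self, in_iface, out_iface):
        return (self.in_iface in (None, in_iface)
                and self.out_iface in (None, out_iface))


def forward_verdict(rules, in_iface, out_iface, policy="DROP"):
    """Return the verdict of the first matching rule, else the policy."""
    for rule in rules:
        if rule.matches(in_iface, out_iface):
            return rule.verdict
    return policy


# Assumed layout: eth0 faces the internal network, eth1 the outside world.
rules = [ForwardRule("eth0", "eth1", "ACCEPT")]

print(forward_verdict(rules, "eth0", "eth1"))  # inside to outside
print(forward_verdict(rules, "eth1", "eth0"))  # outside to inside
```

The direction of travel is stated directly in the rule, with no need to infer the entry interface from source addresses. In iptables itself, the input and output interfaces are matched with the -i and -o options.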
Paul “Rusty” Russell develops cool new GPL networky stuff for the Linux Kernel for WatchGuard. He can be reached at firstname.lastname@example.org.