Keeping the TCP/IP Stream Flowing

In my June column, I gave an overview of IPv4
(Internet Protocol, version 4), and described some common problems with
its implementation. This month, I’m going to give you the same kind of
information for TCP, the Transmission Control Protocol, which makes up
well over 95% of unencrypted traffic on the Internet.

IP’s mission in life is to get a packet from one point on the
network to another. TCP’s mission is to use IP (hence the name TCP/IP,
for TCP over IP) to provide a reliable stream of data between two
points. While IP is an efficient protocol for transferring large
quantities of information, its specification allows it to drop packets,
deliver them out of order, or even deliver them with corrupt data. TCP
is needed to ensure that all the required packets make it to their
destination, uncorrupted, and in the correct sequence.

After the IP header in the packet, which is almost always 20 bytes
long, we have the TCP header, which is at least 20 bytes long. You can
see what it looks like in Figure 1.


Figure 1: An example of a Transmission Control
Protocol (TCP) header.
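The fixed 20-byte part of the header in Figure 1 can be decoded with Python’s standard struct module. This is a minimal sketch assuming the standard field layout (ports, sequence and acknowledgement numbers, data offset, flags, window, checksum, urgent pointer); parse_tcp_header() and TCPHeader are hypothetical names, and any options after the first 20 bytes are not decoded.

```python
import struct
from typing import NamedTuple

# Flag bits in the byte that follows the data-offset byte.
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

class TCPHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int   # in bytes; 20 when there are no options
    flags: set
    window: int
    checksum: int
    urgent: int

def parse_tcp_header(raw: bytes) -> TCPHeader:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", raw[:20])
    header_len = (offset_byte >> 4) * 4          # data offset counts 32-bit words
    flags = {name for name, bit in FLAGS.items() if flag_byte & bit}
    return TCPHeader(src, dst, seq, ack, header_len, flags, window, checksum, urgent)
```

For example, a header packed with source port 2000, destination port 80, sequence number 1000000 and only the SYN bit set decodes back to those values, with header_len equal to 20.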

The most important thing TCP offers is a ‘connection’: a stream of
related packets. Each packet identifies the connection it is part of,
and where in that connection it belongs. This is
new; IP itself is connectionless, which means that it doesn’t do much
more than deliver packets to a destination: packets can be lost or
delivered out of order. A connection is most useful when we want to send
a large chunk of data that is too big for a single packet.

TCP allows for more than one connection between machines. The TCP
header contains two “port numbers.” They identify which connection the
packet belongs to. When a program creates a TCP connection, it gets one
of these port numbers all to itself. When it sends out a TCP packet, the
TCP source port is set to this port number; and when a TCP reply packet
comes in, the TCP destination port should be that same port number.

For example, suppose I’m running Netscape Communicator
on mybox.watchguard.com and open two pages
at once from the Web server on bigserver.debian.org.
This means two TCP connections between mybox and bigserver will
be open at the same time.

My Netscape browser gets ports 2000 and 2001 for the connections;
the Web server traditionally runs on port 80. So the first two packets
go out from mybox with their destination ports
set to 80, and their source ports set to 2000 and 2001 respectively. The
Web server on bigserver can handle multiple
requests at once, and the replies will have source port 80, and
destination ports 2000 and 2001 respectively. When the Linux kernel on
mybox receives these packets, it knows from the
destination ports which packet is for which connection.
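To make the port bookkeeping concrete, here is a minimal Python sketch of how the receiving host could pick the right connection for an incoming reply. The dictionary stands in for the kernel’s connection table, and deliver() is a hypothetical helper; hostnames are used instead of numeric IP addresses purely for readability.

```python
# Hypothetical connection table: each connection is keyed by
# (local address, local port, remote address, remote port).
connections = {
    ("mybox.watchguard.com", 2000, "bigserver.debian.org", 80): "first page",
    ("mybox.watchguard.com", 2001, "bigserver.debian.org", 80): "second page",
}

def deliver(src_addr, src_port, dst_addr, dst_port):
    """Return the connection an incoming packet belongs to, or None."""
    # For an incoming packet, the destination is our (local) end
    # and the source is the remote end.
    return connections.get((dst_addr, dst_port, src_addr, src_port))
```

Both replies carry the same addresses; only the destination port (2000 or 2001) selects a different entry.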

The important thing to note is that the IP headers of the reply
packets carry identical addresses: the source addresses are both bigserver.debian.org, and the destination addresses
are both mybox.watchguard.com. Only the TCP header
provides the information needed to distinguish them.

A program can request a specific port, or one will be assigned
automatically. In my example, Netscape was assigned two ports, but the
Web server on bigserver.debian.org explicitly
asked to use port 80. This is because some port numbers are known as
“well-known ports”; these ports are listed in the file /etc/services on your Linux box. For example, the
“well-known port” for Web servers is port 80; for mail servers it’s port

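Each line of /etc/services names a service, its port and protocol, and optional aliases, with # starting a comment. The following Python sketch parses that format from a string; parse_services() and the SAMPLE text are illustrative assumptions, not the contents of any particular system’s file. (Python’s socket.getservbyname() performs the same kind of lookup through the C library.)

```python
def parse_services(text: str) -> dict:
    """Map (service name or alias, protocol) -> port from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()     # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        name, port_proto, aliases = fields[0], fields[1], fields[2:]
        port, proto = port_proto.split("/")
        for alias in [name, *aliases]:
            table[(alias, proto)] = int(port)
    return table

# Hypothetical sample in the /etc/services layout.
SAMPLE = """
# service-name  port/protocol  [aliases ...]
http            80/tcp         www      # WorldWideWeb HTTP
"""
```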
Usually there are only 3976 ports used for automatic assignment,
numbered 1024 to 4999. If you ever need more connections at once, you
can control the range of ports available using the file



IP packets can easily get out of order; one might take the low road,
the other the high road. Though they’re sent in the right order, IP
packets may arrive at their destination all jumbled up. To keep track of
things, each TCP header has a sequence number, which increases for every
data byte sent. So a packet with sequence number 1000000 comes before a
packet with sequence number 1000200. Using these sequence numbers gives
us a way of ordering packets (and detecting duplicates), however they

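Here is a minimal Python sketch of how sequence numbers let a receiver put segments back in order and discard duplicates. It assumes segments never partially overlap and ignores 32-bit sequence-number wraparound; reassemble() is a hypothetical name, not a kernel function.

```python
def reassemble(segments, initial_seq):
    """Rebuild the byte stream from (seq, data) segments that may arrive
    out of order or duplicated. Only bytes contiguous from initial_seq
    are returned; anything after a gap is held back."""
    by_seq = {}
    for seq, data in segments:
        if data:
            by_seq.setdefault(seq, data)   # a duplicate copy of a segment is ignored
    stream = b""
    next_seq = initial_seq
    while next_seq in by_seq:              # stop at the first missing segment
        data = by_seq[next_seq]
        stream += data
        next_seq += len(data)              # sequence numbers count data bytes
    return stream
```

With segments at 1000000, 1000100 and 1000200 arriving in any order (and one arriving twice), the output is the same 201 bytes in the right order.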
Reliability means knowing when something has failed. TCP does this
in two ways: by having a checksum and by insisting that the receiving
computer sends acknowledgements.

The checksum is another number in the TCP header containing the sum
of the entire TCP packet (i.e. the TCP header and the body that contains
the data sent). It’s actually a little more complex than that, but this
is the idea: the remote computer can check that the checksum is correct;
if it isn’t, then it knows that the packet has been corrupted and should
be dropped.
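The “little more complex” part is that TCP uses the Internet checksum: the 16-bit ones’ complement of the ones’ complement sum of the data. This Python sketch computes it over a byte string; it leaves out the pseudo-header of IP addresses, protocol and length that real TCP also covers, and internet_checksum() is a hypothetical name.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back into the low 16 bits
    return ~total & 0xFFFF
```

The receiving side’s check is simple: running the same function over the data with the checksum appended gives 0 if nothing was corrupted.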

The acknowledgement is yet another number in the TCP
header: it is the sequence number of the next byte the receiver expects,
which tells the sender how much data has arrived safely. If
no acknowledgement is received, TCP assumes that the packet has been
lost and retransmits it. In practice, there is a fixed spot in the TCP
header for an acknowledgement number (shown in Figure 1), and the ACK
flag marks it as valid, so
if reply data is being sent anyway, this field is used to avoid sending
a separate packet just for an ACK.

In the modern Internet, connections range from 960 bytes per second
up to over 200,000,000 bytes per second. The connection points between
networks (called routers) can only be expected to store a certain number
of packets; once they get over-worked, they begin to drop packets.

TCP uses some interesting algorithms to avoid congestion while
providing reasonable data flow. These have names like “slow start,”
“exponential backoff,” and the “Nagle algorithm.” The basic assumption
behind these algorithms is that packet loss is usually caused by
congestion, not by corruption, and that the fewer packets you can send,
the better.
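Two of these ideas can be sketched in a few lines of Python. This is a simplified model, not the Linux implementation: slow_start_trace() doubles the congestion window each round trip until a threshold, then grows it by one segment; backoff_schedule() doubles the retransmission timeout after each unanswered retry, up to a cap. The function names and parameters are hypothetical, and the Nagle algorithm is not modeled.

```python
def slow_start_trace(rounds: int, ssthresh: int, initial: int = 1) -> list:
    """Congestion window (in segments) at the start of each round trip."""
    cwnd, trace = initial, []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: exponential growth
        else:
            cwnd += 1                        # past the threshold: linear growth
    return trace

def backoff_schedule(initial_rto: float, retries: int, max_rto: float) -> list:
    """Timeout used for each successive retransmission attempt."""
    rto, waits = initial_rto, []
    for _ in range(retries):
        waits.append(rto)
        rto = min(rto * 2, max_rto)          # exponential backoff, capped
    return waits
```

For instance, with a threshold of 8 segments the window over six round trips is 1, 2, 4, 8, 9, 10: it starts small and only speeds up while packets keep getting through.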

Figure 2: A typical TCP connection. Bold lines
indicate data is actually transmitted; the other packets are just
headers with no data.

Establishing a TCP Connection

When Netscape Communicator connects to a Web server, it establishes
a TCP connection that is then used to transmit the request and receive
the response. Of course, Netscape doesn’t know about packets; it just
asks the kernel to make a connection to the server, write the Web
request, and finally, read the response.

Here’s what happens, as shown in Figure 2:

1) On my own machine, kevin., Netscape asks the kernel for a TCP
connection. Linux picks the first available port number, in this case

2) Netscape asks the kernel to make a TCP connection to the Web
server (at its “well-known port,” port 80) at http://www.watchguard.com.
To start the connection, my machine sends out a special TCP packet,
called a synchronization (SYN) packet. Linux assigns a sequence
number to my SYN packet. This will be the point of reference for all of
the TCP packets I send to the watchguard.com Web server. In this
example, the sequence number chosen by Linux is 1000000. The SYN flag
itself counts as one byte of sequence space, so the next sequence
number I use will be 1000001. These initial sequence numbers are
chosen by Linux at random, for reasons that will become clear in the TCP
Spoofing section below.

3) The WatchGuard Web server receives the SYN packet and replies
with its first packet: a SYN/ACK packet, which has both the SYN and ACK
flags set. Its acknowledgement number is my sequence number plus one for
my SYN flag, 1000001, and its sequence number is the Web server’s own
initial sequence number, in this case 2000000.

4) My machine receives the reply and finishes the connection setup
(this is sometimes called a three-way handshake) by sending a third TCP
packet with the ACK flag set but without the SYN flag set. The sequence
number of this packet is one more than that of the first packet we sent
(1000001). Its acknowledgement number is the Web server’s sequence
number plus one for the server’s SYN, making it 2000001. The
connection has been established.

5) So far, we haven’t sent or received any data. Netscape does a
write() of 885 bytes (the HTTP request). The
kernel sends out a TCP packet of 885 bytes; sequence number 1000001,
acknowledgement number 2000001, ACK flag set.

6) The server issues the first packet of its (large) reply: 1460
bytes of data, in a TCP packet: sequence number 2000001, ACK number
1000886 (1000001 + 885).
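The sequence and acknowledgement arithmetic in steps 2 to 6 can be checked with a small Python sketch. next_seq() and handshake_and_request() are hypothetical helpers; the only rules they encode are that a SYN or FIN uses up one sequence number, data uses one per byte, and sequence numbers wrap at 2**32.

```python
def next_seq(seq: int, length: int, syn: bool = False, fin: bool = False) -> int:
    """Sequence number following a segment: one per data byte, plus one each for SYN and FIN."""
    return (seq + length + int(syn) + int(fin)) % 2**32

def handshake_and_request(client_isn: int, server_isn: int,
                          request_len: int, reply_len: int = 1460) -> list:
    """(sender, flags, seq, ack, data length) for the first five segments of the example."""
    c_next = next_seq(client_isn, 0, syn=True)   # the client's SYN uses one number
    s_next = next_seq(server_isn, 0, syn=True)   # so does the server's SYN
    return [
        ("client", "SYN",     client_isn, None,   0),
        ("server", "SYN/ACK", server_isn, c_next, 0),
        ("client", "ACK",     c_next,     s_next, 0),            # a bare ACK uses no numbers
        ("client", "ACK",     c_next,     s_next, request_len),  # the HTTP request
        ("server", "ACK",     s_next,     next_seq(c_next, request_len), reply_len),
    ]
```

Calling handshake_and_request(1000000, 2000000, 885) reproduces the numbers above, ending with the server’s reply acknowledging 1000886.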

We leave our example here. Eventually, one end will close the
connection by sending a packet with the FIN flag set, and the other end
will send an ACK for that packet. Once each end has sent its own FIN and
had it acknowledged, the connection is finished.

Because I’ve only got a single column, and not a whole magazine to
devote to TCP/IP, I’m going to jump straight in and describe some of the
problems and solutions.

TCP Spoofing

TCP spoofing happens when you send packets while pretending to be
someone else. This is difficult; if you send out packets claiming to
come from someone else, the replies will go to them, not you. Before you
can send data, you need to complete the three-way handshake. You need to
fill in the right acknowledgement number in the third packet, and you
can’t do this if you don’t receive the second packet.

However, old-fashioned TCP implementations used quite predictable
initial sequence numbers. This means that you could make one connection
for real, and know that the next initial sequence number would be that
one plus 64000. A TCP spoofer would guess the next sequence number, and
send a TCP SYN packet to the victim from a fake IP address. Obviously,
it won’t receive the TCP SYN/ACK reply, but it can pretend it did, and
complete the third packet in the handshake with an acknowledgement
number based on its guess. The TCP spoofer can then send any data it
wants, and the victim will believe it is receiving that data from the
fake IP address. The TCP
spoofer can’t see any reply packets, but sometimes you don’t need

Linux, and other modern operating systems, choose a random number
for the initial sequence number, to make TCP spoofing much more

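The difference between the two approaches fits in a few lines of Python. predictable_isn() models the old “previous ISN plus 64000” behavior described above; random_isn() simply draws 32 random bits from the operating system’s cryptographic source. Both names are hypothetical, and this illustrates the principle rather than the Linux kernel’s actual generator.

```python
import secrets

def predictable_isn(previous_isn: int, increment: int = 64000) -> int:
    """Old-style ISN: the last one plus a constant, so an attacker can guess it."""
    return (previous_isn + increment) % 2**32

def random_isn() -> int:
    """Unpredictable 32-bit ISN drawn from a cryptographic random source."""
    return secrets.randbits(32)
```

An attacker who observes an initial sequence number of 1000000 on a real connection can predict 1064000 for the next one with the first function; with the second, the observation tells them nothing useful.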
Linux Blind TCP Spoofing

Unfortunately, Linux kernels before 2.0.36 contained a bug that
allowed you to send data to a program without completing the TCP
handshake. A packet with the FIN flag set, but without the ACK flag set,
and that contained data, could push data through to a program.

Stealth Scanning

If you are looking for TCP servers running on a machine, you can try
to open a connection to each port, from 1 to 65535. Many servers will
log who connected to them, so these “scans” are detected. If you want to
find out what servers are running without actually opening a connection
to each one (to reduce your chance of detection), you can do this in
many ways.

The most common way to do this is to send a TCP finish (FIN) packet;
this is normally used to close an established connection. The TCP
standard says that if a server is waiting for connections on a port
number, the packet must be ignored, but if nothing is waiting on that
port number, a TCP reset (RST) packet should be returned. Hence, you
know there’s a server if you don’t get a reply. People feel that probes
with these FIN packets are less likely to be logged by a packet filter
than SYN packets.
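The inference a FIN scanner draws can be modeled in Python against a simulated host that follows the standard. response_to_fin() and fin_scan() are hypothetical names, and this is only a model of the reasoning, not a tool that sends packets. On a real network, silence is ambiguous: a packet filter that silently drops the FIN also looks like a listening server.

```python
def response_to_fin(listening_ports: set, port: int):
    """What a standard-following host returns for a stray FIN:
    nothing if a server is listening on the port, a RST otherwise."""
    return None if port in listening_ports else "RST"

def fin_scan(listening_ports: set, ports) -> list:
    """Ports that stayed silent, which the scanner takes to mean 'server present'."""
    return [p for p in ports if response_to_fin(listening_ports, p) is None]
```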

MS Windows disobeys the TCP/IP standard to avoid this, Linux

SYN Flooding

SYN flooding is what is called a denial of service attack: jamming
up the works so a server becomes unusable. In this case, the target is a
TCP server, such as a Web server or e-mail server. The attacker sends a
flood of SYN packets, which claim to come from a non-existent machine.
The server tries to complete the three-way handshake, replying with
SYN/ACK packets, which go nowhere. The server then gives up on the

The problem is that each connection to a machine uses up some
memory: if we didn’t have a limit, an attacker could use up all of our
machine’s memory by sending us a few million SYN packets. The usual
limit is five per port; so if there are more than five unanswered SYN
packets, further SYN packets are dropped. So, if you can send five of
these fake SYN packets every 75 seconds (not a very difficult thing to
do), you can keep the server occupied ad infinitum.

The two most common solutions for this are Random Early Drop and
SYN Cookies. Random Early Drop chooses one of the unanswered SYNs at
random to forget about. In practice it works quite well if you really
are under attack; when a genuine connection attempt comes in, one of the
unanswered SYNs is dropped to make room for it. Most of the time, the one dropped was a fake one,
since they hang around for 75 seconds, while normal SYNs receive an
answer and are finished in well under one second.
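The Random Early Drop idea described above can be sketched as a bounded queue of half-open connections that evicts a random entry when full. SynQueue, its limit of five (taken from the text) and the connection tuples are illustrative assumptions; a kernel’s SYN queue also handles timeouts and retransmitted SYN/ACKs, which are omitted here.

```python
import random

class SynQueue:
    """Half-open connections waiting for the third packet of the handshake."""

    def __init__(self, limit: int = 5, rng=None):
        self.limit = limit
        self.pending = []
        self.rng = rng or random.Random()

    def on_syn(self, conn) -> None:
        if len(self.pending) >= self.limit:
            victim = self.rng.randrange(len(self.pending))
            self.pending.pop(victim)       # forget one unanswered SYN at random
        self.pending.append(conn)

    def on_ack(self, conn) -> bool:
        """Third packet arrived: established only if we still remember the SYN."""
        if conn in self.pending:
            self.pending.remove(conn)
            return True
        return False
```

Because fake entries linger while genuine ones complete quickly, a random victim is usually a fake one, and a new genuine connection always gets a slot.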

Linux’s use of SYN cookies is a more sophisticated technique. We take
all of the information we need to remember about the connection, and
encode it into the sequence number of the reply to the initial SYN
packet. If this is a genuine connection, we’ll get a reply to that
packet (completing the three-way handshake), and its acknowledgement
number will be one more than the sequence number we sent. We subtract
one from this number, and decode it to give the information we need.
This means that the server does not need to remember anything at all, so
no memory is used until the handshake is completed.
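Here is a Python sketch of the SYN-cookie principle: pack a coarse time counter, a small index for the connection’s maximum segment size, and a keyed MAC of the connection into the 32-bit sequence number, then check it when the third packet arrives. This is not Linux’s actual cookie format; the field widths, MAX_AGE, the HMAC-SHA256 construction and the function names are all assumptions chosen for clarity.

```python
import hashlib
import hmac

MAX_AGE = 1  # assumption: accept cookies from this time slot or the previous one

def make_cookie(secret: bytes, conn: tuple, t: int, mss_index: int) -> int:
    """32-bit cookie: 5-bit time counter | 3-bit MSS index | 24-bit MAC."""
    msg = repr((conn, t, mss_index & 7)).encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return ((t % 32) << 27) | ((mss_index & 7) << 24) | int.from_bytes(mac[:3], "big")

def check_cookie(secret: bytes, conn: tuple, ack: int, now: int):
    """Recover the MSS index from the handshake's final ACK number,
    or return None if the cookie is forged or too old."""
    cookie = (ack - 1) % 2**32          # the client acknowledged our sequence number + 1
    t_low, mss_index = cookie >> 27, (cookie >> 24) & 7
    for age in range(MAX_AGE + 1):
        t = now - age
        if t % 32 == t_low and make_cookie(secret, conn, t, mss_index) == cookie:
            return mss_index
    return None
```

The server stores nothing between sending the SYN/ACK and receiving the final ACK: everything it needs is recomputed from the secret, the addresses and ports, and the acknowledgement number.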

Both of these techniques come into play only when a system’s normal
queue is about to overflow; either it is under a heavy load or under a
SYN attack. You can use the proc file:


to turn SYN cookies on (if your kernel was compiled with “SYN flood
protection”), and the file:


to control the number of unreplied SYNs remembered before activating
SYN cookies or dropping SYNs (depending on whether SYN cookies is turned


SACK stands for Selective Acknowledgement. The acknowledgement
number in the TCP header only covers data received contiguously, with
no gaps. If the packet with sequence number 1000886 is lost, for example, this
acknowledgement number will stay at 1000886, even if other packets
further along are received. The sender notices that the receiver’s
acknowledgements aren’t incrementing, and so retransmits.
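A short Python sketch shows why the acknowledgement number gets stuck. cumulative_ack() is a hypothetical helper that advances the ACK over contiguous received (sequence number, length) ranges; it ignores sequence-number wraparound.

```python
def cumulative_ack(next_expected: int, received) -> int:
    """Advance the ACK number across contiguous (seq, length) ranges;
    it stops at the first gap, however much data arrives after it."""
    ack = next_expected
    progressed = True
    while progressed:
        progressed = False
        for seq, length in received:
            if seq <= ack < seq + length:   # this range covers the next expected byte
                ack = seq + length
                progressed = True
    return ack
```

If the 1460-byte segment at 1000886 is lost but the two after it arrive, the ACK stays at 1000886; once the missing segment is retransmitted, it jumps straight to 1005266.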

Unfortunately, on fast connections with high delay, such as an
intercontinental link in the Internet, multiple packets may be lost, and
the algorithms developed to handle occasional packet loss don’t work as
well. This is where SACK comes in; the recent generation of operating
systems, like Linux 2.2 and all the free BSD variants, support it.

The TCP header allows “options”: up to 40 bytes of extended
information after the normal header. Unlike IP’s options, these are
frequently used, and one of the possible options is SACK. This is a set
of one or more ranges, indicating the sequence numbers of packets that
have been received. If both host machines understand SACK, they can be
much more clever about how they retransmit lost packets. Expect things
to improve as deployments of SACK-enabled clients and servers

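The extra information SACK provides can be sketched in Python: the receiver merges what it holds beyond the cumulative ACK into [start, end) blocks, and the sender can work out exactly which gaps to retransmit. sack_blocks() and holes() are hypothetical helpers, not a real stack’s API; they ignore wraparound and the limit on how many blocks fit in the 40 bytes of options.

```python
def sack_blocks(ack: int, received) -> list:
    """Merge received (seq, length) ranges above the cumulative ACK into
    (start, end) blocks, as a receiver would report them."""
    ranges = sorted((s, s + n) for s, n in received if n > 0 and s + n > ack)
    blocks = []
    for start, end in ranges:
        start = max(start, ack)
        if blocks and start <= blocks[-1][1]:
            blocks[-1][1] = max(blocks[-1][1], end)   # touches or overlaps the previous block
        else:
            blocks.append([start, end])
    return [tuple(b) for b in blocks]

def holes(ack: int, blocks) -> list:
    """Gaps between the cumulative ACK and the SACKed blocks: what to resend."""
    gaps, cursor = [], ack
    for start, end in sorted(blocks):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    return gaps
```

With two 1460-byte segments lost (at 1000886 and 1005266) and three others received, the sender learns it must resend exactly those two ranges instead of everything after the first loss.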
So those are the basics of TCP/IP and how to avoid the most common
nasty tricks. Happy Linuxing!

Paul “Rusty” Russell is the Linux kernel IP packet filter
maintainer. He works for WatchGuard and can be reached at
