The Insides of Networking

In the past two months, this column has introduced some of the functions necessary for writing networked programs. We've been throwing around terms like TCP, UDP, IP, and others without any real description of what they mean. This month, you should gain a better understanding of these abbreviations and exactly what is going on when they are used to communicate between machines. With this knowledge, you will be better equipped to determine exactly which protocol is appropriate for your applications -- for instance, why UDP is often used for broadcast-style transmissions, while TCP is used for transactions.

Before jumping into the different protocols that make up network communication, let’s take a look at how a protocol actually becomes a standard. The documents describing the various network protocols are called RFCs (Requests for Comments). The RFC series was started in 1969 as a set of documents describing the protocols pertaining to the Internet. RFCs for different protocols are easy to find online at http://www.rfc-editor.org/.

RFC 791 describes the Internet Protocol (IP), RFC 768 describes the User Datagram Protocol (UDP), and RFC 793 describes the Transmission Control Protocol (TCP). RFCs also describe many higher-level protocols that you are probably more familiar with. For example, FTP is described in RFC 959, HTTP 1.0 in RFC 1945, and the telnet protocol in RFC 854.

When writing an application like an FTP client or server, the RFC describing the protocol should be the first place you look. It outlines exactly what each side of the communication channel expects to see from the other side and the format of all communication.

RFCs are great for gaining a general understanding of protocols like the ones we are looking at in this article. One thing that RFCs don’t do is describe how to implement TCP/IP in a specific operating system. However, they can definitely help you gain further understanding of the protocols themselves.

It would be hard to use Linux and not be familiar with the term TCP/IP. But what exactly is TCP/IP, and why is it so important to networking? TCP/IP actually refers to a whole suite of communication protocols, including, but not limited to, TCP and IP. In this article, we will look at IP, TCP, and UDP.

Let’s start by examining the Internet Protocol (IP) (ftp://ftp.isi.edu/in-notes/rfc791.txt). This is the basic protocol used to communicate information between two different nodes on a network. As you probably know, each node on the network is required to have an “IP address” that allows other nodes to communicate with it.

The Internet Protocol RFC describes exactly how these addresses should be assigned and formed. It also describes exactly how data should be packaged and sent to another host on the network.

Information is transferred between two nodes in a network in blocks of data called packets. However, packets contain more than just the data you wish to communicate to another machine. They also indicate where the data is coming from and where it is going, along with other identifying information. Routers use this information to move the data along its route from the source machine to the destination machine. It is this additional information that the various protocols in this column describe.








Linux Magazine /
December 2001 / COMPILE TIME
The Insides of Networking





Three Essential Protocols








Figure One: Example Internet Protocol (IP) datagram as described in RFC 791.

The Internet Protocol is the base protocol for communication upon which other protocols build their packets (e.g., TCP and UDP). It includes the IP header and the data to be transferred. Other, lower-level, protocols add information to the packet and allow it to be handled by the hardware (these are beyond the scope of this article). An example IP datagram is shown in Figure One (taken directly from RFC 791).

Let’s take a look at some of the protocol’s fields in depth. The Total Length field gives the length of the datagram (which can be sent as one or more packets), header and data together, in octets. An octet is eight bits. The example datagram therefore has a Total Length of 21: a 20-octet header followed by a single octet of data, for a total of 168 bits. The Identification field contains a value that allows the receiver of the datagram to reassemble the data in the event that it was split into multiple packets (which are also called fragments).
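To make the field layout concrete, here is a minimal sketch in Python that decodes the fixed 20-octet part of an IP header. The function name `parse_ipv4_header`, the sample addresses, and the other field values are illustrative assumptions, and header options are not handled.

```python
import socket
import struct

# Fixed 20-octet IPv4 header layout from RFC 791 (options are ignored here).
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed part of an IPv4 header into named fields."""
    (ver_ihl, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = IPV4_HEADER.unpack(raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_length": total_length,           # header plus data, in octets
        "identification": identification,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,                   # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

# Hypothetical datagram: a 20-octet header followed by one octet of data.
sample = IPV4_HEADER.pack(0x45, 0, 21, 111, 0, 123, 1, 0,
                          socket.inet_aton("192.0.2.1"),
                          socket.inet_aton("192.0.2.2")) + b"\x00"
fields = parse_ipv4_header(sample)
print(fields["total_length"], fields["total_length"] * 8)  # 21 168
```

The checksum is left at zero in this sample because only the field layout is being shown.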

The Time field (called Time to Live in RFC 791) indicates for how long the datagram is valid. RFC 791 nominally defines the value in seconds, but every router that handles the datagram must decrease it by at least one, so in practice it counts hops between routers rather than any traditional measurement of time. For example, some datagrams may only contain useful data for a few hops. If the receiver does not receive the data by then, the data becomes useless. This field allows that limit to be set. When the value reaches zero, the router holding the datagram discards it.
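The hop-based expiry can be sketched in a few lines. This is a simplified model, not router code: the function name `reaches_destination` is hypothetical, and each router is assumed to decrement the field by exactly one.

```python
def reaches_destination(ttl: int, routers_on_path: int) -> bool:
    """Return True if a datagram with this initial Time value survives every router."""
    for _ in range(routers_on_path):
        ttl -= 1          # each router decrements the field before forwarding
        if ttl <= 0:
            return False  # the router discards the datagram instead of forwarding it
    return True

print(reaches_destination(64, 10))  # True
print(reaches_destination(3, 3))    # False: the third router discards it
```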

The Header Checksum field provides a sanity check on the datagram’s header. At send time, a calculation is performed over the header fields, and the result is placed in this field. The receiver can then perform the same calculation and compare the results to verify that the header was not corrupted in transmission. The checksum does not cover the data; protecting that is left to protocols such as UDP and TCP, which carry checksums of their own.
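As a rough illustration of that calculation, the sketch below computes the checksum the way RFC 791 describes it: the one's complement of the one's-complement sum of the header's 16-bit words, taken with the checksum field set to zero. The function name `internet_checksum` and the sample header bytes are assumptions made for the example.

```python
import struct

def internet_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(header) % 2:
        header += b"\x00"                    # pad to a whole number of words
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Hypothetical 20-octet header with the checksum field (octets 10-11) set to zero.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(bytes(header))
print(hex(checksum))  # 0xb861

# Sender stores the result; a receiver recomputing over the full header gets zero.
struct.pack_into("!H", header, 10, checksum)
print(internet_checksum(bytes(header)))  # 0
```

A receiver can therefore validate a header by running the same function over it unchanged and checking for a result of zero.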

The source address and destination address contain the IP addresses of the source and destination hosts, respectively. Finally, the actual data comes after all the header fields in the IP datagram.

With an IP datagram that contains the information necessary for the communication to take place between two nodes, you might think that you can simply stuff the correct data in the data field and send off the packet. However, it isn’t quite that simple. You still need a method of indicating which application on the machine should receive the incoming packet.

In our discussion two months ago on network programming, we examined how ports of communication enabled the server to decide where all of the incoming network communication should be directed. Well, this information must be contained somewhere in the packet so that the server can direct its traffic. This is handled by another protocol, like UDP or TCP.













Adding Another Layer








Figure Two: Example User Datagram Protocol (UDP) message as described in RFC 768.

The User Datagram Protocol (UDP) is the simplest of the protocols that can be used to provide this port information, and it assumes that IP sits below it. Essentially, this means the UDP message appears in the data field of the IP datagram. Once the packet has reached its destination, the IP header has done its job, and the UDP header and data are used to deliver the message to the right application. Figure Two shows a UDP message that is taken from RFC 768.
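A UDP message is small enough to build and decode by hand. The following sketch assumes the four 16-bit header fields defined in RFC 768; the helper names and port numbers are made up for illustration, and the checksum is left at zero, which RFC 768 defines as "no checksum generated".

```python
import struct

# Source Port, Destination Port, Length, Checksum: four 16-bit fields.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_message(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix the payload with an 8-octet UDP header (checksum not computed)."""
    length = UDP_HEADER.size + len(payload)   # Length covers header plus data
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload

def parse_udp_message(message: bytes):
    """Split a UDP message into its header fields and data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(message[:8])
    return src_port, dst_port, length, checksum, message[8:length]

message = build_udp_message(5000, 53, b"hello")
print(len(message))                # 13
print(parse_udp_message(message))  # (5000, 53, 13, 0, b'hello')
```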








Figure Three: An IP datagram with the UDP message in the data field.

Notice that the UDP message contains only four header fields: the Source Port, the Destination Port, the Length, and the Checksum. This is close to the minimum amount of information that two machines need in order to exchange data and keep track of precisely which applications are communicating with each other. Figure Three shows an example of an IP datagram with the data field including the UDP fields.
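The nesting can be sketched as follows: a UDP message is placed in the data field of an IP datagram, and the receiver peels off the IP header to find the destination port. The function names, addresses, and ports are hypothetical, and both checksums are left at zero to keep the example short.

```python
import socket
import struct

UDP_PROTOCOL = 17  # value of the IP Protocol field that announces a UDP payload

def wrap_udp_in_ip(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Build an IP datagram whose data field holds a UDP message."""
    udp = struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload
    ip_header = struct.pack("!BBHHHBBH4s4s",
                            0x45, 0, 20 + len(udp), 0, 0, 64, UDP_PROTOCOL, 0,
                            socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    return ip_header + udp

def destination_port(datagram: bytes) -> int:
    """Strip the IP header and read the UDP Destination Port."""
    header_length = (datagram[0] & 0x0F) * 4   # IHL is in 32-bit words
    if datagram[9] != UDP_PROTOCOL:
        raise ValueError("data field does not hold a UDP message")
    _, dst_port = struct.unpack("!HH", datagram[header_length:header_length + 4])
    return dst_port

datagram = wrap_udp_in_ip("192.0.2.1", "192.0.2.2", 40000, 7, b"ping")
print(len(datagram), destination_port(datagram))  # 32 7
```

In a real system the operating system does this work; an application normally sees only the payload and the peer's address.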

The discussion of network programming in the last two columns did not address individual datagrams. In those examples, the program simply sent data to a port, and it was received at the other end. But what happens if one of the packets gets lost or doesn’t arrive when it is supposed to? How can we ensure that this will not happen? The User Datagram Protocol provides no guarantee that data will arrive at its destination. The Transmission Control Protocol (TCP), however, does provide this guarantee.

The key difference between TCP and UDP is that TCP establishes a connection between two hosts. The hosts set up communication and then send messages to each other. Each message is numbered so that the receiver can tell if something is missing. UDP provides no such facility. As a result, UDP is often described as a connectionless service, whereas TCP provides connections between clients and servers. However, UDP is typically faster than TCP because it does not carry the overhead needed to provide a connection and reliable communication. Because TCP provides reliability and a connection between two hosts, this column used it in the client/server examples of previous months.
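From a program's point of view, the difference shows up in how the socket is created and used. The sketch below sends the same bytes over the loopback interface both ways; the function names are hypothetical, a working 127.0.0.1 interface is assumed, and everything runs in one process purely for demonstration.

```python
import socket

def udp_roundtrip(payload: bytes) -> bytes:
    """Connectionless: one datagram is sent with no setup beforehand."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))        # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(2048)
        return data

def tcp_roundtrip(payload: bytes) -> bytes:
    """Connection-oriented: connect and accept happen before any data moves."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _ = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # tell the peer the stream has ended
                chunks = []
                while True:
                    chunk = conn.recv(2048)
                    if not chunk:                # empty read means end of stream
                        break
                    chunks.append(chunk)
                return b"".join(chunks)

print(udp_roundtrip(b"hello"))
print(tcp_roundtrip(b"hello"))
```

Note that the UDP receiver gets the message as one unit, while the TCP receiver reads a byte stream and has to loop until the other side closes it.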






















Figure Four: Example Transmission Control Protocol (TCP) message as described in RFC 793.

Figure Four shows a sample message for the Transmission Control Protocol. Like UDP, this protocol assumes the IP fields are already in the packet. Notice all the fields that are in the TCP message that were not present in the UDP message. However, the source and destination ports, as well as a checksum field, are still present. Among other things, the additional fields provide support for guaranteeing the arrival of data.
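For comparison with the UDP header, here is a minimal sketch that decodes the fixed 20-octet TCP header laid out in RFC 793. The function name, port numbers, and sequence number are illustrative assumptions, and TCP options are not interpreted.

```python
import struct

# Ports, Sequence Number, Acknowledgment Number, offset/flags, Window, Checksum, Urgent Pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header into named fields."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = TCP_HEADER.unpack(segment[:20])
    data_offset = (offset_flags >> 12) * 4     # header length in octets
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence_number": seq,
        "acknowledgment_number": ack,
        "data_offset": data_offset,
        "syn": bool(offset_flags & 0x02),      # connection request
        "ack": bool(offset_flags & 0x10),      # Acknowledgment Number is valid
        "fin": bool(offset_flags & 0x01),      # sender has finished sending
        "window": window,
        "checksum": checksum,
        "payload": segment[data_offset:],
    }

# Hypothetical first message of a connection: SYN set, no data.
syn = TCP_HEADER.pack(40000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
info = parse_tcp_header(syn)
print(info["sequence_number"], info["syn"], info["ack"])  # 1000 True False
```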

Each TCP message carries a Sequence Number. That number identifies where the message’s data belongs in the stream, and therefore the order in which messages should be processed by the destination host. (Note that the actual order in which they are received may be different.) The Acknowledgment Number is the sequence number that the sender of the message expects to see next from the other host. So once a connection is established, both hosts have a number from the other host that they can use to make sure that whatever message is received is the message that was expected. If it is not, the receiver keeps acknowledging only the data it has received in order, and the sender retransmits whatever remains unacknowledged.
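The bookkeeping on the receiving side can be sketched as follows. This is a simplified model under stated assumptions: sequence numbers count octets as in RFC 793, wraparound at 2^32 is ignored, and the function name `reassemble` is hypothetical.

```python
def reassemble(segments, initial_seq: int):
    """Deliver data in sequence order and report the next number expected.

    segments is a list of (sequence_number, data) pairs in arrival order.
    """
    pending = dict(segments)        # sequence number -> data, arrival order forgotten
    expected = initial_seq
    delivered = bytearray()
    while expected in pending:      # stop at the first gap
        data = pending.pop(expected)
        delivered += data
        expected += len(data)       # the next segment starts right after this one
    return bytes(delivered), expected

# Segments arrive out of order but nothing is missing.
print(reassemble([(105, b"world"), (100, b"hello")], 100))  # (b'helloworld', 110)

# The segment starting at 105 was lost, so delivery stops there.
print(reassemble([(100, b"hello"), (110, b"!")], 100))      # (b'hello', 105)
```

The second value returned is what the receiver would place in the Acknowledgment Number field of its next message; when it stops advancing, the sender knows which data to send again.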

Go Forth and Communicate

There are, of course, many other details of the protocols discussed here, as well as many other protocols. For more information, check out Internet Core Protocols: The Definitive Guide by Eric A. Hall (with a foreword by Vinton G. Cerf). This book gives a comprehensive description of TCP, UDP, and IP, as well as other protocols essential for network communication.

Armed with the information that we have presented here, you should now have a much better sense of how information is organized and transferred between various machines over a network.


by Benjamin Chelf














