More Network Programming

Last month we started a discussion on network programming. However, in the interest of getting through an entire example of a client and a server and how they communicate, we omitted many details. This month, we'll examine our examples more closely to gain more knowledge about network programming. Specifically, we will discuss how to get IP addresses from hostnames and hostnames from IP addresses. We will also take a look at the difference between little-endian and big-endian machines and find out why "endianness" matters in network programming.


Names into Numbers

In our example of a client last month, we manually specified the IP address of the server we connected to. This is impractical because it is unlikely that you will know the IP address of every server that you want to communicate with. However, you will often know the name of the machine. For example, when asked where the Linux Magazine Web site can be found, you’d probably say “http://www.linux-mag.com” instead of “http://209.81.9.15/.” Both will get you to the same place, but the domain name is easier for people to remember because it has some intrinsic meaning. An IP address is just a list of numbers that gives you no hints about what’s on the other side. In order to connect to another machine, though, you need to know its IP address.

Due to the sheer size of the Internet, mapping individual domain names to specific IP addresses is probably a job best left to a computer. As I’m sure most of you are already aware, Domain Name System (DNS) servers exist solely to perform this function.

The next question then becomes, “How do we use the services provided by DNS within our own programs?” The answer is really quite simple. All we need to do is utilize the gethostbyname() function. Let’s take a look at its prototype, found in <netdb.h>:

struct hostent* gethostbyname (const char* name);

This function simply takes the domain name of a machine and returns a pointer to a hostent structure, which will contain all the information you will need to connect to this machine. On failure, it will return NULL. The hostent structure is defined in <netdb.h>:


struct hostent
{
    char *h_name;        /* official name of host */
    char **h_aliases;    /* alias list */
    int h_addrtype;      /* host address type */
    int h_length;        /* length of address */
    char **h_addr_list;  /* list of addresses */
};

The field we are most interested in is the h_addr_list field. It is a NULL-terminated array with one entry for each IP address of a given host. (Note that a single host can be associated with many IP addresses. This is often used for load balancing.) Each element in this array is a pointer to another array of h_length bytes, which is 4 bytes for the IPv4 addresses we are dealing with here. Each byte represents one of the four octets that comprise an IP address.




Figure One: A Simple Example of gethostbyname()


#include <stdio.h>
#include <netdb.h>

int main ()
{
    struct hostent* host;

    /* Ask the resolver for the addresses of this host name. */
    host = gethostbyname ("www.linux-mag.com");
    if (host)
    {
        /* Print the four bytes of the first address in the list. */
        printf ("%d.%d.%d.%d\n", (unsigned char)host->h_addr_list[0][0],
                (unsigned char)host->h_addr_list[0][1],
                (unsigned char)host->h_addr_list[0][2],
                (unsigned char)host->h_addr_list[0][3]);
    }
    else
        printf ("Error in looking up host.\n");
    return 0;
}

Figure One shows a piece of code that demonstrates how to read an address out of h_addr_list. This small program calls gethostbyname() to look up www.linux-mag.com and prints the first IP address in the list. Its output is:


machine:~/> ./a.out
209.81.9.15
machine:~/>
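
Since a host can have several addresses, it can be handy to walk the whole list instead of stopping at the first entry. Here is a minimal sketch of one way to do that, assuming IPv4 addresses of 4 bytes each. The helper names format_ipv4() and print_all_addresses() are our own inventions for illustration and are not part of any library.

```c
#include <netdb.h>
#include <stdio.h>

/* Hypothetical helper: format one 4-byte IPv4 entry from h_addr_list
   as dotted-number text. Returns what snprintf() returns. */
int format_ipv4 (const char *bytes, char *out, size_t outlen)
{
    const unsigned char *b = (const unsigned char *) bytes;
    return snprintf (out, outlen, "%d.%d.%d.%d", b[0], b[1], b[2], b[3]);
}

/* Hypothetical helper: print every address of a host, one per line.
   Assumes the entries are IPv4 addresses. Returns how many were printed. */
int print_all_addresses (const struct hostent *host)
{
    char text[16];   /* "255.255.255.255" plus the terminating zero */
    char **entry;
    int count = 0;

    /* The list ends with a NULL pointer. */
    for (entry = host->h_addr_list; *entry != NULL; entry++)
    {
        format_ipv4 (*entry, text, sizeof (text));
        printf ("%s\n", text);
        count++;
    }
    return count;
}
```

You would pass print_all_addresses() the pointer returned by a successful call to gethostbyname().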

The next step is being able to use this IP address to connect to a server. Last month, we used the following method to place the IP address in the correct field:


inet_aton ("209.81.9.15", &addr.sin_addr);

Notice that the function inet_aton() takes an IP address argument as a string in dotted-number notation, but gethostbyname() gives us IP addresses in h_addr_list as arrays of bytes. Fortunately, this is not a problem, because it is possible to copy these bytes directly into addr.sin_addr as follows:


memcpy (&addr.sin_addr.s_addr, host->h_addr_list[0], host->h_length);
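
To see where that copy fits, here is a rough sketch that fills in a complete sockaddr_in from a hostent. The function name fill_sockaddr() and its checks are our own assumptions, not something from last month's code. It refuses anything that is not a 4-byte IPv4 address, and it passes the port number through htons() so that it ends up in the byte order the network expects.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build an address for connect() from a lookup result.
   Returns 0 on success, -1 if the entry is not usable as an IPv4 address. */
int fill_sockaddr (struct sockaddr_in *addr, const struct hostent *host,
                   unsigned short port)
{
    if (host == NULL || host->h_addrtype != AF_INET
        || (size_t) host->h_length != sizeof (addr->sin_addr)
        || host->h_addr_list[0] == NULL)
        return -1;

    memset (addr, 0, sizeof (*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons (port);    /* port in network byte order */

    /* The bytes in h_addr_list are already in network byte order,
       so they can be copied as they are. */
    memcpy (&addr->sin_addr, host->h_addr_list[0], host->h_length);
    return 0;
}
```

The filled-in structure can then be handed to connect() in the same way as the hand-built address from last month.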

Sometimes all you have to work with is an IP address. DNS is also capable of mapping IP addresses to domain names. This is accomplished by using gethostbyaddr(), the sister function of gethostbyname(). Its prototype can be found in <netdb.h>, and it looks like this:


struct hostent* gethostbyaddr (const char* addr, int len, int type);




Figure Two: A Simple Example of gethostbyaddr()

#include <stdio.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main ()
{
    struct hostent *host;
    struct in_addr addr;

    /* Convert the dotted-number string into a binary address. */
    inet_aton ("209.81.9.15", &addr);
    host = gethostbyaddr ((const char*)&addr, sizeof(addr), AF_INET);
    if (host)
    {
        printf ("%s\n", host->h_name);
    }
    else
        printf ("Error in looking up host.\n");
    return 0;
}

Figure Two shows gethostbyaddr() in action. It tries to find the domain name that belongs to the IP address 209.81.9.15. Notice that the function requires an address in the same form as the connect() function discussed last month. Therefore, we must pass the IP address in string form to inet_aton(), which will convert it for gethostbyaddr(). The output of this simple program is:


machine:~/> ./a.out
www.linux-mag.com
machine:~/>
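
Figure Two ignores the return value of inet_aton(), which is zero when the string is not a valid address. As a hedged sketch of how the two calls might be wrapped up with some error checking, consider the following. The function name reverse_lookup() and its return convention are our own choices for illustration.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: look up the host name for a dotted-number address.
   Returns 0 and copies the name into 'name' on success. Returns -1 if the
   string is not a valid address or if the lookup fails. */
int reverse_lookup (const char *dotted, char *name, size_t len)
{
    struct in_addr addr;
    struct hostent *host;

    /* inet_aton() returns 0 when it cannot parse the string. */
    if (len == 0 || inet_aton (dotted, &addr) == 0)
        return -1;

    host = gethostbyaddr ((const char *) &addr, sizeof (addr), AF_INET);
    if (host == NULL)
        return -1;

    /* Copy the name out, since the hostent may be overwritten
       by the next lookup. */
    strncpy (name, host->h_name, len - 1);
    name[len - 1] = '\0';
    return 0;
}
```

Given a character buffer called name, a call such as reverse_lookup ("209.81.9.15", name, sizeof (name)) would fill in the host name, provided DNS has a reverse entry for that address.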

Communication Between Different Types of Machines

Now that we know how to find the names and addresses of the machines we want to send our data to, it’s worth taking a minute to consider how different machines interpret and store the raw bits they receive. Basically, nearly all modern computer architectures can be split into two classifications — those that are big-endian and those that are little-endian. The endianness of a machine refers to the way numbers are stored in the computer’s memory. This is best demonstrated by an example.




Figure Three: Big-Endian Versus Little-Endian

The 4-Byte Integer 134480385 (aka 0x08040201) on a Big-Endian Machine:

Address    0      1      2      3
Value      0x08   0x04   0x02   0x01

The Same 4-Byte Integer on a Little-Endian Machine:

Address    0      1      2      3
Value      0x01   0x02   0x04   0x08


Figure Three shows the big-endian and little-endian representations of the number 134480385 (or 0x08040201 in hexadecimal). Notice how the little-endian machine has the bytes exactly reversed from the big-endian machine. This is because little-endian machines store the “little” end of numbers in the first addresses, and big-endian machines store the “big” end of numbers in the first addresses.




Figure Four: A Program to Differentiate Big- and Little-Endian Machines


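#include <stdio.h>

int main ()
{
    int x = 134480385;
    int *p = &x;
    int i;

    /* Print the address and value of each byte of x, lowest address first. */
    for (i = 0; i < 4; i++)
        printf ("0x%08lX %d\n", (unsigned long)((char*)p + i), *((char*)p + i));
    return 0;
}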

For further evidence, see Figure Four, which illustrates a sample program that behaves differently, depending on the endianness of the machine it is running on. When run on a little-endian machine, it outputs:


0xBFFFF794 1
0xBFFFF795 2
0xBFFFF796 4
0xBFFFF797 8

but when run on a big-endian machine, it outputs:


0xEFFFFC6C 8
0xEFFFFC6D 4
0xEFFFFC6E 2
0xEFFFFC6F 1

where the first column is the memory address, and the second column is the byte stored there.
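
If all you want to know is which kind of machine you are on, you do not need to print any addresses. The following is a minimal sketch that looks only at the byte stored at the lowest address of the integer 1. The function name is_little_endian() is our own.

```c
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian (void)
{
    unsigned int x = 1;
    unsigned char first;

    /* Copy out the byte stored at the lowest address of x. */
    memcpy (&first, &x, 1);

    /* On a little-endian machine the "little" end (the 1) comes first. */
    return first == 1;
}
```

On the little-endian machine from the example above this would return 1, and on the big-endian machine it would return 0.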

This presents a two-fold problem for programmers. First, you must take precautions to ensure that your code behaves identically on machines of either endianness. As Figure Four demonstrates, this is easy to mess up. Another issue is that care must be taken to ensure that data being transferred from one computer to another is not corrupted.

Try to imagine what would happen if a little-endian machine were to send the number 134480385 (0x08040201) to a big-endian machine that interpreted it as the number 16909320 (0x01020408). Imagine that happening over and over again during a long network transfer, and imagine how screwed up your data would become in the process.
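
To check the arithmetic of that mix-up, here is a small sketch of a function that reverses the four bytes of a 32-bit number, which is the effect of reading a value with the wrong endianness. The name swap32() is our own, and the code assumes the fixed-width types from <stdint.h> are available.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the order of the four bytes in a
   32-bit value. */
uint32_t swap32 (uint32_t value)
{
    return ((value & 0x000000FFu) << 24)   /* lowest byte becomes highest */
         | ((value & 0x0000FF00u) << 8)
         | ((value & 0x00FF0000u) >> 8)
         | ((value & 0xFF000000u) >> 24);  /* highest byte becomes lowest */
}
```

Feeding it 0x08040201 gives back 0x01020408, and swapping a second time restores the original number.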

Fortunately, this problem is rather easy to manage, as there are functions that address the problem of doing endian conversions correctly and portably. Their function prototypes can be found in <netinet/in.h>:


unsigned long int htonl (unsigned long int hostlong);
unsigned short int htons (unsigned short int hostshort);
unsigned long int ntohl (unsigned long int netlong);
unsigned short int ntohs (unsigned short int netshort);

The names of these functions stand for, “Host To Network Long,” “Host To Network Short,” “Network To Host Long,” and “Network To Host Short,” respectively. Network byte-ordering is the standard for all machines that wish to communicate on the network. Network byte-ordering also happens to be the same as big-endian byte-ordering. The host byte-ordering can be either big- or little-endian, depending on the CPU architecture of the computer that the program is running on.

The idea is to do a host to network conversion when you send data and a network to host conversion when you receive data. By using these functions consistently throughout your networking code, your programs will behave correctly on both little-endian and big-endian machines.
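
As a rough illustration of that rule, the sketch below packs a 32-bit number into a buffer before sending and unpacks it again after receiving. The names put_u32() and get_u32() are hypothetical helpers of our own, and the code assumes the uint32_t type from <stdint.h>.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for the sending side: store a host-order value
   in a buffer in network byte order. */
void put_u32 (unsigned char *buf, uint32_t host_value)
{
    uint32_t net_value = htonl (host_value);
    memcpy (buf, &net_value, sizeof (net_value));
}

/* Hypothetical helper for the receiving side: read a network-order value
   from a buffer and return it in host order. */
uint32_t get_u32 (const unsigned char *buf)
{
    uint32_t net_value;
    memcpy (&net_value, buf, sizeof (net_value));
    return ntohl (net_value);
}
```

Whatever the endianness of the machine, put_u32() leaves the bytes of 0x08040201 in the buffer in the order 0x08, 0x04, 0x02, 0x01, and get_u32() turns them back into 134480385.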




Figure Five: An Example Program to Demonstrate htonl()


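#include <stdio.h>
#include <netinet/in.h>

int main ()
{
    int i = 1;

    /* htonl() returns an unsigned value, so print it as one. */
    printf ("%d %lu\n", i, (unsigned long)htonl(i));
    return 0;
}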

Figure Five actually shows us the differing behavior of one of these functions on a big-endian and a little-endian machine. On my little-endian machine, the output of this program is the following:


1 16777216

because the bytes had to be reversed before sending them over the network. However, on big-endian machines, htonl does nothing:


1 1

Network byte-ordering and the host’s byte-ordering are the same on big-endian machines. Therefore, there is no need for a conversion.

That does not mean that there’s no need to use these functions if you develop primarily for big-endian machines. Thanks to Intel, there are a lot of little-endian x86 machines out there, and it would be a shame if your code didn’t work on them. For those of you concerned with efficiency, relax: there is typically no speed hit at runtime if you use these functions on a big-endian machine, where they are usually reduced to preprocessor macros that do literally nothing. Therefore, there is nothing to lose by using these functions, and there is much to be gained in terms of portability; so please, use them in your projects.

Moving Right Along…

Hopefully by now you should be able to do some interesting network programming on your own, but as always, there are still many other details left to cover. As mentioned in last month’s column, to get a comprehensive guide of network programming, you should read Unix Network Programming Volume I: Networking APIs: Sockets and XTI by W. Richard Stevens.

Next month, we will continue our discussion of network programming; however, rather than discussing the interface functions, we will take a higher-level look at the different protocols used to communicate between machines. With that information, you should be able to make effective decisions about which methods of communication are best for your applications. In the meantime, happy hacking!



Benjamin Chelf is an author and engineer at CodeSourcery, LLC. He can be reached at chelf@codesourcery.com.
