In case you missed it, last month we looked at software packages and package managers, which are tools that make it fairly easy to install new applications. However, many great applications are distributed as source code, which makes them somewhat more difficult to install. So this month we're going to look at source code and the tools used to work with it. Next month we'll take this a step further and actually go through the process of installing some software distributed as source code.
|Figure One: Examples of source code and machine language.|
Stuck in the tar Pit
Okay, we’ve all had this experience. You find a really spiffy program on the Net and download the file xpilot-4.2.1.tar.gz, which contains it. Then you try to install it by using GNOME RPM, as described in last month’s article. But, a less-than-friendly dialog box pops up announcing, “xpilot-4.2.1.tar.gz doesn’t appear to be a RPM package.” Hey, what gives? How are you supposed to install this thing?
Your problem is that you’ve downloaded a source release rather than a binary release. Now, if you’re lucky, somewhere on the Net you may find an RPM or DEB package that contains this program. But if you can’t find one, you have only two alternatives: figure out how to install the source release or forget the whole thing and try some other program instead.
What is Source Code?
A computer program is a listing of instructions that tell a computer how to perform some specific task. Programmers often create these lists of instructions using programming languages such as C, C++, Java, Perl, PHP, or Python. The list of instructions written in one of those languages is referred to as “source code.”
Now here’s the tricky part: as you may know, computers can only understand a language consisting of ones and zeros. This is referred to as “machine language.” Machine languages are highly specific to a given type of machine. For example, the machine language for the Intel chip in an IBM-compatible PC is very different from that of the Motorola processor in an Apple Macintosh. That’s why a program compiled for a PC won’t run on a Mac, and vice versa.
Right about now you might be saying: “Wait a minute, if computers understand only machine language, why do programmers write source code rather than machine language?” Well, when computers were first invented over 50 years ago, programmers did write machine language. But writing a program in nothing but ones and zeros proved tedious and error-prone, so computer scientists invented “higher-level” programming languages that let programmers write source code instead. Source code is a sort of middle ground between human-understandable languages (like English) and machine-understandable languages (like ones and zeros).
The difference between English and programming languages such as C or C++ is that those languages were designed to be mechanically translatable into the machine language of ones and zeros. Figure One illustrates these two forms — source code and machine language.
|Figure Two: The compilation process.|
So how does a program go from being a listing of source code written in C or C++ to a mass of ones and zeros? Computer scientists invented a process called compilation to perform this task. Figure Two presents a brief overview of the compilation process.
Two special programs, a compiler and a linker, translate the source code written by the programmer into machine code that a computer can execute. Figure Two depicts the two-step compilation process associated with the popular C programming language. Some programming languages, Java for instance, organize the process somewhat differently.
In the first step of the compilation process, a program known as a compiler translates source code into a form known as object code. Some programmers refer to object code as intermediate code because it has a form halfway between source code and machine code. Object code, like machine code, is generally specific to a particular type of processor. At this intermediate stage, a program is often broken up into many small parts, each stored as a separate object code file.
This brings us to the second step in the compilation process, called linking. In linking, a program known as a linker combines program components that are stored as object code into a single machine language program ready for execution.
If you look back to Figure Two, you’ll see that we’ve not yet mentioned the two boxes labeled header files and libraries. What are those? Like a pastry chef who uses box mixes and decorates cakes with ready-made rosettes, a programmer generally doesn’t write programs completely from scratch. Instead, they will often use ready-made components that are available in two forms:
* header files, which are reusable source code components;
* libraries, which are reusable object code components.
As with everything else in Linux, there is always more to explore here. We could go on to study static and dynamic libraries, source RPMs, and other miscellany. But doing that would divert us from our goal of learning how to install source releases. So let’s move on to learn about tools for working with source code (or just “code” for short).
Programmers use what seems to be a myriad of tools for working with code. However, like a carpenter who owns dozens of tools but regularly uses only a hammer, saw, and T-square, most programmers use a few tools often and other tools only occasionally. We’ll take a first look at the most important tools this month, and next month we’ll look at how to use them to actually install a source release.
Following is a list of some of the tools used to manipulate source code. These tools come with most Linux distributions, but you will probably never work directly with some of them (unless, of course, you decide to become a programmer).
The Assembler (as): When writing low-level system code, programmers sometimes use assembly language, a special type of programming language that is much closer to machine language than ordinary source code. Using assembly language, programmers can write much more efficient code. The problem is that assembly language is a tedious and difficult language (it’s one step removed from the ones and zeros of machine language). A program called an assembler translates assembly language code to machine code. You probably won’t ever work with the assembler directly — it’s usually called into service by other programs in the compilation chain.
C Compiler and C Preprocessor (gcc and cpp): The C compiler gcc translates C source code into object code. A program can be written in languages other than C, but C is the most popular Linux programming language. The C preprocessor cpp tweaks C source code before the source code is processed by the C compiler. The preprocessor makes it easier for programmers to write portable, configurable programs.
Utilities for Building Programs (make, imake, and xmkmf): When a program consists of many parts, the process of building the program becomes complex. The make utility organizes the commands needed to build the program in a single file, called a makefile, so that the program can be built more conveniently. Some programs are complex enough that even writing the makefile by hand becomes tedious or difficult. In such a case, the programmer can use imake or xmkmf, which generate a makefile based on specifications supplied by the programmer.
Utilities for Working with Object Code and Machine Code (ar, ld, and strip): The archive utility ar serves as a library manager, letting programmers build and maintain collections of reusable object code. The linker ld combines object code components and converts them into executable form. The strip utility removes symbol and debugging information from object files and executables, so that they take up as little space as possible. These are more tools that you’ll likely never work with directly.
Okay, I admit it: The code tools we just talked about are somewhat exotic beasts. Not the sort of pets best suited to everyone’s family room. But, even if you’re not a programmer, there are several tools for working with code that you’ll use frequently. These tools work with almost any kind of file, not merely wads of source code. Programs like tar, gzip, and zip are essential for dealing with source code files, but they’re also very handy for everyday tasks.
Unpacking a File
When you download a file, you often get an archive that contains several, perhaps many, files within it. Archive files can also contain directories, which can contain files and subdirectories. Generally, these archive files have names ending in .tar, .tgz, .tar.gz, or .zip.
To get at the files contained in the archives, you must unpack them using some command-line incantations. Fortunately, these particular incantations are not really all that difficult to master.
If the file is a .tar file, open a terminal window, move to the directory containing the .tar file, and issue a command of the form:
tar xvf filename.tar
where filename.tar is the name of the .tar file. For example, to unpack the file stuff.tar, issue the command:
tar xvf stuff.tar
The command will unpack the contained files to your current directory. You can use a GUI file manager or the ls command to view the results of your work.
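If you don’t have a .tar file handy, this sketch builds a throwaway one (project, a.txt, b.txt, and stuff.tar are made-up names) and unpacks it in a separate directory. The x flag tells tar to extract, v lists each file as it’s processed, and f names the archive file.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Build a small sample archive to practice on
mkdir project
echo "first file" > project/a.txt
echo "second file" > project/b.txt
tar cvf stuff.tar project

# Unpack it somewhere else
mkdir unpacked
cd unpacked
tar xvf ../stuff.tar
ls -R
```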
If the packed file is a .tgz or .tar.gz file, it has been archived and then compressed. So the previous command, which merely unarchives the file, won’t work with every version of tar. To unpack a .tgz or .tar.gz file, open a terminal window, move to the directory containing the .tgz or .tar.gz file, and issue a command of the form:
tar zxvf myfile
where myfile is the name of the .tgz or .tar.gz file. The command will expand the compressed file and then unpack the archived files to your current directory.
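To see the two layers a .tar.gz file contains, this sketch (docs, readme.txt, and myfile.tar.gz are made-up names) first lists the archive with the t flag, unpacks it in one step with zxvf, and then does the same job in two steps: gunzip removes the compression, leaving a plain .tar file, and tar xvf unpacks that.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Make a sample compressed archive, then remove the originals
mkdir docs
echo "hello" > docs/readme.txt
tar zcvf myfile.tar.gz docs
rm -r docs

# Peek inside without unpacking: t lists the contents
tar tzvf myfile.tar.gz

# One step: z decompresses, x extracts
tar zxvf myfile.tar.gz
cat docs/readme.txt

# Two steps: decompress to myfile.tar, then unarchive it
rm -r docs
gunzip myfile.tar.gz
tar xvf myfile.tar
cat docs/readme.txt
```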
If the packed file is a .zip file, you shouldn’t issue a tar command at all. Instead, open a terminal window, move to the directory containing the compressed file, and issue a command of the form:
unzip myfile.zip
where myfile.zip is the name of the .zip file you want to decompress. Like the command given earlier, this command also unpacks the contained files to your current directory.
As you can imagine, it’s much more convenient to send a single archive than a mess of files. To create a .tar file, open a terminal window, move to the directory containing the files or directories you want to pack, and issue a command of the form:
tar cvf myfile.tar file1 file2 …
where myfile.tar is the name of the .tar file you want to create and file1 and file2 are files or directories you want to archive. Here the c flag tells tar to create a new archive. The ellipsis (…) indicates that you can list as many files and directories as you have the inclination to list; just separate each entry from the previous entry with a space. If you want to compress the packed file, simply include the z flag on the command line:
tar zcvf myfile.tgz file1 file2 …
The syntax for the tar command can get a little tricky at times. Make sure to check out the man page (type man tar at the command line) if you run into any kind of trouble.
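Here’s a sketch that creates both kinds of archive from the same files so you can compare their sizes (notes.txt, extras, and the archive names are made up). Because notes.txt is highly repetitive, the compressed .tgz should come out much smaller than the .tar. Files that are already compressed, such as .JPG photos, shrink far less.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# A repetitive text file compresses well
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i of some very repetitive text" >> notes.txt
    i=$((i + 1))
done
mkdir extras
echo "to do: archive old files" > extras/todo.txt

tar cvf myfile.tar notes.txt extras     # archive only
tar zcvf myfile.tgz notes.txt extras    # archive and compress

# Compare sizes in bytes
wc -c myfile.tar myfile.tgz
```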
A potential advantage of a .zip file is that it’s easier to use under MS Windows than a .tar file, because familiar MS Windows programs use the .zip format. To create a .zip file, open a terminal window, move to the directory containing the files or directories you want to pack, and issue a command of the form:
zip myfile.zip file1 file2 …
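where myfile.zip is the name of the .zip file you want to create and file1 and file2 are files or directories you want to pack. As with the tar command, the ellipsis (…) indicates that you can list as many files and directories as you want. Unlike tar, though, zip doesn’t include the contents of a directory unless you add the -r (recursive) flag, as in zip -r myfile.zip mydirectory.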
Now that you’ve learned how to pack and compress files, wouldn’t this be a good time to archive these b’zillion .JPG files you’ve got lying around? Just pack and compress the files and put an end to hard drive clutter. Then, you can conveniently email your photos to all your outwardly interested friends. If your friends use MS Windows, be sure to send a .zip file rather than a .tar file.
Okay, this month we poured the foundation on top of which we’ll build next month, when we’ll look at how to install a source release. As our example, we’ll use the multi-player space combat game xpilot. So, be sure you’ve got the X Window System installed and ready to go. Until then, have fun archiving files!
Straight from the Source’s Mouth
Believe it or not, programmers who release their programs as source code rather than in binary form aren’t being lazy, hostile, or malicious. Source releases have several advantages over binary releases:
* Source code can be made portable: Although binary code works for only one platform, you can often compile source code on a wide range of platforms.
* Source code can be tailored: Binary code is take it or leave it. On the other hand, source code can be configured to enable only those features and options you need, so programs run lean and mean.
* Source code can be modified: If you don’t like the way a program distributed in source form works, you can change it — provided you have the proper programming skills.
* Source code can be verified: Binary code can harbor hostile elements, such as a virus, that are hard to spot. Because source code can be read and checked, you’re much less likely to contract a virus from source code than from binary code. And, if you do contract a virus from source code, you’ll have the source code of the virus, which can help a virus expert determine the scope of the infection and how to combat it.
* Source code is convenient for the developer: Okay, some developers probably do distribute source code primarily because it’s convenient for them. But hey, they’re also providing you with great free software. What’s to complain about?
If all these reasons fail to win you over, consider this: What good is open source without the source?
Bill McCarty is an associate professor of information technology at Azusa Pacific University, Azusa, CA. He can be reached at firstname.lastname@example.org.