Subversion 101

The new open source version control system promises to make CVS obsolete.

So, it’s happened again. Once more CVS has corrupted your repository, leaving you twitching and writhing on the floor of your cubicle. You struggle to cry out, but all of your strength’s been sapped by hours of needless frustration.

Well, perhaps that’s a bit of an overdramatization… but not by much. While CVS is a common tool for open source development, it’s not necessarily a popular one. Yes, it’s free and it does its job, but that’s about it. Clunky, limiting, and at times downright dangerous, it’s a necessary evil.

Or rather, was a necessary evil. Now, there’s an alternative.

In late 2000, a group of developers decided to create something better than CVS and thus Subversion (http://subversion.tigris.org) was born. Designed from the ground up to be “a compelling replacement for CVS,” Subversion 1.0 has just been released. Let’s take a look at Subversion, compare it to CVS, learn the most common Subversion commands, and see how to migrate a CVS repository to Subversion.

Why Revision Control?

A revision control system (RCS) is a fundamental tool for large software projects involving multiple developers. Whether you use an expensive commercial package or Linus Torvalds’ inbox (not recommended unless you are Linus Torvalds), a system for combining and organizing code contributions from every developer in the team is hard to do without.

To this end, the most basic feature of all revision control systems is the ability to track and store multiple versions of a set of files over time.

Since few projects have the foresight to plan out the entire course of their development to the last detail, flexibility is also an important feature. File names change and files move. This is a fact of life over the course of a long-term project, and a good RCS accommodates this need.

Large projects also need to coordinate and merge the work of multiple developers. If the number of developers becomes any larger than a handful, it’s nearly impossible to manage the integration of work without support from a good RCS. This is especially important for open source projects, where there is little or no oversight of who is working on what portion of the code at any given point in time.

Additionally, most revision control systems provide some mechanism for branching a project into multiple versions that can then be merged back together at a later point in time. For example, upon release, an entire source code repository might be branched to allow maintenance and new development to proceed simultaneously. Similarly, most revision control systems provide for a way to tag a project, which is essentially taking a snapshot of it at a point in time. For example, all of the source code files might be tagged “Version 1.1.0” to denote the state of all the files at the time that version of the software was released. Tags can be used to recreate older builds.

New code, revisions, branches, tags, and logs of activity build up a lot of data, representing a great deal of time and effort. Therefore, it’s vital that a revision control system protect the data it stores. Even if a backup is made every night, a lot of time and hard work can be lost or wasted if corruption causes all or part of the repository to be trashed.

What’s Wrong with CVS

CVS is the de facto standard for revision control of open source projects. It’s fairly easy to learn, the software is bundled in just about every Linux distribution, it’s been ported to other platforms, and many integrated development environments and code editors — from Eclipse to BBEdit to IntelliJ IDEA — include CVS support. To paraphrase an old saying, no one’s ever been fired (from an open source project) for choosing CVS.

Its common use, however, doesn’t make it a perfect system. In fact, it suffers from any number of problems and annoyances. Although it supports all of the requirements listed above (to one degree or another), it supports none of them very well.

To start with, CVS makes it very difficult to move or rename files. To move or rename a file, you must first copy the file, remove the original from the repository with cvs delete, and then use cvs add to place the copy back into the repository. This is unwieldy, but worse, it disrupts the continuity of the file’s history. To get the history of the file pre-move, you must query the file under its original name. Don’t know the old name of the file because you just joined the project? Your only option is to roll back the entire directory to previous versions until you reach a point before the file was renamed.

Directories are even more arduous to deal with. Renaming a directory requires you to either go into the repository hierarchy itself and change the name there (dangerous, and doesn’t work well) or to create a new directory and move all of the files there, thus disrupting the history on each of those files. To make matters worse, you can’t remove the old directory, as CVS doesn’t allow a directory to be deleted. The best you can do is to tell CVS to make all empty directories invisible in your working directory.

Finally, committing to a CVS repository is not atomic. A network glitch or software crash on either the CVS server or on your client machine can leave a partially finished commit. At best, your changes will be partially committed, but typically, the repository is left unreliable and must be fully restored from backup.

What Makes Subversion Unique

To replace CVS, the Subversion developers created a system that looks a lot like CVS. In fact, many everyday operations work the same way in both systems, making adoption of the newer Subversion easy. But the similarities (familiar command names, similar options) are only skin deep: the underlying structure and operation of Subversion address many of the shortcomings of CVS.

Although it may seem like a simple change, one of the most important and fundamental differences between Subversion and CVS is the way file copies are handled. All Subversion file copies are cheap, constant-time operations. When a copy of a file is made, Subversion doesn’t actually copy the file. Instead, it simply puts an entry in the repository database that points back to the original file. Future changes to the copied file are then applied against the new entry.

Subversion makes good use of these cheap copies. It has no special concept of branches or tags. Instead, to create a branch or tag, you simply copy a file or directory into a directory that you have set aside to hold either branches or tags. This makes it easy to tailor your process for dealing with branches and tags to best suit your project.

Subversion also stores its files differently from CVS. Instead of storing the versioned files individually inside the repository, Subversion uses a single database to store the entire repository. This database storage system allows Subversion to store metadata about files in the form of arbitrary, user-defined keyword/value pairs.
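To make user-defined properties concrete, here is a minimal sketch that attaches a keyword/value pair to a file and reads it back. It assumes svn and svnadmin are on your PATH and builds a throwaway repository under a temporary directory so it can run anywhere; the property name reviewed-by and the value jdoe are hypothetical. In your own working copy, only the propset, propget, and proplist lines matter.

```shell
#!/bin/sh
# Sketch: user-defined properties on a versioned file.
# Assumes svn and svnadmin are on the PATH; uses a scratch repository.
set -e

BASE=$(mktemp -d)                  # throwaway area for this demo
svnadmin create "$BASE/repos"
svn checkout -q "file://$BASE/repos" "$BASE/wc"
cd "$BASE/wc"

printf 'int main(void) { return 0; }\n' > myprog.c
svn add -q myprog.c

# Attach an arbitrary key/value pair (hypothetical property name).
svn propset reviewed-by "jdoe" myprog.c

# Properties are versioned too, so they go in with the commit.
svn commit -q -m "Add myprog.c with a reviewed-by property."

svn propget reviewed-by myprog.c   # prints the value
svn proplist -v myprog.c           # lists every property with its value
```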

What Makes Subversion Better Than CVS

Subversion does many things in ways that are clearly (or at least arguably) better than CVS.

Some Subversion features, such as atomic commits, are transparent. If a commit is interrupted, the repository remains unchanged and unaffected. This makes the repository much more resilient to faults, and thus less likely to suffer from corruption.

Other features, such as conflict resolution, require a slightly different way of approaching the system. If Subversion uncovers a conflict it can’t properly merge (say, two conflicting changes on the same line of code), it creates a file in the working copy of the repository that summarizes the conflicts. CVS does something similar, but Subversion goes further, providing you with the two individual versions of the file, as well as a copy of the file as it looked when you last updated the working copy.

Still other features have no CVS equivalent at all. Unlike CVS, Subversion can rename and delete both files and directories. Indeed, Subversion allows an unlimited number of moves, renames, and deletions. Deleted files are retained in old revisions, and renamed and relocated files retain their correct name and location in a given revision. Also unlike CVS, remote access in Subversion uses an extended version of the WebDAV protocol, which allows filesystems to be accessed and manipulated over HTTP.
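Here is a sketch of what those renames look like in practice: svn move on a file, then on a whole directory, followed by a check that the history survives both moves. It assumes svn and svnadmin are on your PATH; the scratch repository and the file names are made up for the demonstration. The extra svn update before the directory move is there because newer clients may refuse to move a directory whose contents sit at mixed revisions.

```shell
#!/bin/sh
# Sketch: renaming a file and a directory with svn move.
# Assumes svn and svnadmin are on the PATH; uses a scratch repository.
set -e

BASE=$(mktemp -d)
svnadmin create "$BASE/repos"
svn checkout -q "file://$BASE/repos" "$BASE/wc"
cd "$BASE/wc"

mkdir src
printf 'int main(void) { return 0; }\n' > src/old_name.c
svn add -q src
svn commit -q -m "Add src/old_name.c."                  # revision 1

# Rename a file: one command, and the history comes along with it.
svn move src/old_name.c src/new_name.c
svn commit -q -m "Rename old_name.c to new_name.c."     # revision 2

# Update first: newer clients may refuse to move a directory whose
# contents are at mixed revisions.
svn update -q
svn move src lib
svn commit -q -m "Rename src/ to lib/."                 # revision 3

# The log of the file under its current name reaches back to revision 1;
# -v lists the paths each revision touched, including the copies.
svn update -q
svn log -v lib/new_name.c
```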

Using Subversion

Subversion has both a server and a client component. The server hosts the repository, while one or more clients manipulate it.

Installing and running the Subversion server is a bit complicated, largely because the server requires a number of other packages, including Berkeley DB and a very recent version of Apache 2 (at press time, Subversion requires 2.0.48). Read the system requirements and install Subversion’s prerequisites before installing Subversion itself. Installing the client is eminently simple.

Subversion binaries are available for many platforms from http://subversion.tigris.org/project_packages.html. Or, if you decide to compile Subversion from source, the distribution includes excellent instructions in the file named INSTALL.

Once Subversion is installed on your server, you’re ready to create a repository. Creating a repository is easy. If you want to store the repository in /var/svnrepos, run the command:

# svnadmin create /var/svnrepos

Then make a couple of quick configuration changes to Apache to point it to your repository by adding the following lines to httpd.conf:

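# Module paths vary by installation. mod_dav_svn requires mod_dav,
# so skip the first line if httpd.conf already loads mod_dav.
LoadModule dav_module modules/mod_dav.so
LoadModule dav_svn_module modules/mod_dav_svn.so
<Location /myrepos>
DAV svn
SVNPath /var/svnrepos
</Location>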

Finally, restart Apache. To access your repository remotely, use the URL http://myserver.org/myrepos, where myserver.org is the name of your server. (myrepos is the same name used in the Location tag in httpd.conf.)

If you don’t want remote access to your Subversion repository, you can skip the Apache configuration steps shown above and simply use the URL file:///var/svnrepos to access a local repository. This is useful if, for instance, you want to put your home directory under revision control, but don’t want it available from anywhere but your machine. In this situation, there’s no need to run svnadmin create as root.
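As a sketch, creating and checking out such a private repository takes two commands and no root access. Here a temporary directory stands in for your home directory so the example can be run safely; substitute $HOME (or any path you like) in real use. It assumes only that svn and svnadmin are installed.

```shell
#!/bin/sh
# Sketch: a private, local repository reached through a file:// URL.
# No Apache and no root access needed. A scratch directory stands in
# for $HOME so the sketch can run anywhere.
set -e

BASE=$(mktemp -d)            # in real use, something like BASE=$HOME
svnadmin create "$BASE/svnrepos"

# file:// plus an absolute path gives three slashes, as in file:///var/svnrepos.
svn checkout "file://$BASE/svnrepos" "$BASE/work"
```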

If you don’t run Apache 2, Subversion also provides a dedicated server called svnserve. svnserve isn’t the preferred way to serve a repository (it requires its own port, uses a Subversion-specific protocol, and loses many of the advantages of WebDAV), but it’s a viable alternative.

Let’s create a working copy of the empty repository on the local (client) machine and add a few directories to help with organization. To tell Subversion which repository to check out, simply provide the URL of the repository.

1: $ cd $HOME
2: $ svn checkout http://myserver.org/myrepos
Checked out revision 0.
3: $ cd myrepos
4: $ mkdir trunk branches tags
5: $ svn add trunk branches tags
A branches
A tags
A trunk
6: $ svn commit --message "Created repository structure."
Adding branches
Adding tags
Adding trunk
Committed revision 1.

The first command places you in your home directory, $HOME. The second command checks out the repository, creating a local working copy in $HOME/myrepos/.

The newly created repository begins at revision 0, as you can see in the output. Every time someone commits a change to the repository, this number is globally incremented by one across the entire repository.

Command three changes to the working copy’s directory, and command four creates three new directories. Subversion has no native concept of branches and tags; instead, it simply allows fast, cheap copies. So, to use branches and tags, the Subversion manual suggests that you create three directories: trunk, branches, and tags.

* The myrepos/trunk directory should store the main branch of each versioned project.

* When you wish to create a branch of the project, just copy it into the branches directory with svn cp trunk/myproject branches/myproject_branch_1. Changes made in the branched directory can then be merged back into the main tree, as explained in the sidebar “Merging in Subversion.”

Merging in Subversion

After you’ve been working with a branch for a while, you will reach a point where you either want to merge changes that have occurred on the trunk into your branch, or merge changes on your branch back into the trunk. Conceptually, the process for doing this in Subversion is very similar to the CVS process. The specifics are a bit different, though.

To perform a merge, first go to the directory where you want to perform the merge. For example, if you want to merge from the directory myrepos/trunk/myprog into the branch myrepos/branches/myprog-branch, change to the latter directory with cd myrepos/branches/myprog-branch and run the svn merge command:

$ svn merge -r 15:HEAD http://myserver.org/
U Makefile
C myprog.c

The merge command shown above merges the differences between revision 15 and the current revision in the trunk version of myprog with the branch version in your current working directory. The reason for merging the difference between two revisions is to collect all changes that have occurred during that span of time. In this case, revision 15 was the revision of the repository at the time you created the branch.

Again, a U indicates a successful merge, a C indicates a conflict, and the process to reconcile a merge is exactly the same as reconciling conflicts after an svn update. Once the conflict is resolved, you can commit the branch with svn commit.

The Subversion manual suggests that you include a comment in the commit message that tells which revisions you merged. That way, the next time you merge, you can start from the last revision you already merged.

For example, if the HEAD revision above were revision 23, then you would merge with svn merge -r 23:HEAD next time. Because svn merge -r N:M applies the changes committed after revision N, starting at 24 would skip the changes made in revision 24.

* Similarly, you can tag a particular state of the repository by copying files into the tags directory.
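The copy-based branching and tagging described above, together with the merge cycle from the sidebar, can be sketched end to end. This is an illustration under stated assumptions, not a prescribed workflow: a scratch file:// repository stands in for http://myserver.org/myrepos, myproject is a hypothetical project, and the revision numbers are read from svn info rather than hard-coded. The branch and tag are made with URL-to-URL copies, which commit immediately without a working copy, and the merge records its revision range in the log message as the manual suggests.

```shell
#!/bin/sh
# Sketch: branch, merge, and tag, all built on cheap copies.
# Assumes svn and svnadmin are on the PATH; a scratch repository
# stands in for http://myserver.org/myrepos.
set -e

BASE=$(mktemp -d)
svnadmin create "$BASE/repos"
URL="file://$BASE/repos"

# The suggested trunk/branches/tags layout, created directly in the repository.
svn mkdir -q -m "Create layout." "$URL/trunk" "$URL/branches" "$URL/tags"

# Put a tiny project (hypothetical name: myproject) on the trunk.
svn checkout -q "$URL/trunk" "$BASE/trunk"
mkdir "$BASE/trunk/myproject"
printf 'v1\n' > "$BASE/trunk/myproject/README"
svn add -q "$BASE/trunk/myproject"
svn commit -q -m "Import myproject." "$BASE/trunk"

# Branch: a server-side copy, committed immediately.
svn copy -q -m "Create myproject_branch_1." \
    "$URL/trunk/myproject" "$URL/branches/myproject_branch_1"
BRANCH_REV=$(svn info "$URL/branches/myproject_branch_1" | sed -n 's/^Last Changed Rev: //p')

# Development continues on the trunk.
printf 'v2\n' >> "$BASE/trunk/myproject/README"
svn commit -q -m "Trunk change." "$BASE/trunk"
HEAD_REV=$(svn info "$URL" | sed -n 's/^Revision: //p')

# Merge the trunk changes committed after the branch point into the
# branch, and record the merged range in the log message for next time.
svn checkout -q "$URL/branches/myproject_branch_1" "$BASE/branch"
cd "$BASE/branch"
svn merge -r "$BRANCH_REV:$HEAD_REV" "$URL/trunk/myproject" .
svn commit -q -m "Merged trunk r$BRANCH_REV:$HEAD_REV into myproject_branch_1."

# Tag: one more cheap copy, this time into tags/.
svn copy -q -m "Tag myproject 1.0." \
    "$URL/trunk/myproject" "$URL/tags/myproject-1.0"
```

Reading the revisions from svn info keeps the script correct no matter how many commits came before; in a real repository you would take the lower bound from the log message of your previous merge instead.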

Once the three directories have been created, you commit them to the repository in commands five and six. Command five, svn add, places all three directories under revision control. Then, command six, svn commit, sends the changes to the repository. The supplied message is stored in the commit log, and can be viewed later using svn log.
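A quick way to confirm that the message was stored is to read it back. The sketch below repeats commands four through six against a scratch repository (an assumption made so it runs anywhere svn and svnadmin are installed) and then runs svn log.

```shell
#!/bin/sh
# Sketch: the commit message from command six, read back with svn log.
# Assumes svn and svnadmin are on the PATH; a scratch repository
# stands in for http://myserver.org/myrepos.
set -e

BASE=$(mktemp -d)
svnadmin create "$BASE/repos"
svn checkout -q "file://$BASE/repos" "$BASE/myrepos"
cd "$BASE/myrepos"

mkdir trunk branches tags
svn add -q trunk branches tags
svn commit -q -m "Created repository structure."

# After a commit, the working copy's top directory still reports its old
# revision, so update before asking for the log of ".".
svn update -q
svn log
```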

Once you’ve set up your repository, adding new projects and files is easy. All you have to do is copy the files you want to add into a checked-out working copy of your repository (or create them there) and run svn add. Because svn add works on an individual file and, recursively, on an entire directory, you can use the same process for either one.

For example, suppose you have a project called myprog that contains a Makefile and a couple of source files. You can add it to your newly created repository as follows:

$ cd $HOME/myrepos/trunk
$ cp -r $HOME/myprog .
$ svn add myprog
A myprog
A myprog/Makefile
A myprog/myprog.c
A myprog/myprog.h
$ svn commit --message "Added myprog project to the repository."
Adding myprog
Adding myprog/Makefile
Adding myprog/myprog.c
Adding myprog/myprog.h
Committed revision 2.

After you’ve committed the project to your repository, you can remove the original files in your home directory.

Once you’ve created a repository, you can check out as many different working copies as you like, in as many different places as you like. You can also just check out portions of the repository.

For instance, you could check out just the myprog project from your repository by specifying the path to that directory in the URL when you run the checkout command, like so:

$ svn checkout http://myserver.org/myrepos/trunk/myprog

You can’t, however, check out a single file from the Subversion repository. You must check out a directory.

Committing Changes with Subversion

Revision control systems are intended to manage change, so let’s see how to do that with Subversion. After making a few changes to the code in your main source file, myprog.c, you decide it’s time to commit them to the repository.

If you were using CVS, you’d do a cvs update to merge any changes made by other users into your local copy, before committing. This is a dangerous proposition at best, though. A large number of changes in the repository that coincided with your changes could easily render the source file you’ve been working on for the last six hours so full of diff entries that you might waste hours sorting things out. With Subversion, a number of mechanisms can help protect you against “diff clobbering.”

The first is a useful status command with easy-to-read output.

$ svn status -u
M  *  15  Makefile
M     15  myprog.c
   *  15  myprog.h
Status against revision:     16

When you run svn status -u (the -u option tells Subversion to contact the repository and check for newer revisions), Subversion reports the current state of the files in your working copy.

An M in the first column indicates that you’ve modified a file. In the output above, you’ve modified both Makefile and myprog.c.

An asterisk (*) in the second column shows that another user has modified the file and committed those changes.

The third column indicates which revision you have checked out. In the example, you have revision 15 of all three files. The fourth column is the name of the file.

As you can see, you can safely commit myprog.c (no one has committed a newer version of it), but your Makefile may conflict with a change that someone else has already checked in.

At this point, perform the update and see what happens.

$ svn update
U myprog.h
C Makefile
Updated to revision 16.

The U indicates success. The C before Makefile indicates a conflict. Now, if you list the files in the directory, you should see that Subversion has done more than just fill the original file with conflict markers:

$ ls
Makefile Makefile.r15 myprog.c
Makefile.mine Makefile.r16 myprog.h

There are now four different versions of Makefile. The first one, with no extension, contains conflict markers, just as CVS would produce. The other three, however, are pristine copies of different versions of the file: Makefile.mine is your working copy as it was before the update; Makefile.r15 is the revision you had checked out before the update (the common ancestor of both sets of changes); and Makefile.r16 is the newest version from the repository. With these files side by side, you can easily pick out the bits of code that you need from each one and merge things properly by hand.

When you’ve modified the file and resolved the conflict, run svn resolved Makefile. Until you do this, Subversion will not allow the file to be committed.
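The whole conflict cycle can be reproduced with two working copies of one scratch repository. In this sketch the working copies alice and bob, the Makefile contents, and the decision to keep Bob’s line are all hypothetical, and it assumes svn and svnadmin are on your PATH. The --non-interactive option is there because newer clients otherwise offer to resolve conflicts on the spot instead of leaving the extra files behind; those clients still accept svn resolved, although their documentation points to svn resolve --accept working instead.

```shell
#!/bin/sh
# Sketch: produce, inspect, and resolve a text conflict.
# Assumes svn and svnadmin are on the PATH; uses a scratch repository.
BASE=$(mktemp -d)
svnadmin create "$BASE/repos"
URL="file://$BASE/repos"
svn checkout -q "$URL" "$BASE/alice"
svn checkout -q "$URL" "$BASE/bob"

# Alice adds a Makefile (revision 1); Bob picks it up.
cd "$BASE/alice"
printf 'CFLAGS = -O2\n' > Makefile
svn add -q Makefile
svn commit -q -m "Add Makefile."
cd "$BASE/bob" && svn update -q

# Both edit the same line; Alice commits first (revision 2).
printf 'CFLAGS = -g\n' > "$BASE/bob/Makefile"
cd "$BASE/alice"
printf 'CFLAGS = -O3\n' > Makefile
svn commit -q -m "Optimize harder."

# Bob's update conflicts and leaves Makefile.mine, .r1, and .r2 behind.
cd "$BASE/bob"
svn update --non-interactive
ls Makefile*

# Resolve by hand (here, simply keep Bob's version), then tell
# Subversion the conflict is settled so the file can be committed.
cp Makefile.mine Makefile
svn resolved Makefile
svn commit -q -m "Resolve Makefile conflict, keeping -g."
```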

Migrating an existing CVS repository

The Subversion developers recognized that today’s CVS users will not want to lose all of their branches and history information if they migrate to Subversion. To make the migration process as easy as possible, Subversion comes packaged with a Python script, cvs2svn.py, that can convert a CVS repository into a Subversion repository.

If you run the commands…

$ svnadmin create myrepos
$ cvs2svn.py -s myrepos mycvsrepos

the script populates myrepos with the contents of mycvsrepos. The script will even go so far as to create trunk, branches, and tags directories at the root of the repository and populate those according to the tagging and branching data found in the original CVS repository.

Unfortunately, this conversion script is not yet foolproof, and not all CVS repositories can be converted. If you have a lot of complex branching, you may be out of luck.

Some Hints and Tips

Unlike CVS, Subversion does not maintain a separate revision number for each file. Instead, Subversion maintains a single, global revision number for the entire repository. Every commit to any part of the repository increases that number by one, and the new number identifies the state of every file and directory in the tree, whether or not that particular item changed.

The global revision number is both a blessing and a curse. It makes it harder to follow an individual file by its revision number alone, but it also eliminates the confusion that can result from each file in a project having a different revision number.
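The curse is smaller than it may sound, because svn log on a single path lists only the revisions that actually changed that path. The sketch below assumes svn and svnadmin are on your PATH and uses a scratch repository with two hypothetical files; it shows one file skipping a revision in which only its neighbor changed.

```shell
#!/bin/sh
# Sketch: per-file history despite global revision numbers.
# Assumes svn and svnadmin are on the PATH; a.c and b.c are hypothetical files.
set -e

BASE=$(mktemp -d)
svnadmin create "$BASE/repos"
svn checkout -q "file://$BASE/repos" "$BASE/wc"
cd "$BASE/wc"

printf 'one\n' > a.c
printf 'one\n' > b.c
svn add -q a.c b.c
svn commit -q -m "Add a.c and b.c."        # revision 1
printf 'two\n' >> b.c
svn commit -q -m "Change only b.c."        # revision 2
printf 'three\n' >> a.c
svn commit -q -m "Change only a.c."        # revision 3
svn update -q

# Every file is now at revision 3, but the log of a.c lists only
# the revisions that actually touched it: r3 and r1.
svn log -q a.c
```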

CVS provides a cvs status command to see which files have been modified locally or changed on the server since the last update. However, the output from cvs status is very difficult to read, so many CVS users have developed the habit of just running cvs update to find out what has changed. Quick and easy, yes, but also problematic: cvs update doesn’t just report changes, it merges changes from the repository into your working files. That’s not always what you want.

Subversion improves upon the status command by giving it a clear, concise output format, which makes it much more useful. A smart Subversion user makes a habit of running svn status before every update or commit to see exactly what is going to happen before actually applying any changes.
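That habit is easy to automate. The shell function below, svn_careful_update, is a hypothetical helper rather than part of Subversion: it runs svn status -u, then asks before running svn update. Any arguments, such as file names, are passed through to both commands.

```shell
#!/bin/sh
# Sketch of a hypothetical helper: preview incoming changes with
# svn status -u, then update only if the user agrees.
svn_careful_update() {
    svn status -u "$@" || return 1
    printf 'Run svn update now? [y/N] '
    read -r answer
    case "$answer" in
        [Yy]*) svn update "$@" ;;
        *)     echo "Update skipped." ;;
    esac
}

# Usage, from inside a working copy:
#   svn_careful_update            # whole working copy
#   svn_careful_update Makefile   # just one file
```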

What’s Next for Subversion

Subversion 1.0 is out and is a worthy replacement for the aging and unwieldy CVS. A number of open source projects have already adopted it, and the Subversion project itself has been self-hosting for over two years.

Subversion has a very active development community and future releases will include features such as better merging support, symbolic links, permanent removal of files, and internationalization. Nascent projects are developing add-ons for Eclipse as well. See the sidebar “Front-ends and Tools” for more information.

Front-ends and Tools

CVS is widely used largely because of the number of graphical clients and development environments that integrate with the revision control system. Subversion does not yet have anywhere near the quantity or quality of add-on tools, but there are several promising projects focused on creating Subversion tools.

* KDEVELOP. The KDevelop integrated development environment has added support for Subversion. KDevelop also supports CVS. Visit http://kdevelop.org for more information.

* RAPIDSVN. RapidSVN is a cross-platform GUI client for Subversion. It appears to be feature-rich, but is still very much in beta. It has very strict dependency requirements, including (at the time of this writing) a specific version of Subversion that’s not the most current. See the web page at http://rapidsvn.tigris.org for updates.

* ECLIPSE PLUGINS. Eclipse has become a very popular IDE for Java development, and its plug-in architecture makes it easy to write integration tools. Currently, there are at least two different plug-ins in the works: Subclipse and Svn4Eclipse. Subclipse is a robust, feature-rich Subversion plugin for Eclipse (available from http://subclipse.tigris.org). Svn4Eclipse, still in the design stage, is a Java-based Subversion plug-in.

Subversion also provides a very complete set of command line tools with output that’s easy to understand and parse, which makes integrating Subversion into an automated build system much easier.
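As one example of that kind of integration, a nightly build script might record the revision it builds and use svn export, which writes a clean tree with no .svn administrative directories. The sketch assumes svn and svnadmin are on your PATH; a scratch repository stands in for http://myserver.org/myrepos, myprog is the article’s example project seeded with a stand-in Makefile, and the actual build step is left as a comment because it depends on your project.

```shell
#!/bin/sh
# Sketch: a build step that pins a revision and exports a clean tree.
# Assumes svn and svnadmin are on the PATH; a scratch repository stands in
# for http://myserver.org/myrepos.
set -e

BASE=$(mktemp -d)
svnadmin create "$BASE/repos"
URL="file://$BASE/repos"

# Demo scaffolding only: seed the repository with a tiny project.
mkdir -p "$BASE/seed/myprog"
printf 'all:\n\t@echo building myprog\n' > "$BASE/seed/myprog/Makefile"
svn import -q -m "Import myprog." "$BASE/seed/myprog" "$URL/trunk/myprog"

# Record the revision being built, so the build can be reproduced later.
REV=$(svn info "$URL" | sed -n 's/^Revision: //p')

# svn export writes plain files, without any .svn administrative data.
svn export -q -r "$REV" "$URL/trunk/myprog" "$BASE/build-r$REV"
echo "Exported myprog at revision $REV to $BASE/build-r$REV"

# The build itself is project-specific, for example:
# make -C "$BASE/build-r$REV"
```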

Is Subversion ready for your repository? The answer is: probably. It’s feature-rich as compared to CVS, and yet is simple to use. Although the cvs2svn.py script does have some problems converting some complex CVS repositories, if yours can be converted, your suffering with CVS headaches may be over.

William Nagel is the chief software engineer for Stage Logic, a small development company that primarily uses Linux. He can be reached at bill@stagelogic.com.
