Documenting Your Open Source Code

Some forethought, a clear statement of intent and practice, and a modicum of documentation can make contributors and benefactors more comfortable with donating and using open source code.
As an attorney and non-hacker, I often marvel at the breadth and depth of code that is available as open source. I also marvel at the process that manages to pull all of that code together into coherent, efficient packages and software stacks. But as an attorney who deals in intellectual property issues, I am less enamored with the lack of standardization around documentation practices for copyright and licensing of such open source code.
Have you ever looked at an open source software package and tried to determine with certainty who wrote it, who holds the copyright to it, whether they conveyed adequate rights in their code to the project to which it was contributed, and what license covers that code? (And here I am only talking about the code at the package level; code at the file level often lacks any reference at all to copyright or licensing.) Add to this the fact that a fair number of open source hackers are a prickly bunch, insistent on inserting comments into the code that may or may not have any legal effect, and purely from an attorney’s rightfully prickly perspective, the whole thing gets quite puzzling.
Here are some examples of questionable copyrights and licenses. What would you do with the following?
* A comment in a COPYING file for code covered under the GNU General Public License that states, “May not be used for commercial purposes.”
* A series of text files included with executable code that provide the author’s view of the world, sometimes in language that may be construed as offensive.
* A directive that states “This program may be used under the provisions of the GNU[ General or Lesser?] Public License, except that a binary copy of the executable may not be packaged as part of a binary package that’s distributed as part of a Linux distribution.”
* A comment that says, “Probably originally submitted by Bob Jones. Too small to worry about copyright issues, IMO, since it doesn’t do anything substantive.”
Attorneys, like programmers, have a word for such a morass: Ugh.

Who Owns the Cheese?

So how does one wend their way through this maze? Probably the best place to start is at the package level.
1. First, identify whether the package has a COPYRIGHT.txt, LICENSE.txt, or similar file. Read those files and determine whether the license is one that attaches to all derivative works (like the GNU General Public License, the Common Public License, or the Mozilla Public License), or whether it’s a simple copyright permission (like the Berkeley Software Distribution License or the MIT License). Also, try to determine whether there is a centralized project that manages the package.
2. The next step is to visit the project’s web site. If possible, determine what practices the project follows in accepting code contributions. Does it require assignment? Does it require standardized copyright/license practices? Does it require an express grant of license by the author or copyright holders? Knowing the practices followed by the project can be immensely helpful in confirming the proper heritage of the code.
3. Next, scan the files in the package using a tool like grep to search for words like “copyright” and “license.” (It’s probably best to use truncated forms of these words with wildcards.) Such a quick search allows you to determine whether there are any hidden or unexpected copyright/license statements buried in the code.
4. Having compiled all of this information, you must then reconcile it. It works best to approach this problem from the top down: start with the information you derived from any project page, then reconcile the copyright/license files from the package, and ultimately determine whether there are any inconsistent or unexpected copyright/license statements contained in individual files. With respect to the latter group, the question then becomes whether the project or package information trumps these anomalies. That is a legal judgment call. Finally, if all else fails, you can attempt to track down the original author or the project maintainer and request clarification or ask to have the statements modified or deleted.
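For readers who would rather script the scan in step 3 than run grep by hand, here is a minimal sketch in Python. The function name scan_package and the exact word stems are my own assumptions, not part of any project's tooling; the stems are truncated so that variants such as "copyrighted," "licence," and "licensing" are caught as well.

```python
import re
from pathlib import Path

# Truncated stems (an assumption of this sketch) so that variants such as
# "copyrighted", "license", "licence", and "licensing" all match.
PATTERN = re.compile(r"copyright|licen[sc]", re.IGNORECASE)


def scan_package(root):
    """Return (relative path, line number, text) for each matching line under root."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" lets binary or oddly encoded files be scanned
            # without raising a decoding error.
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

The resulting list is only raw material for the reconciliation in step 4. Whether any given statement has legal effect remains a judgment call that no script can make.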

Some Simple Steps to Source Sanity

Much of the agony of this process can be avoided if development projects follow a consistent practice of documenting their code.
First, each open source project should have a web page that clearly sets forth what minimum comment lines should be contained in each package, file, or patch. For example, you could specify that each new file should, at a minimum, contain:
* The author(s) of the code;
* A copyright attribution statement for the copyright holder of the code at the time of its authorship;
* A declaration stating whether the code has been assigned to an organization, with a statement about any such assignment, or, if the code has been contributed under a contribution agreement, a reference to that agreement; and
* A reference to the license that covers the file. This reference can be the license in its entirety, or, more efficiently, a reference to where the license can be found, either within the package or at a specified URL.
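To make the list above concrete, here is a rough sketch of what such a header might look like and how a project could check new files for it. The field labels ("Author:", "Contribution:", "License:"), the sample names, and the function missing_fields are all hypothetical; each project would publish its own required wording.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Hypothetical field labels; these are illustrative, not a standard.
REQUIRED_FIELDS = {
    "author": re.compile(r"^\W*Authors?:[ \t]*\S", FLAGS),
    "copyright": re.compile(r"^\W*Copyright\b.*\d{4}", FLAGS),
    "assignment": re.compile(r"^\W*(Assignment|Contribution):[ \t]*\S", FLAGS),
    "license": re.compile(r"^\W*License:[ \t]*\S", FLAGS),
}

# A placeholder header showing the four items; the person named is fictional.
SAMPLE_HEADER = """\
# Author: Jane Example
# Copyright (C) 2005 Jane Example
# Contribution: covered by the project's contribution agreement
# License: see the COPYING file in the top-level directory
"""


def missing_fields(source_text, header_lines=20):
    """Return the names of required fields absent from the first header_lines lines."""
    header = "\n".join(source_text.splitlines()[:header_lines])
    return [name for name, pattern in REQUIRED_FIELDS.items()
            if not pattern.search(header)]
```

Calling missing_fields(SAMPLE_HEADER) returns an empty list, while a file that carries only an author line is reported as missing the copyright, assignment, and license entries. A check like this confirms only that the statements are present, not that they are accurate.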
At the package level there needs to be a clearly identified file, called COPYING or LICENSE, that contains copyright and license information. Ideally, this file would also identify all of the other files contained within the package and state that all such files are subject to the license contained in the master file. Of course, if the included files are covered by a variety of licenses, each should be explained.
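Keeping that list of files current by hand is tedious, so a project might generate it. The following is a minimal sketch under stated assumptions: the function name build_manifest and the sentence it emits are my own illustration, not legal language, and it assumes every file falls under the single license in the master file.

```python
from pathlib import Path


def build_manifest(root, master_name="COPYING"):
    """Return text listing every file under root except the master license file."""
    root = Path(root)
    files = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.relative_to(root).as_posix() != master_name
    )
    # Illustrative wording only; a project's counsel would supply the real text.
    lines = [f"The following files are subject to the license in {master_name}:"]
    lines.extend(f"  {name}" for name in files)
    return "\n".join(lines) + "\n"
```

A package with mixed licenses would need a richer format, for example one section per license, which this sketch does not attempt.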
By spending a bit more time documenting copyright and license information in this manner, developers go a long way toward increasing other developers’ “legal” comfort with using open source code.

Mark Webbink is Deputy General Counsel of Red Hat, Inc.
