On April 6, 2005 at 10:41:57 Eastern Standard Time, the news finally broke. In a year that had already seen the retreat and toppling of many political dictatorships around the world, the benevolent dictatorship of Linux strongman Linus Torvalds announced a major pullback of its own.
“As a number of people are already aware (and in some cases have been aware over the last several weeks), we have been trying to work out a conflict over [BitKeeper] usage over the last month or two,” wrote Torvalds under the headline “Kernel SCM Saga…”. “That hasn’t been working out and as a result, the kernel team is looking at alternatives.”
BitKeeper, of course, is the proprietary software configuration manager used by Torvalds to handle large-scale directory merges since early 2002. Adopted in the wake of numerous complaints over the sloppy handling of incoming source code trees during the development of the 2.4 kernel, the decision to use BitKeeper set up Torvalds for even harsher criticism. Many developers decried the positioning of a proprietary software tool at the center of a free software project, an observation Torvalds routinely rebuffed by noting the inadequacy of free software competitors when it came to serving the distributed kernel team’s needs.
While still unrepentant, Torvalds acknowledged that the fate most of its detractors had predicted, a licensing dispute, had indeed come to pass. Spooked by Samba developer Andrew Tridgell’s demonstrated ability to navigate BitKeeper protocols without access to a licensed BitKeeper client, BitMover, the South San Francisco company behind BitKeeper, pulled the plug on its free Linux version. In many ways it was a fulfilled promise: ever since drafting the awkward “free as in beer” license for Linux developers, BitMover founder Larry McVoy had repeatedly stated that the license would not become a means to help competitors develop rival tools.
“In some sense this was inevitable, but I sure had hoped that it would have happened only once there was a reasonable open source alternative,” Torvalds wrote. “As it is, we’ll have to scramble for a while.”
As scrambles go, the post-BitKeeper Linux era has been a surprisingly smooth one. Within hours of Torvalds’ posting, developers were bandying about ideas on a from-the-ground-up, free software replacement. Within days, Torvalds himself had put forward makeshift scripts and ideas for other developers to flesh out. Dubbing the new system Git, a jokingly self-deprecating reference to his own stubbornness, Torvalds described Git as “a stupid (but extremely fast) directory content manager” in the tool’s initial README file.
Stupid is as stupid does, of course. In belittling his newest creation, Torvalds seemed to be tapping the same mixture of hacker hubris and aid-beseeching humility that helped propel Linux itself from a shaky, reverse-engineered version of Minix to a full-fledged alternative to Unix. As fellow kernel developers poured in their own thoughts and innovations, the technical and legal barriers that had appeared to restrict development of a Linux-specific management tool began to fade away. Like a punk rock side project after a 100-city stadium tour, Git has delivered a primal jolt of energy, not to mention a therapeutic venting of frustrations, for a band struggling to retain its original chemistry.
“When the BitKeeper fiasco broke, it turned what had previously been a political problem into a technical problem,” notes kernel developer H. Peter Anvin. “I think to some degree we’re a lot better at solving technical problems than solving political problems.”
Inside a Whirlwind
While far from mature, Git’s speedy emergence bears studying, even for those who never find themselves in the daunting position of submitting source code changes to Torvalds or any of the other kernel insiders. For one thing, the move to Git offers a rare glimpse at what the whirlwind Linux development process looks like: a place where calm and chaos intermix.
Most software configuration management systems (or SCMs in developer parlance) are built upon the old Unix diff utility, in which an altered file is distinguished from its previous incarnation by its differences. With diff-derived tools, viewing changes at the discrete file level is helpful. However, when you’re in charge of maintaining an entire repository, it’s a bit like naming each and every leaf on a tree.
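To make the file-level view concrete, here is a minimal Python sketch using the standard library’s `difflib` module. It is not how CVS or BitKeeper work internally; it only illustrates describing one file’s new version by its differences from the old one. The helper name `file_diff`, the file name, and the file contents are hypothetical.

```python
import difflib


def file_diff(old: str, new: str, name: str) -> list[str]:
    # Compare two versions of ONE file and return a unified diff,
    # the per-file view that diff-derived SCMs are built around.
    return list(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


old = "int x = 1;\nint y = 2;\n"
new = "int x = 1;\nint y = 3;\n"

# Prints the "---"/"+++" header, a hunk marker, and the changed lines.
for line in file_diff(old, new, "example.c"):
    print(line, end="")
```

Each file gets its own hunk of “-” and “+” lines; a maintainer responsible for thousands of files would have to read thousands of such hunks to see the shape of a change.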
“Basically, as a project leader, individual files are mostly meaningless,” writes Torvalds in an email summarizing Git’s basic design philosophy. “As a project lead, the only thing that really matters is a fairly sizable collection of files. I tend to worry about things in the form of ‘the IDE layer’ or ‘memory management’ or ‘network drivers.’”
Where BitKeeper beat out other, more traditional management tools such as CVS or Subversion, Torvalds says, was in the system’s ability to process changes at the wholesale level.
Most SCMs revolve around the core team concept: a few motivated developers enjoy “commit” level access to a central source code repository, while those outside the inner circle “push” their changes to the appropriate maintainer. Effective in the early days of Linux development, this model hit snags in the late 1990s as corporations began to flood the kernel development team with suggested improvements and fixes.
“When lots of people push to me, there’s congestion, and then I lose a patch (or two or a hundred),” says Torvalds, summing up the state of development before incorporating BitKeeper in early 2002. “That means people will re-send, and the congestion just gets worse.”
In contrast, BitKeeper lets developers post entire copies of their evolving directories to a public server (linux.bkbits.net). Once there, modifying the final production kernel becomes less of a “push” process and more of a “pull” process. Torvalds and the other top-level maintainers who used BitKeeper would simply import and merge entire directories with their own, using the history feature to inspect individual changes only on an as-needed basis.
In addition to streamlining the inner mechanics of kernel management, BitKeeper’s “pull” approach played well with the trust ethic underlying Linux kernel development and, by extension, with Torvalds’ stated willingness to make compromises. To get source code into the final production kernel, a developer had to win trust. Once earned, that trust had to be maintained, proven by the developer’s ongoing ability to code and to vet others who are also making changes to the lead’s project or repository. If you’re a lead, you vouch for all of the other developers who work with you. To paraphrase the old Sicilian proverb: An open source project maintainer succeeds by keeping his friends close and his co-developers even closer.
That Fortune 500 corporations now base their IT strategies on this “cosa nostra” development philosophy is no crazier than the notion that entire free market economies rely on human selfishness to drive the flow of goods and services, at least not to Torvalds. In both cases, you’ve got fundamental human forces at work. Why be King Canute, commanding the waves to go back into the sea, when you can be Duke Kahanamoku, riding those waves into the beach?
“Maybe I’m asocial or something, but I think people tend to naturally form pretty small groups of five to ten people,” Torvalds writes. “I’m not actually even looking to expand ‘my’ core group and start trusting more people more superficially, but I really think that the thing that makes kernel development scale to bigger numbers is the network of these groups.”
Share the Pain
It’s here that the Git design philosophy veers sharply from the BitKeeper design philosophy. To distribute the “pain” and avoid spending the next two years focused entirely on Git development, Torvalds quickly insisted on a simple design: Git would not be an SCM in the BitKeeper sense, but a foundation upon which other, more motivated developers could build their own fully functioning SCMs.
Like a journaling file system, which lets an operating system recover a consistent state after a crash or outage, Git would merely track the evolving kernel source code and give source tree managers the ability to walk back through the metadata to seek out suspicious changes in the event of a problem. Paying no attention to file names or other forms of granularity, Git would make speed and scalability its primary design traits when it came to merging parallel lines of development. More sophisticated features, such as merge support tools, name tracking, and graphical user interfaces, would take advantage of Git’s speed but couldn’t tamper with it.
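Torvalds’ idea of tracking “collections of files” maps naturally onto content addressing: each file’s contents are hashed, and a directory snapshot is hashed from its sorted entries, so a change anywhere yields a new identifier for the whole tree. The Python sketch below assumes the SHA-1 object format Git uses today for blobs and trees and, for brevity, a flat directory of regular files; the helper names (`git_object_id`, `blob_id`, `tree_id`) and the sample files are hypothetical, not Git’s own code.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    # Git prefixes every object with "<kind> <size>\0" before hashing it.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is only file contents: no name, no permissions, no history.
    return git_object_id("blob", content)


def tree_id(files: dict[str, bytes]) -> str:
    # Simplifying assumption: a flat directory of regular (mode 100644) files.
    # Each entry is "<mode> <name>\0" followed by the 20-byte binary blob id,
    # and entries are sorted by name so the result ignores dict order.
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        raw_sha = bytes.fromhex(blob_id(files[name]))
        body += b"100644 " + name.encode() + b"\0" + raw_sha
    return git_object_id("tree", body)


before = {"Makefile": b"all:\n", "main.c": b"int main(void) { return 0; }\n"}
after = {**before, "main.c": b"int main(void) { return 1; }\n"}

print(tree_id(before))
print(tree_id(after))  # one edited file gives the whole snapshot a new id
```

File names appear only as entries inside a tree; nothing in this scheme records a file’s identity across a rename, which is roughly the “collections of files” stance Torvalds defended.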
“From a usability angle, my source-base really has been concentrating entirely on just the plumbing,” Torvalds wrote in the project’s first days. “If you actually want a faucet or a toilet connected to the plumbing, you’re better off with Pasky’s tree.”
“Pasky’s tree,” Czech developer Petr Baudis’ collection of developer-friendly tools, would eventually earn its own name, Cogito. In a May email to the kernel mailing list, Baudis dubbed it an “SCM-ish layer over Torvalds Git-tree history tracker.”
Anvin says such layering is well in keeping with the Unix tradition of distributed, decentralized development. “You always want to develop in layers. The more layers you have the more places you have for people to plug in and do what they want instead of what you want,” he says. “It also means that once you have an interface boundary that makes sense, you can replace what’s underneath it.”
Such a design differs slightly from the integrated BitKeeper design. Although based on the same global principles, BitKeeper’s peer-to-peer architecture makes it tougher to accommodate the “layered” development approach. Indeed, says BitMover’s McVoy, the company has labored mightily to rein in complexity through integrated design, and its fear of competition was significantly outweighed by the fear that such efforts might quickly go to waste if outsiders were given a chance to tinker with BitKeeper.
“It was a foregone conclusion that sooner or later someone would build an open source tool that would be a replacement,” says McVoy. “The only thing that happened out of plan was Tridge’s efforts, and that just freaked us out enough that we said, ‘We’re gone.’”
A close friend of Torvalds, McVoy gives him credit for keeping Git development within manageable bounds. At the same time, however, he says the “complexity tax” is one all versioning systems must eventually pay. That Torvalds and company have pulled together a makeshift system so quickly is its own sign of coming challenges.
“If you’re looking for something that solves Linus’s problems, Git’s great,” McVoy says. “But in reality, the work is being pushed down onto the lieutenants like it was before.”
Alan Cox, a longtime kernel “lieutenant” who remembers the “before” period, doesn’t see that happening yet. “If anything, [the work has] become simpler,” Cox says. “[Git] doesn’t try to solve the problems that don’t need to be solved for the kernel project. BitKeeper, for all the hype, really couldn’t handle some of the requirements of the kernel project anyway.”
Not that Git hasn’t seen its own moments of tension. When various developers advocated file name tracking as a central part of Git’s “plumbing,” Torvalds, as if unloading three years of frustration, uncorked a 1,500-word diatribe on one suggester.
“I’m right. I’m always right, but sometimes I’m more right than other times,” wrote Torvalds, near the end. “Please stop this ‘track files’ crap. Git tracks exactly what matters, namely ‘collections of files.’ Nothing else is even relevant and even thinking that it is relevant only limits your world view.”
“That’s pretty much Linus,” says Anvin, recalling the post. “He isn’t very much of a consensus builder and to some degree I think that actually is appropriate. Sometimes you need a decision more than the right decision, especially in a group that’s as open ended as the Linux community.”
Anvin cites as a counterexample Torvalds’ enthusiasm for a tool which Anvin developed with the help of fellow kernel developer Kay Sievers. Essentially a script written to translate Anvin’s CVS file histories into Git’s file-less format, the tool won prompt endorsement. “Linus started looking at it and started pulling in some other people and wound up creating a generic tool, CVS-to-Git.”
Torvalds, an outspoken CVS detractor, says such tools are nevertheless important and directing attention to their development is the best way he knows to smooth the jarring transitions of the last few months. “It’s important to make people comfortable with Git so that it ends up working well and isn’t just an awkward interface necessary to interact with me,” he says.
Lest such comments be perceived as a reversal of principle on the decision to go with BitKeeper three years ago, Torvalds remains adamant that the decision to build a new open source management tool was a decision forced upon him. Colleagues like Cox, a vocal proponent of software freedom, see a larger lesson in the BitKeeper meltdown. “It’s taken Linus a few years to learn why a lot of the kernel developers thought the BK choice was a bad decision; now hopefully he understands,” Cox says.
Torvalds, rather than seeing a lesson, merely sees yet another project adding to the pile. Granted, the project has come a long way in a short time, but Torvalds, like his friend McVoy, expects the challenges to multiply — just as they did for the Linux kernel — as Git matures.
“Larry told me that most of the work ends up being all the boring details, and it’s not exciting or fun,” Torvalds writes. “He’s exactly right.”
Sam Williams covers business and software technology for a number of publications and is a regular contributor to Linux Magazine. He is the author of two books, “Arguing A.I.” and “Free as in Freedom: Richard Stallman’s Free Software Crusade.”