Git With It!

Just weeks ago, the kernel development team received a clear edict from its benevolent dictator Linus Torvalds: Stop using BitKeeper, start using (Torvalds's own) Git. Some foretold of calamity, but what impact has Git really had on kernel development? Here's an assessment from Torvalds and others.

On April 6, 2005 at 10:41:57 Eastern Standard Time, the news finally broke. In a year that had already seen the retreat and toppling of many political dictatorships around the world, the benevolent dictatorship of Linux strongman Linus Torvalds itself announced a major pullback.

“As a number of people are already aware (and in some cases have been aware over the last several weeks), we have been trying to work out a conflict over [BitKeeper] usage over the last month or two,” wrote Torvalds under the headline “Kernel SCM Saga…” “That hasn’t been working out and as a result, the kernel team is looking at alternatives.”

BitKeeper, of course, is the proprietary software configuration manager used by Torvalds to handle large-scale directory merges since early 2002. Adopted in the wake of numerous complaints over the sloppy handling of incoming source code trees during the development of the 2.4 kernel, the decision to use BitKeeper set up Torvalds for even harsher criticism. Many developers decried the positioning of a proprietary software tool at the center of a free software project, an observation Torvalds routinely rebuffed by noting the inadequacy of free software competitors when it came to serving the distributed kernel team’s needs.

While still unrepentant, Torvalds acknowledged that the fate most of BitKeeper’s detractors had predicted, a licensing dispute, had indeed come to pass. Spooked by Samba developer Andrew Tridgell’s demonstrated ability to navigate BitKeeper protocols without access to a licensed BitKeeper client, BitMover, the South San Francisco company behind BitKeeper, pulled the plug on its free Linux version. In many ways it was a fulfilled promise: since drafting the awkward free-as-in-beer license for Linux developers, BitMover founder Larry McVoy had repeatedly stated that the license would not be a means to help competitors develop rival tools.

The BitKeeper Soap Opera Timeline

1997. Larry McVoy begins development of BitKeeper, a peer-to-peer software configuration management system.

1999. In a Linux Weekly News interview, kernel developer Alan Cox sums up the 1999 state of source code management. “Linus is still applying all the patches [manually] but there are people now collating and feeding Linus tested sets of patches in small cleanly organized groups. Larry McVoy’s new version control toys may solve some of the remaining problems.”

January 28, 2002. Kernel developer Rob Landley, noting the sizable number of overlooked submissions in the 2.4 kernel, calls for the appointment of an “integration lieutenant” to stave off developer burnout: “We need to create the office of a ‘patch penguin,’ whose job it would be to make Linus’s life easier.”

February 5, 2002. Torvalds informs the kernel mailing list that he has begun using BitKeeper to integrate changes in the 2.5 kernel. “The long-range plan, and the real payoff, comes if main developers start using BK too, which should make syncing a lot easier,” Torvalds writes. “That will take some time, I suspect.”

April 19, 2002. Daniel Phillips submits a patch that would remove promotional BitKeeper documentation from the Linux kernel. In the later discussion, Phillips says many kernel developers are “silently seething” over Torvalds’ decision to use the proprietary BitKeeper system. “I would suggest that if you are silently seething about the fact that a commercial product can do something better than a free one, how about doing something about it?” Torvalds responds.

May 22, 2002. GNU Project founder Richard Stallman steps into the BitKeeper debate with the essay “Linux, GNU and Freedom.” He describes BitKeeper as introducing “cognitive dissonance” into the free software community.

July 19, 2003. After Larry McVoy says his company is willing to take a page from the America Online and Microsoft playbook, warding off open source BitKeeper clients simply by changing BitKeeper’s internal protocols, Richard Stallman visits the Linux kernel mailing list and suggests that someone write “a free client that talks with BitKeeper and for Linux developers to start switching to that from BitKeeper” as a way to call McVoy’s bluff.

February 23, 2005. Torvalds informs McVoy that fellow OSDL employee Andrew Tridgell is working on a tool that can extract BitKeeper metadata for use in non-BitKeeper management systems. McVoy and Torvalds both lobby Tridgell to suspend development, but fail.

March 17, 2005. BitMover announces the immediate release of an open source BitKeeper client.

April 5, 2005. BitMover withdraws its free version of BitKeeper. Users can continue to use the open source client, but to use BitKeeper itself, they must agree to the terms of the non-free commercial license.

April 11, 2005. In a tersely worded comment to NewsForge, Tridgell confirms that, starting in late February, he wrote a tool that was “interoperable” with BitKeeper. “The aim was to provide export to other source code management tools and provide a useful tool to the community,” Tridgell says. “I did not use BitKeeper at all in writing this tool and thus was never subject to the BitKeeper license. I developed the tool in a completely ethical and legal manner.”

April 13, 2005. On the web site Real World Technologies, Torvalds blasts Tridgell. “He didn’t write a ‘better SCM than BK.’ He just tore down something new (and impressive) because he could, and rather than helping others, he screwed people over.” Tridgell doesn’t respond.

April 15, 2005. Former Debian team leader Bruce Perens, speaking for the legally constrained Tridgell, advises Torvalds to “cool it” in an interview with The Register’s Andrew Orlowski. “There are times when Linus Torvalds can be a real idiot and this is one of those times,” Perens says.

April 20, 2005. Torvalds announces Git, a successor to BitKeeper.

April 21, 2005. In a keynote address at linux.conf.au, Tridgell shows the audience how to access BitKeeper data via bash. Tridgell also posts SourcePuller on SourceForge, an “interoperability tool” for those trying to access BitKeeper metadata from an open source software configuration manager.

“In some sense this was inevitable, but I sure had hoped that it would have happened only once there was a reasonable open source alternative,” Torvalds wrote. “As it is, we’ll have to scramble for a while.”

As scrambles go, the post-BitKeeper Linux era has been a surprisingly smooth one. Within hours of Torvalds’ posting, developers were bandying about ideas on a from-the-ground-up, free software replacement. Within days, Torvalds himself had put forward makeshift scripts and ideas for other developers to flesh out. Dubbing the new system Git, a jokingly self-deprecating reference to his own stubbornness, Torvalds described Git as “a stupid (but extremely fast) directory content manager” in the tool’s initial README file.

Stupid is as stupid does, of course. In belittling his newest creation, Torvalds seemed to be tapping the same mixture of hacker hubris and aid-beseeching humility that helped propel Linux itself from a shaky, reverse-engineered version of Minix to a full-fledged alternative to Unix. As fellow kernel developers poured in their own thoughts and innovations, the technical and legal barriers that once appeared to restrict development of a Linux-specific management tool seemed to fade away. Like a punk rock side project after a 100-city stadium tour, Git has delivered a primal jolt of energy, not to mention a therapeutic venting of frustrations, for a band struggling to retain its original chemistry.

“When the BitKeeper fiasco broke, it turned what had previously been a political problem into a technical problem,” notes kernel developer H. Peter Anvin. “I think to some degree we’re a lot better at solving technical problems than solving political problems.”

Inside a Whirlwind

Although Git is far from mature, its speedy emergence bears studying, even for those who never find themselves in the daunting position of submitting source code changes to Torvalds or any of the other kernel insiders. For one thing, the move to Git offers a rare glimpse at what the whirlwind Linux development process looks like: a place where calm and chaos intermix.

Most software configuration management systems (or SCMs in developer parlance) are built upon the old Unix diff utility, in which an altered file is distinguished from its previous incarnation by its differences. With diff-derived tools, viewing changes at the discrete file level is helpful. However, when you’re in charge of maintaining an entire repository, it’s a bit like naming each and every leaf on a tree.
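To make the “every leaf on a tree” problem concrete, here is a toy sketch of the per-file view, with Python’s difflib standing in for Unix diff. The trees are hypothetical {path: [lines]} dictionaries and the file names are made up for illustration; no real SCM stores history exactly this way.

```python
import difflib

def per_file_deltas(old_tree, new_tree):
    """Diff each changed file on its own, as diff-derived SCMs view history.

    Trees are hypothetical {path: [lines]} dicts; a missing path counts as empty.
    """
    deltas = {}
    for path in sorted(set(old_tree) | set(new_tree)):
        old = old_tree.get(path, [])
        new = new_tree.get(path, [])
        if old != new:  # one separate record per touched file
            deltas[path] = list(difflib.unified_diff(
                old, new, fromfile="a/" + path, tofile="b/" + path))
    return deltas

# Hypothetical change touching two files: each gets its own delta.
before = {"mm/slab.c": ["int x = 1;\n"], "net/core.c": ["int y = 2;\n"]}
after = {"mm/slab.c": ["int x = 1;\n"], "net/core.c": ["int y = 3;\n"],
         "ide/probe.c": ["int z;\n"]}
deltas = per_file_deltas(before, after)
for path, delta in deltas.items():
    print("".join(delta))
```

A maintainer reviewing a whole subsystem would have to read one such record per file, which is exactly the granularity Torvalds describes as mostly meaningless to a project lead.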

“Basically, as a project leader, individual files are mostly meaningless,” writes Torvalds in an email summarizing Git’s basic design philosophy. “As a project lead, the only thing that really matters is a fairly sizable collection of files. I tend to worry about things in the form of ‘the IDE layer’ or ‘memory management’ or ‘network drivers.’”

Where BitKeeper beat out other, more traditional management tools such as CVS or Subversion, Torvalds says, was in the system’s ability to process changes at the wholesale level.

Most SCMs revolve around the core team concept: a few motivated developers enjoy “commit” level access to a central source code repository, while those outside the inner circle “push” their changes to the appropriate maintainer. Effective in the early days of Linux development, this model hit snags in the late 1990s as corporations began to flood the kernel development team with suggested improvements and fixes.

“When lots of people push to me, there’s congestion, and then I lose a patch (or two or a hundred),” says Torvalds, summing up the state of development before incorporating BitKeeper in early 2002. “That means people will re-send, and the congestion just gets worse.”

In contrast, BitKeeper lets developers post entire copies of their evolving directories to a public server (linux.bkbits.net). Once there, modifying the final production kernel becomes less of a “push” process and more of a “pull” process. Torvalds and the other top level maintainers who used BitKeeper would simply import and merge entire directories with their own, using the history feature to inspect individual changes only on an as-needed basis.
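The “pull and merge whole directories” workflow can be illustrated with a toy three-way merge over snapshots. This is a minimal sketch assuming each snapshot is a hypothetical {path: content} dictionary and that files merge only as whole units; it is not BitKeeper’s or Git’s algorithm, both of which also merge changes within files.

```python
def merge_trees(base, ours, theirs):
    """Toy whole-file three-way merge of {path: content} snapshots.

    A path absent from a snapshot is treated as deleted (None).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # both sides agree (possibly both deleted)
            result = o
        elif o == b:        # only the pulled tree changed this file
            result = t
        elif t == b:        # only our tree changed this file
            result = o
        else:               # both changed it differently: flag for review
            conflicts.append(path)
            result = o
        if result is not None:
            merged[path] = result
    return merged, conflicts

base = {"mm/a.c": "v1", "net/b.c": "v1"}
ours = {"mm/a.c": "v2", "net/b.c": "v1"}
theirs = {"mm/a.c": "v1", "net/b.c": "v2", "ide/c.c": "new"}
print(merge_trees(base, ours, theirs))
```

The point of the design is that the maintainer only looks at individual files when the merge reports a conflict; everything else is taken wholesale, which mirrors the “inspect only as needed” habit described above.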

In addition to streamlining the inner mechanics of kernel management, BitKeeper’s “pull” approach played well with the trust ethic underlying Linux kernel development, and hence with Torvalds’s stated willingness to make compromises. To get source code into the final production kernel, a developer had to win trust. In cases where trust had been earned, it had to be maintained, proven by the developer’s ongoing ability to code and to vet others who are also making changes to the lead’s project or repository. If you’re a lead, you vouch for all of the other developers that work with you. To paraphrase the old Sicilian proverb: An open source project maintainer succeeds by keeping his friends close and his co-developers even closer.

That Fortune 500 corporations now base their IT strategies on this “cosa nostra” development philosophy is no crazier than the notion that entire free market economies rely on human selfishness to drive the flow of goods and services, at least not to Torvalds. In both cases, you’ve got fundamental human forces at work. Why be King Canute, commanding the waves to go back into the sea, when you can be Duke Kahanamoku, riding those waves into the beach?

“Maybe I’m asocial or something, but I think people tend to naturally form pretty small groups of five to ten people,” Torvalds writes. “I’m not actually even looking to expand ‘my’ core group and start trusting more people more superficially, but I really think that the thing that makes kernel development scale to bigger numbers is the network of these groups.”

Share the Pain

It’s here that the Git design philosophy veers sharply from the BitKeeper design philosophy. To distribute the “pain” and avoid spending the next two years focused entirely on Git development, Torvalds quickly insisted on a simple design: Git would not be an SCM in the BitKeeper sense, but a foundation upon which other, more motivated developers could build their own fully-functioning SCMs.

Like a journaling file system, which lets an operating system “roll back” to a consistent state in the event of a crash or outage, Git would merely track the evolving kernel source code and give source tree managers the ability to walk back through the metadata to seek out suspicious changes in the event of a problem. With no attention to file names or other forms of granularity, Git’s primary design traits would be speed and scalability when it comes to the merging of parallel code. All more sophisticated tasks, such as merge support tools, name tracking, and graphical user interfaces, would take advantage of Git’s speed but couldn’t tamper with it.
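Part of what makes Git a “directory content manager” is that it names stored file contents by a hash of the contents themselves. The sketch below reproduces how Git names a file’s contents (a blob): SHA-1 over the header “blob”, a space, the decimal size, a NUL byte, and then the bytes. The `snapshot_id` function is a deliberately simplified, hypothetical stand-in for a whole-tree identifier and does not follow Git’s real tree-object encoding.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Name Git gives a file's contents: SHA-1 of a 'blob <size>' + NUL header plus the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def snapshot_id(tree: dict) -> str:
    """Hypothetical whole-tree id (NOT Git's tree format): hash sorted
    (path, blob id) pairs, so a change anywhere yields a different id."""
    h = hashlib.sha1()
    for path in sorted(tree):
        h.update(path.encode() + b"\0" + blob_id(tree[path]).encode() + b"\n")
    return h.hexdigest()

print(blob_id(b""))  # same name `git hash-object` reports for an empty file
```

Because identical contents always hash to the same name, comparing two huge trees reduces to comparing short identifiers, which is one reason this style of design can be fast when merging parallel lines of work.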

“From a usability angle, my source-base really has been concentrating entirely on just the plumbing,” Torvalds wrote in the project’s first days. “If you actually want a faucet or a toilet connected to the plumbing, you’re better off with Pasky’s tree.”

“Pasky’s tree,” Czech developer Petr Baudis’s collection of developer-friendly tools, would eventually earn its own name, Cogito. In a May email to the kernel mailing list, Baudis dubbed it an “SCM-ish layer over Torvalds Git-tree history tracker.”

Anvin says such layering is well in keeping with the Unix tradition of distributed, decentralized development. “You always want to develop in layers. The more layers you have the more places you have for people to plug-in and do what they want instead of what you want,” he says. “It also means that once you have an interface boundary that makes sense, you can replace what’s underneath it.”

Such design differs slightly from the integrated BitKeeper design. Although based on the same global principles, BitKeeper’s peer-to-peer architecture makes it tougher to accommodate the “layered” development approach. Indeed, BitMover’s McVoy says the company has labored mightily to rein in complexity through integrated design, and that the fear of competition was significantly outweighed by the fear that such efforts might quickly go to waste if outsiders were given a chance to tinker with BitKeeper.

“It was a foregone conclusion that sooner or later someone would build an open source tool that would be a replacement,” says McVoy. “The only thing that happened out of plan was Tridge’s efforts, and that just freaked us out enough that we said, ‘We’re gone.’”

A close friend of Torvalds, McVoy gives him credit for keeping Git development within manageable bounds. At the same time, however, he says the “complexity tax” is one all versioning systems must eventually pay. That Torvalds and company have pulled together a makeshift system so quickly is its own sign of coming challenges.

“If you’re looking for something that solves Linus’s problems, Git’s great,” McVoy says. “But in reality, the work is being pushed down onto the lieutenants like it was before.”

Alan Cox, a longtime kernel “lieutenant” who remembers the “before” period, doesn’t see that happening yet. “If anything, [the work has] become simpler,” Cox says. “[Git] doesn’t try to solve the problems that don’t need to be solved for the kernel project. BitKeeper, for all the hype, really couldn’t handle some of the requirements of the kernel project anyway.”

Not that Git hasn’t seen its own moments of tension. When various developers advocated file name tracking as a central part of Git’s “plumbing,” Torvalds, as if unloading three years of frustration, uncorked a 1500-word diatribe on one suggester.

“I’m right. I’m always right, but sometimes I’m more right than other times,” wrote Torvalds, near the end. “Please stop this ‘track files’ crap. Git tracks exactly what matters, namely ‘collections of files.’ Nothing else is even relevant and even thinking that it is relevant only limits your world view.”
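Tracking “collections of files” rather than individual files means a rename does not have to be recorded at all; it can be inferred later by comparing two snapshots. The sketch below is a minimal, exact-match version of that idea, assuming hypothetical {path: bytes} snapshots. Git’s actual rename detection also scores partially similar files, which this sketch does not attempt.

```python
def detect_exact_renames(old_tree, new_tree):
    """Infer renames after the fact: a path that vanished and a path that
    appeared with byte-identical contents are reported as a rename."""
    removed = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}
    by_content = {}
    for path, content in added.items():
        by_content.setdefault(content, []).append(path)
    renames = []
    for old_path, content in sorted(removed.items()):
        candidates = by_content.get(content)
        if candidates:  # identical contents reappeared under a new name
            renames.append((old_path, candidates.pop(0)))
    return renames

old = {"drivers/old.c": b"int f(void);\n", "mm/x.c": b"a"}
new = {"drivers/new.c": b"int f(void);\n", "mm/x.c": b"a"}
print(detect_exact_renames(old, new))
```

Nothing about the rename was stored when the change was made; the history only holds the two collections of files, and the name tracking is layered on top, which is the division of labor Torvalds was defending.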

“That’s pretty much Linus,” says Anvin, recalling the post. “He isn’t very much of a consensus builder and to some degree I think that actually is appropriate. Sometimes you need a decision more than the right decision, especially in a group that’s as open ended as the Linux community.”

Anvin cites as a counterexample Torvalds’ enthusiasm for a tool which Anvin developed with the help of fellow kernel developer Kay Sievers. Essentially a script written to translate Anvin’s CVS file histories into Git’s file-less format, the tool won prompt endorsement. “Linus started looking at it and started pulling in some other people and wound up creating a generic tool, CVS-to-Git.”

Torvalds, an outspoken CVS detractor, says such tools are nevertheless important and that directing attention to their development is the best way he knows to smooth the jarring transitions of the last few months. “It’s important to make people comfortable with Git so that it ends up working well and isn’t just an awkward interface necessary to interact with me,” he says.

Lest such comments be perceived as a reversal of principle on the decision to go with BitKeeper three years ago, Torvalds remains adamant that the decision to build a new open source management tool was a decision forced upon him. Colleagues like Cox, a vocal proponent of software freedom, see a larger lesson in the BitKeeper meltdown. “It’s taken Linus a few years to learn why a lot of the kernel developers thought the BK choice was a bad decision; now hopefully he understands,” Cox says.

Torvalds, rather than seeing a lesson, merely sees yet another project adding to the pile. Granted, the project has come a long way in a short time, but Torvalds, like his friend McVoy, expects the challenges to multiply — just as they did for the Linux kernel — as Git matures.

“Larry told me that most of the work ends up being all the boring details, and it’s not exciting or fun,” Torvalds writes. “He’s exactly right.”
