
Linus on Linux: The Linus Torvalds Interview Part 2

In part 2 of our interview, Linus talks about the process of managing kernel developer commits, selecting a revision control system and how he personally uses git.

One of the git commands that’s not on your list is git bisect. A user who finds a bug in one kernel version but not the previous one can build a series of intermediate versions to find the one change that’s responsible. Do you see many users going through the bisect process, or are bisectors mostly people who are already kernel developers?

We’re not seeing a lot of people use “git bisect,” but when people do, it’s a very high-value debug tool. And while I suspect that people who are already using git for development are more likely to then use “git bisect” to find where problems occur, it does get used a fair amount by non-kernel developers too. (Of course, they may use git for other projects, and I simply wouldn’t know.)

But bisection definitely was meant to be simple enough for a non-developer to use—the whole point, in fact, is that the only thing you need to have is a reliable symptom, and git will do all the rest of the work. So the process is designed to not really need any knowledge at all about the development details, and anybody can use it.

Of course, the downside is that especially during the merge window, when we’ve merged thousands of commits, the bisection process easily ends up involving ten to fifteen kernel compiles and reboots. It’s all fairly mindless, but it’s still a fair amount of effort.
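For readers who want to try it, a minimal bisect session looks roughly like the sketch below; the v2.6.26 tag is just a placeholder for whatever kernel last worked for you.

git bisect start
git bisect bad                # the kernel you are running now shows the bug
git bisect good v2.6.26       # placeholder: the last release known to be good
# build, boot and test the revision git checks out, then report the result
git bisect good               # or: git bisect bad
# repeat until git names the first bad commit, then return to where you were
git bisect reset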

Kernel developers are also giving “Reported-by” credits to people who find bugs, several of the distributions are packaging upstream kernels to make it easier for regular users to test sooner, and there’s even a kerneloops.org. How are the users doing at reporting problems? Are you getting enough information, or does the world need more testing motivation or tools?

Some users are great problem reporters, others not so much. I’d love to have more motivation for people to test and report things, but it really has gotten so much better that I’m not really willing to complain. Could it be better still, and do we need even more testing? Oh, absolutely. But I also think that we’re doing reasonably well.

One thing to note is that the way we handle the reports has been a big improvement. Big kudos to Arjan (and others) for setting up the automated kerneloops site and oops scraping. That, together with timely updates by distributions, makes us much more aware of new issues that pop up. But what I personally find even more important are the regression tracking efforts, first by Adrian Bunk and now, for the last several releases, by Rafael Wysocki.

The regression tracking not only helps developers, but it’s one of the things that does add a lot of motivation on both sides. Problem reports still fall through the cracks, but they do so less now – especially if the reporter follows up on things. And that obviously motivates testers. But the way the regression tracking is structured (with aging of old and unconfirmed reports) also makes developers much more motivated to look at them, when they aren’t just random collections of dubious reports.

My point here being that “process” is good to have (when it doesn’t become a straitjacket that just kills everybody’s motivation entirely), and that the people who do those kinds of process things are often really important. They make the work of others (whether bug reporters or developers) more relevant or more focused.

Final note: the reason we put in “Reported-by” tags in commits is not just to give credit where credit is due, to motivate people to report problems, and to show that they are first-class members of the community. It’s also very useful when problems crop up. If a particular solution causes problems for other people, we most definitely want to then involve the person who reported the original problem if we have to modify the fix, just to test that the original problem doesn’t re-appear.

That is, in the end, the biggest advantage of a lot of the tags in fact. We started doing them due to the whole “track where things came from” issue over the bogus copyright claims by SCO, but realistically, the biggest practical advantage is simply that when problems occur, we want to report the problems back through the whole chain, and it’s not just an issue of some simplistic “author” or “committer”.
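For illustration, these tags are just trailer lines at the end of a commit message; the names and addresses below are made-up placeholders, while the tag names themselves (“Reported-by”, “Signed-off-by”) are the ones the kernel actually uses.

Fix oops in example driver's probe path

Reported-by: Jane Tester <jane.tester@example.org>
Signed-off-by: Patch Author <author@example.org>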

Just looking at “git log” it looks like you’re actively committing kernel changes almost every day (and I hope your family is getting some of the no-git days). Do you ever get a chance to put your feet up and think about the overall design of the project?

Oh, I spend just about all my time trying to look at the “big picture” and talking to people about maintenance issues (on a small scale, that means cleaning up their patches to be more maintainable, but on a large scale it’s about things like keeping the git history clean so that you can see what is going on, exactly so that I, and others, can get the big picture more easily).

Yes, I commit almost every day, but my commit statistics are very skewed: 95+% of what I do is merges, and just a couple of percent is actual “code” commits, and quite frankly, even those pitiful few tend to be about pretty trivial stuff like reverting somebody else’s code that caused problems.

Oh, and my commit statistics are pretty lumpy. I may average 18+ commits a day, but that doesn’t mean “just over one commit every waking hour”. No, I go on “merge binges”, when I apply big series of patches (especially from people like Andrew) or when I do five or six “git pull” requests within minutes of each other. Other days I just read email.

How do you get these statistics?

The way I did those statistics was to compare:

git rev-list --committer="Linus Torvalds" \
        --since=6.months.ago HEAD | wc

which shows how many commits I’ve done in the last six months. Just now, that number happened to be 3354 – so I average something like 18+ commits every day, day in and day out.

Ok, that’s a fairly big number. BUT..

But then you can look closer, and look at how many of those commits were for something where I count as the author (change the “--committer” to “--author” in the above git command), and that number falls to less than a third: 1042 – two thirds of my commits are commits of other people’s code.

“But that’s still almost six commits per day that you author!”

But.. Of the just over a thousand commits that are really mine, 90%+ are merges. Add a “--no-merges” to that git command line, and you end up with just 88 commits that I authored in the last six months. And most of those were really very trivial.

And how many commits did we have in total during those six months?

Right now that command line (without the “--committer” or the “--author” or “--no-merges” limiters) is 27,143.

So out of the almost thirty thousand commits, I was directly involved with a bit more than 10% (that is, almost 90% came in through git merges – I merged them, but I never needed to look at the individual commits), and I personally wrote a vanishingly small fraction.
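Putting those variations side by side, the commands look roughly like this; the only difference from the command quoted earlier is that wc -l is used to print just the line count.

# commits I committed in the last six months
git rev-list --committer="Linus Torvalds" --since=6.months.ago HEAD | wc -l

# of those, the commits I also authored (merges included)
git rev-list --author="Linus Torvalds" --since=6.months.ago HEAD | wc -l

# authored commits that are not merges
git rev-list --author="Linus Torvalds" --no-merges --since=6.months.ago HEAD | wc -l

# every commit in the same six-month window
git rev-list --since=6.months.ago HEAD | wc -l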

In other words, I do basically no code of my own any more, and effectively all of what I do is merge patches that get emailed to me, or do git-to-git merges.

Of course, that is exactly how it should be, so I’d argue that those numbers are all good, but I’m trying to explain that all my time gets spent on other things than worrying about the code itself – I worry about development model details, about regressions, and about keeping the code/flow maintainable so that I can continue to work well.
