Linus Torvalds once said, “If you deny the Index, you really deny git itself.” (February 4, 2006, Git List Archives). Rather than try to sweep the mysteries and complexities of the git Index under the rug, some explanation and examples can help clarify it, expose its power, and allow you to revel in it!
There are several articles that provide background information about git. The impetus for the software’s creation and some of the early history of its development are covered in Sam Williams’s article “Git With It!” in the August 2005 issue of Linux Magazine (available online at http://www.linux-mag.com/2005-08/git/). The first part of this series, “How to Git It” (available online at http://www.linux-mag.com/2006-03/git/), explains how to obtain and install git, and introduces the basics of git’s Object Database and Index.
In this second part of the series, let’s look more at the Index. The Index is fundamental to git, and provides the basis for almost every operation, simple or complex, that git provides. Let’s create a project, follow the code through several revisions implemented by two independent developers, and finally unify the results. Along the way, the subtle yet powerful operations within the git Index will be explained.
Creating a Project With git
The simplest way to create a new project using git is the git init-db command. It can be used either to create a brand new repository for new development or to turn a snapshot of an existing source tree into a git repository.
Here is a quick-and-dirty prime number program written in C. Never mind that it has issues — those will be fixed soon enough. Before making any modifications, place it under source control with git init-db.
$ pwd
/usr/src/primes
$ cat primes.c
int
main(int argc, char **argv)
{
    int n;
    int d;
    int prime;

    for (n = 1; n <= 25; n++) {
        prime = 1; /* assume prime */
        for (d = 2; d <= sqrt(n); d++) {
            if ((float) n / d == n / d) {
                prime = 0;
                break;
            }
        }
        if (prime) {
            printf("%d is prime\n", n);
        }
    }
}
$ git init-db
defaulting to local storage area
$ ls -aF
./ ../ .git/ primes.c
While the primes directory has been turned into a git repository complete with a .git directory, the git Index still has no knowledge of which files need to be managed. You can add files one at a time using git add file, or add an entire directory structure at once with git add . (note the trailing dot):
$ git add .
$ git status
#
# Initial commit
#
# Updated but not checked in:
# (will commit)
# new file: primes.c
#
git add tells the Index to add the new file primes.c to the repository. So far, though, this has only modified the Index and added the one blob object that represents primes.c. The state has not yet been committed to the object database.
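Every object git stores is named by a SHA1. For a blob, that SHA1 is computed over a short header (the word blob, a space, the content length in decimal, and a NUL byte) followed by the file’s raw contents; the object itself is then stored zlib-compressed under .git/objects. The following C sketch reproduces that naming rule. The function names git_blob_id and sha1_hex are hypothetical, and the compact SHA-1 is included only so the example is self-contained; it is not meant as production code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compact SHA-1, included only to keep this sketch self-contained. */
static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Process one 64-byte block, updating the five-word hash state h. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 80; i++)
        w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (int i = 0; i < 80; i++) {
        uint32_t f, k;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t t = rol(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Hash len bytes of msg; write 40 lowercase hex digits plus NUL to hex. */
static void sha1_hex(const unsigned char *msg, size_t len, char hex[41])
{
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE,
                      0x10325476, 0xC3D2E1F0 };
    unsigned char block[64];
    size_t i = 0;

    for (; i + 64 <= len; i += 64)
        sha1_block(h, msg + i);

    /* Pad the tail: a 0x80 byte, zeros, then the bit length big-endian. */
    size_t rest = len - i;
    memset(block, 0, sizeof block);
    memcpy(block, msg + i, rest);
    block[rest] = 0x80;
    if (rest >= 56) {              /* no room left for the length field */
        sha1_block(h, block);
        memset(block, 0, sizeof block);
    }
    uint64_t bits = (uint64_t)len * 8;
    for (int j = 0; j < 8; j++)
        block[63 - j] = (unsigned char)(bits >> (8 * j));
    sha1_block(h, block);

    for (int j = 0; j < 5; j++)
        sprintf(hex + 8 * j, "%08x", (unsigned)h[j]);
}

/* Name a blob the way git does: SHA-1 of "blob <size>\0" + contents. */
int git_blob_id(const void *data, size_t size, char hex[41])
{
    char header[32];
    int hlen = snprintf(header, sizeof header, "blob %zu", size);
    if (hlen < 0)
        return -1;

    size_t total = (size_t)hlen + 1 + size;   /* +1 keeps the NUL byte */
    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return -1;
    memcpy(buf, header, (size_t)hlen + 1);
    memcpy(buf + hlen + 1, data, size);
    sha1_hex(buf, total, hex);
    free(buf);
    return 0;
}
```

For instance, git_blob_id("hello\n", 6, hex) should produce ce013625030ba8dba906f756967f9e9ca394464a, the same name that echo hello | git hash-object --stdin reports. Because the name depends only on content, any edit to primes.c yields a new blob with a new SHA1.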
Recall that the Index forms a staging ground where modifications to the repository collect until the time when you want to commit and enter them into the object database. git status provides a very valuable guide to help determine what a git commit operation will do. In this case, there is one new file, primes.c, to check in and commit.
$ git commit -m "Initial primes project."
Committing initial tree 2b076ab786485f2af451b3f5139099155ae93f0c
$ git status
nothing to commit
After the commit, git status reports that there are no longer any changes to commit, nor are there any changes that are seen but won’t be committed. However, the git log command can now be used to see historical commit information.
$ git log
commit 9ec29c6852926e5fb583d68275f6d28574b355d7
Author: Jon Loeliger <jdl@freescale.com>
Date: Tue Mar 14 19:36:21 2006 -0600
Initial primes project.
With a repository started and at least one file in it, the interesting operations on the Index can be shown.
Index as Staging Ground
The Index can be used as an incremental staging area for the development of the next revision of the repository to be committed. Any changes that need to be made must be introduced through the Index. For example, each new file must be explicitly added, as above, or git won’t know to maintain it under revision control. However, you may wish to make a distinction between three classifications of files: those that should be under revision control, those that are never to be managed by revision control or even really thought about, and those that are not yet classified.
Introducing a new file, Makefile, to build the primes executable from primes.c exposes these classification issues.
$ cat Makefile
primes: primes.o
	$(CC) -o $@ -lm $<
$ make
cc -c -o primes.o primes.c
primes.c: In function 'main':
primes.c:12: warning: incompatible implicit \
declaration of built-in function 'sqrt'
primes.c:19: warning: incompatible implicit \
declaration of built-in function 'printf'
cc -o primes -lm primes.o
Oh, no! Disastrous warnings! But it looks like it built nonetheless. Before the warnings are cleaned up, though, what does git “think” about all these new files? The command git ls-files asks for a list of files known to git, and git ls-files --others asks for the set of files that are not known. (git ls-files is the basis for the git status command.)
$ git ls-files
primes.c
$ git ls-files --others
Makefile
primes
primes.o
git maintains a manifest of all the files that are under revision control, and a file joins it only when it is explicitly added. Use git add on Makefile to add it:
$ git add Makefile
$ git status
#
# Updated but not checked in:
# (will commit)
# new file: Makefile
#
# Untracked files:
# (use "git add" to add to commit)
# primes
# primes.o
However, the other two files, primes and primes.o, are generated files that probably should never be under revision control. If nothing else is done about these two files, git status will remind you that they are “untracked files” every time they are present in the directory (that is, after every build). Instead, direct git to ignore these files, effectively classifying them as “never under revision control.”
To do that, add both files to a special file called .gitignore in the appropriate directory, and add that file to the Index:
$ cat .gitignore
primes
primes.o
$ git add .gitignore
$ git status
# Updated but not checked in:
# (will commit)
# new file: .gitignore
# new file: Makefile
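The three classifications described earlier can be modeled with a short C sketch. The function classify and its inputs are hypothetical and only approximate git’s rules: a path listed in the Index is tracked, even if it also matches an ignore pattern; otherwise a path matching a .gitignore pattern is ignored; anything else is untracked. The sketch uses POSIX fnmatch for glob matching and leaves out real .gitignore features such as negated (!) patterns, directory-only patterns, and per-directory ignore files.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

enum file_class { TRACKED, IGNORED, UNTRACKED };

/* Toy model of git's three-way file classification (hypothetical API).
   index[]  : paths already added to the Index
   ignore[] : glob patterns taken from .gitignore */
enum file_class classify(const char *path,
                         const char *const index[], size_t nindex,
                         const char *const ignore[], size_t nignore)
{
    /* A path in the Index is tracked even if an ignore pattern matches it. */
    for (size_t i = 0; i < nindex; i++)
        if (strcmp(path, index[i]) == 0)
            return TRACKED;

    /* Otherwise, a matching pattern means "never under revision control". */
    for (size_t i = 0; i < nignore; i++)
        if (fnmatch(ignore[i], path, 0) == 0)
            return IGNORED;

    /* Everything else is merely not yet classified. */
    return UNTRACKED;
}
```

With the Index holding primes.c, Makefile, and .gitignore, and the ignore list holding primes and primes.o, a new README would come back UNTRACKED, which is the only kind of file git status still nags about.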
Throughout the process of adding the Makefile, nothing has been committed to the object database yet. So far, all of the git commands have operated on the Index, building up a new state of the repository that can be captured with the next git commit. Since the status above shows just the addition of the two new files, and nothing else is left unresolved, the next commit has been fully staged in the Index.
$ git commit -m "Add Makefile and .gitignore"
The Index can also be used to indicate that existing files are ready to be committed after some local modifications. For example, the compilation warnings can be eliminated by adding two #include lines to the top of the file primes.c:
#include <stdio.h>
#include <math.h>
To see the effect of this change, simply invoke git diff:
$ git diff
diff --git a/primes.c b/primes.c
index a8b37bb..1fa6c35 100644
--- a/primes.c
+++ b/primes.c
@@ -1,3 +1,6 @@
+#include <stdio.h>
+#include <math.h>
+
 int
 main(int argc, char **argv)
 {
After making this modification, the working directory has changed, but the Index has not. Running git status indicates that there is a modified file that is not yet updated, and issues a reminder that git update-index may be used to perform that step.
$ git status
# Changed but not updated:
# (use git-update-index to mark for commit)
# modified: primes.c
nothing to commit
$ git update-index primes.c
$ git status
#
# Updated but not checked in:
# (will commit)
# modified: primes.c
While this may seem like a trivial example, it’s the crux of the mechanism git uses to determine that a file needs to be committed. In this state, the update-index request has written a new blob object for primes.c, with a new SHA1, into the object store, and the Index now records that new SHA1 as the staged version of the file. The jargon usually used to describe this state is something like “a clean but uncommitted working directory.” Before the update-index, when the working directory had been modified but the change was not yet reflected in the Index, the state was considered dirty.
To highlight an important feature of the Index, make one more modification before committing the revision: change the upper bound from 25 to 50 in primes.c and run git diff again:
$ git diff
diff --git a/primes.c b/primes.c
index 1fa6c35..04290d9 100644
--- a/primes.c
+++ b/primes.c
@@ -8,7 +8,7 @@ main(int argc, char **argv)
     int d;
     int prime;
 
-    for (n = 1; n <= 25; n++) {
+    for (n = 1; n <= 50; n++) {
         prime = 1; /* assume prime */
That may come as a surprise! What happened to the two #include lines? Use git status to help figure out what happened:
$ git status
#
# Updated but not checked in:
# (will commit)
# modified: primes.c
#
# Changed but not updated:
# (use git-update-index to mark for commit)
# modified: primes.c
primes.c is listed twice, both as a change that will be committed and as a change that will not be committed! In fact, the change that has been staged in the Index, and that will be committed, is the version of primes.c that existed when the update-index operation was performed on it, namely the one with the #include lines added. However, the Index also knows that the file was modified again after it was staged.
The file, as it is in the working directory, really has both changes in it. But as the developer, you must now ask yourself, “Which version should be committed?” If you want the version staged in the Index, just run git commit. After all, that’s the purpose of staging changes in the Index! The change to the hard-coded upper limit will remain in the working directory. On the other hand, if you want both changes to be committed, you must run git update-index again to place the current version in the Index before committing.
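The bookkeeping behind this double listing can be sketched as a toy model in C. Here a single file has three versions: the one in HEAD, the one in the Index, and the one in the working directory, with plain strings standing in for blob contents. The struct and function names are illustrative only; real git compares blob SHA1s and cached file metadata rather than whole strings.

```c
#include <stdbool.h>
#include <string.h>

/* One file as seen by git: last committed, staged, and working versions. */
struct tracked_file {
    const char *head;   /* version in the most recent commit */
    const char *index;  /* version staged in the Index */
    const char *work;   /* version in the working directory */
};

/* "Updated but not checked in": the Index differs from HEAD. */
bool staged(const struct tracked_file *f)
{
    return strcmp(f->head, f->index) != 0;
}

/* "Changed but not updated": the working copy differs from the Index. */
bool unstaged(const struct tracked_file *f)
{
    return strcmp(f->index, f->work) != 0;
}

/* git update-index: copy the working version into the Index. */
void update_index(struct tracked_file *f) { f->index = f->work; }

/* git commit: record exactly what the Index holds, not the working copy. */
void commit(struct tracked_file *f) { f->head = f->index; }
```

Replaying the steps above: after the #include edit only unstaged() is true; after update_index() only staged() is; after the limit edit both are, which is exactly why git status lists primes.c twice. A plain commit() then records only the Index version, leaving the limit change unstaged.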
Before answering that question, though, look at the differences that git diff produced. The git diff command is really a Swiss Army knife for determining the set of changes between various versions of files in the working directory, the Index, and the Object Database.
Without parameters, as used above, or with only file names, git diff generates the differences between the current working directory and the Index. Thus, the first invocation of git diff above compared the working directory to the Index and discovered that the working directory had added the #include lines compared to the version of the file already in the Index.
Executing the git update-index primes.c command, though, brought the working directory version into the Index, thus synchronizing the Index and the working directory versions. Another git diff would show no differences at this point!
To reveal the complete set of changes between the most recently committed version of the file, also known as the HEAD version, and the working directory version, the git diff HEAD command can be used.
$ git diff HEAD
diff --git a/primes.c b/primes.c
index a8b37bb..04290d9 100644
--- a/primes.c
+++ b/primes.c
@@ -1,3 +1,6 @@
+#include <stdio.h>
+#include <math.h>
+
 int
 main(int argc, char **argv)
 {
@@ -5,7 +8,7 @@ main(int argc, char **argv)
     int d;
     int prime;
 
-    for (n = 1; n <= 25; n++) {
+    for (n = 1; n <= 50; n++) {
         prime = 1; /* assume prime */
To reveal just the changes between the HEAD version and the version staged, or cached, in the Index, add the --cached flag:
$ git diff --cached HEAD
diff --git a/primes.c b/primes.c
index a8b37bb..1fa6c35 100644
--- a/primes.c
+++ b/primes.c
@@ -1,3 +1,6 @@
+#include <stdio.h>
+#include <math.h>
+
 int
 main(int argc, char **argv)
 {
Because that output represents the differences between HEAD and the Index, those are exactly the changes the first git commit will record. The git update-index and the second commit then pick up the change to the upper-limit constant, commit it, and leave the working directory clean:
$ git commit -m "Add include headers"
$ git update-index primes.c
$ git commit -m "Change limit to 50"
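Each of the three git diff forms used in this section compares a different pair of snapshots. The toy model below, with hypothetical names, makes that pairing explicit. Plugging in the abbreviated blob names from the index lines of the diffs above (a8b37bb for HEAD, 1fa6c35 for the Index, 04290d9 for the working directory) reproduces each diff’s old..new pair.

```c
#include <stdbool.h>
#include <string.h>

/* The three snapshots git diff can compare (blob names or contents). */
struct snapshots {
    const char *head;   /* last commit */
    const char *index;  /* staged version */
    const char *work;   /* working directory */
};

enum diff_form {
    DIFF_PLAIN,        /* git diff:               Index -> working dir */
    DIFF_CACHED_HEAD,  /* git diff --cached HEAD: HEAD  -> Index */
    DIFF_HEAD          /* git diff HEAD:          HEAD  -> working dir */
};

/* Pick the old ("a/") and new ("b/") sides a given form compares. */
void diff_sides(const struct snapshots *s, enum diff_form form,
                const char **old_side, const char **new_side)
{
    switch (form) {
    case DIFF_PLAIN:
        *old_side = s->index; *new_side = s->work;  break;
    case DIFF_CACHED_HEAD:
        *old_side = s->head;  *new_side = s->index; break;
    case DIFF_HEAD:
    default:
        *old_side = s->head;  *new_side = s->work;  break;
    }
}

/* A form prints nothing when its two sides are identical. */
bool diff_is_empty(const struct snapshots *s, enum diff_form form)
{
    const char *a, *b;
    diff_sides(s, form, &a, &b);
    return strcmp(a, b) == 0;
}
```

Setting index equal to work models running git update-index primes.c: the plain git diff then goes silent, while git diff --cached HEAD still shows what the next commit will contain.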
Branches
Development of software within a repository is often performed in discrete steps, and often on different development branches within the same repository.
A branch is a conceptual line of development that usually has split off from the main development line at some identifiable point in the past. The power of branches comes from the fact that multiple independent, but hopefully coordinated, development efforts can then happen on different branches. Later, if needed, a merge operation can bring multiple branches back together again.
There are two ways to create new branches: git branch and git checkout -b. Both forms allow you to introduce a new, named line of development and to specify the starting point of the branch, which defaults to the current HEAD revision of the current branch. The difference is that git checkout -b also switches to the new branch.
$ git branch
* master
$ git branch brad
$ git branch
brad
* master
Here, git branch without arguments is used to list the current set of branches in the repository, namely master. This is the active branch, as denoted by the * next to it, and was created by the initial git init-db as the default branch to hold the mainline, or master, development. Up until now, all the development that’s been done has occurred on this master branch!
A new branch for developer Brad has been introduced using git branch brad. As no additional argument was used, the brad branch splits from the main line development at the current HEAD revision of the master branch. Finally, using git branch again shows that there are now two branches, brad and master, with master still the checked-out current branch.
Internally, git tracks these branch names in the .git/refs/heads directory. Branches are very inexpensive, as each one is nothing more than a SHA1 value recorded in a file.
$ ls .git/refs/heads/
brad master
$ cat .git/refs/heads/brad
1964adf822b7317df39b95aabd244521b04a51ee
$ cat .git/refs/heads/master
1964adf822b7317df39b95aabd244521b04a51ee
Since a branch name in git refers to the most recent commit on that branch by its SHA1, it’s clear that the brad and master branches are currently at exactly the same state.
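Because a loose branch ref is just a text file holding a 40-digit hexadecimal SHA1, reading one takes only a few lines of C. This is a sketch with hypothetical function names; it handles only loose refs under .git/refs/heads, whereas git may also store refs in .git/packed-refs, so git rev-parse brad remains the reliable way to ask git itself for a branch’s commit.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build the path of a loose branch ref, e.g. ".git/refs/heads/brad".
   Returns false if the result would not fit in buf. */
bool branch_ref_path(const char *git_dir, const char *branch,
                     char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s/refs/heads/%s", git_dir, branch);
    return n >= 0 && (size_t)n < size;
}

/* Read the 40-hex-digit commit SHA1 stored in a loose ref file.
   Returns false if the file is missing or does not start with a SHA1. */
bool read_ref_sha1(const char *path, char out[41])
{
    char line[128];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return false;
    bool got_line = fgets(line, sizeof line, fp) != NULL;
    fclose(fp);
    if (!got_line)
        return false;

    /* Stops at the first non-hex byte, so a short line is rejected safely. */
    for (int i = 0; i < 40; i++)
        if (!isxdigit((unsigned char)line[i]))
            return false;
    memcpy(out, line, 40);
    out[40] = '\0';
    return true;
}
```

Reading both .git/refs/heads/brad and .git/refs/heads/master this way and comparing the two strings is the same check the cat commands above perform by eye.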
A more useful way to see the branch state is to use the git show-branch command:
$ git show-branch
! [brad] Change limit to 50
 * [master] Change limit to 50
--
+* [brad] Change limit to 50
Above the -- separator, each line represents a named branch within the repository, with the current branch denoted by a *. The first line of each commit’s log message helps identify the commit at the head of each branch. Below the separator, each branch has a vertical column of characters indicating whether the commit named on that line is present in that branch: a * marks a commit on the current branch, and a + marks a commit on another branch. As the last line, marked +*, shows, both the brad branch and the master branch contain the Change limit to 50 commit. A column would be blank if the commit named on the line were absent from that branch.
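The marker rules just described can be captured in a few lines. This toy C function, whose name and signature are hypothetical, produces one column per branch: * when the commit is on the checked-out branch, + when it is on another branch, and a space when it is absent. Merge commits, which git show-branch marks differently, are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fill out[] with one marker per branch for a single commit line.
   on_branch[i] says whether the commit is reachable from branch i;
   current is the index of the checked-out branch.  out needs room
   for nbranches + 1 chars. */
void show_branch_markers(const bool on_branch[], size_t nbranches,
                         size_t current, char *out)
{
    for (size_t i = 0; i < nbranches; i++) {
        if (!on_branch[i])
            out[i] = ' ';                       /* commit absent */
        else
            out[i] = (i == current) ? '*' : '+';
    }
    out[nbranches] = '\0';
}
```

With the branches ordered brad, master and master checked out, a commit present on both yields +*, matching the last line of the output above. Once brad is checked out, a commit only on brad yields a * followed by a blank, and the shared commit yields *+.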
If Brad is tasked with adding some documentation, he can do that independently in his branch now after checking it out using:
$ git checkout brad
Assume Brad makes the following edit, commits it, makes a few more edits, commits them as well, and then finally re-examines the branch structure. The -a option to git commit makes it automatically run update-index on every modified tracked file before the actual commit.
$ git diff
diff --git a/primes.c b/primes.c
index 04290d9..910aa4e 100644
--- a/primes.c
+++ b/primes.c
@@ -1,3 +1,7 @@
+/*
+ * The Prime Number Project
+ */
+
 #include <stdio.h>
 #include <math.h>
 
$ git commit -a -m "Add header comment."
# … a few more edits and commits…
$ git show-branch
* [brad] Document divisor search.
 ! [master] Change limit to 50
--
*  [brad] Document divisor search.
*  [brad^] Document main loop.
*  [brad~2] Add header comment.
*+ [master] Change limit to 50
There are three more commits on the brad branch than there are on the master branch, namely the brad, brad^ and brad~2 commits; these commits have * in the brad column while the corresponding master column entries are blank.
The odd-looking names brad^ and brad~2 are shorthand names that refer to commits in the history of the brad branch. brad^ is the parent of the commit at the tip of the brad branch, and brad~2 is the parent of brad^, that is, the grandparent of brad. (See the sidebar to learn more about commit shorthand names.)
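Following brad^ and brad~2 amounts to walking first-parent links back from the branch tip. The C sketch below models that walk; struct commit and resolve_suffix are hypothetical, and only the first-parent forms are handled. In git itself, a number after ^ selects among a merge commit’s parents (brad^2 is the second parent), which this sketch rejects.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy commit: a log subject plus its first parent (NULL for a root). */
struct commit {
    const char *subject;
    const struct commit *parent;
};

/* Apply a suffix such as "^", "^^", "~" or "~2" to a starting commit.
   Returns NULL if history runs out or the suffix is not supported. */
const struct commit *resolve_suffix(const struct commit *c, const char *s)
{
    while (c != NULL && *s != '\0') {
        char op = *s++;
        if (op != '^' && op != '~')
            return NULL;                /* malformed suffix */

        long n = 1;                     /* bare "^" or "~" means 1 */
        if (isdigit((unsigned char)*s)) {
            char *end;
            n = strtol(s, &end, 10);
            s = end;
        }

        if (op == '^') {
            if (n == 0)
                continue;               /* "^0" names the commit itself */
            if (n != 1)
                return NULL;            /* "^2" etc. need merge parents */
            c = c->parent;
        } else {
            while (n-- > 0 && c != NULL)
                c = c->parent;          /* "~n": n first-parent steps */
        }
    }
    return c;
}
```

Chaining four commits as in the show-branch output above, resolve_suffix(tip, "~2") lands on Add header comment., and "~3" reaches the Change limit to 50 commit that master also points to.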