What’s the diff?

Get more control over how file differences are found and displayed with some lesser-known options, and other techniques for getting the output you need.

There are good graphical file-comparison programs available. They aren’t always the best tool for comparing files, though. One reason to use the command-line diff is that it can give you much more control over how differences are found and displayed.

You’re probably familiar with diff(1). We’ll look at some details of how diff works, some lesser-known options, and other techniques for getting the output you need.

This is the first of a series about diffutils, an important package of tools.

Different diffs

You might be surprised to know that, given two files, diff may not always give the same set of differences. As an example, Table One shows two snippets of C code in files named 1.c and 2.c.

Table One: Two short files

File 1.c    File 2.c
a *= b;     c *= d;
b *= c;     b *= c;
c *= d;     a *= b;

GNU diff version 2.8.1 shows that the first two lines of 1.c were deleted, and that two new lines were added to 2.c:

$ diff 1.c 2.c
1,2d0
< a *= b;
< b *= c;
3a2,3
> b *= c;
> a *= b;

But you might prefer to think that the first and third lines were replaced, while the second line is the same. This would be a valid diff output, too:

$ hypothetical_diff 1.c 2.c
1c1
< a *= b;
---
> c *= d;
3c3
< c *= d;
---
> a *= b;

Clearly, there's more than one way to represent the differences between two files. The standard diff algorithm will usually do a good job. There are times, though, when you might want another opinion:

  • The option -d or --minimal tells diff to "try harder" to find the smallest set of changes; that makes diff slower.
  • If two large files have many small, scattered changes, the GNU option --speed-large-files can make diff run faster, possibly at the cost of a larger set of differences.
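To try the two approaches side by side, here is a minimal sketch that recreates the files from Table One in a scratch directory and runs diff with and without -d. The variable names (work, default_out, minimal_out) are only for illustration. On files this small both runs should report the same number of changed lines; -d is more likely to matter on large files.

```shell
# Scratch directory, removed when the shell exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Recreate the two files from Table One.
printf '%s\n' 'a *= b;' 'b *= c;' 'c *= d;' > "$work/1.c"
printf '%s\n' 'c *= d;' 'b *= c;' 'a *= b;' > "$work/2.c"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script that runs under "set -e" from stopping here.
default_out=$(diff "$work/1.c" "$work/2.c" || true)
minimal_out=$(diff -d "$work/1.c" "$work/2.c" || true)

printf 'default:\n%s\n\nwith -d:\n%s\n' "$default_out" "$minimal_out"
```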

Hunks, whitespace

diff works by finding groups of lines that are the same in both files. Between those groups of similar lines are hunks: groups of lines that differ.

By default, when diff is finding common lines and hunks, it looks at every character on a line. That includes whitespace characters: spaces, tabs, newlines, and carriage returns (which can come from Microsoft systems that use a CR before every NL). If diff shows a difference between two lines that look identical to you, the difference might be in the whitespace. Here are two ways to deal with that:

  1. Pipe diff's output to cat -te (which you may need to type as cat -t -e on older versions of cat). This shows TABs as ^I and marks the ends of lines with $, making it easy to see what's different in the whitespace.
  2. Use a diff option that tells it to ignore some or all differences in whitespace. GNU diff has several of these:
    • The -E option treats a TAB the same as the equivalent number of spaces.
    • Use -b to treat every sequence of whitespace the same, and also to ignore whitespace at the ends of lines.
    • The -w option ignores every whitespace character completely. (So the words out side would compare equal to outside.)
    • To ignore completely-empty lines (that is, a line that's nothing but a newline character), use -B.

    For more information, use man diff or, better, info diff.
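The effect of the whitespace options is easy to check on a pair of throwaway files. In this sketch the file names and the report function are made up for the example; it assumes a diff that supports -b and -w, as GNU diff does.

```shell
# Scratch directory, removed when the shell exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# a.txt has a doubled space and trailing blanks on line 1; b.txt does not.
# Line 2 differs only by a space in the middle of a word.
printf 'int  x = 1;  \nout side\n' > "$work/a.txt"
printf 'int x = 1;\noutside\n'     > "$work/b.txt"

# report [OPTION...]: say whether diff finds differences with those options.
report() {
    if diff "$@" "$work/a.txt" "$work/b.txt" > /dev/null; then
        echo "diff $*: no differences"
    else
        echo "diff $*: files differ"
    fi
}

report        # every character counts, so both lines differ
report -b     # line 1 now matches; "out side" still differs from "outside"
report -w     # all whitespace is ignored, so the files compare equal
```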

Ignoring case, ignoring certain lines

The -i option makes upper-case and lower-case letters compare equal. So the text "a line" in one file and "A lInE" in the other file would count as the same line, and wouldn't be part of a hunk.

If there are particular lines that diff should ignore completely, give diff a regular expression that matches them with the -I option. Listing One has an example. We use cat -n to show line numbers on the two files. Without -I, diff shows changes to the headings. With -I HEAD, diff ignores changed lines that contain the string HEAD in both afile and bfile -- which, in this case, are all of the headings.

Listing One: Ignoring certain lines

$ cat -n afile
     1  line a
     2  line b
     3  A HEADING
     4  line c
     5  line d
$ cat -n bfile
     1  A HEADING
     2  line a
     3  line b
     5  line c
     6  line d
$ diff [ab]file
$ diff -I HEAD [ab]file

Note that diff applies the pattern to whole hunks: it ignores a hunk only if every line in that hunk, from either file, matches the argument to -I. If a changed line matches the pattern in only one of the files, diff still shows the difference. Here, for instance, the pattern ANOTHER only matches in bfile -- not the corresponding line in afile -- so diff outputs that line:

$ diff -I ANOTHER [ab]file

You can also use grep-style regular expressions. For example, here's how to ignore lines starting with an upper-case letter: the regular expression ^[A-Z] or, more portably, the character class ^[[:upper:]]:

$ diff -I '^[[:upper:]]' [ab]file

You can specify multiple patterns to ignore by using multiple -I options.
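Here is a small, self-contained sketch of -I at work. The files and the "# Revision" header line are invented for the example and aren't the files from Listing One; the sketch assumes GNU diff, or another diff that supports -I.

```shell
# Scratch directory, removed when the shell exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Two files that differ only in a revision header on the first line.
printf '%s\n' '# Revision 1' 'line a' 'line b' > "$work/afile"
printf '%s\n' '# Revision 2' 'line a' 'line b' > "$work/bfile"

# Without -I, the header change is reported.
plain=$(diff "$work/afile" "$work/bfile" || true)

# With -I, both lines of the hunk match the pattern, so the hunk is ignored.
ignored=$(diff -I '^# Revision' "$work/afile" "$work/bfile" || true)

printf 'without -I:\n%s\n\nwith -I:\n%s\n' "$plain" "$ignored"
```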

Ignoring line breaks

diff is line-oriented: it compares entire lines (strings ending with newline characters). If you edit a file with an editor that re-formats lines -- moving the newlines to different places -- diff will show differences where there are none (other than the newlines).

One way to solve this is by getting the diff front-end named wdiff, which was first written around 1990. It breaks lines into separate words, then compares the words with diff. wdiff also has some features that are handy for interactive browsing of differences.

If you don't want the complexity of wdiff, you can do something similar yourself by replacing all sequences of tabs and spaces with single newline characters. This puts each word on a separate line -- so diff can find groups of common words without being confused by the differences in newline characters. The command tr -s '\t ' '[\n*]' does the job. (Note the space after \t in the first argument.)

Listing Two shows an example. file1 has four words, and cat -te file1 shows that there are both spaces and TABs between the words. (Word processors may add multiple spaces between words. We've tossed in some TABs for good measure.) Note that tr can't accept a filename as an argument; you have to use the shell's < character to take tr's standard input from that file. The output of tr has one word per line.

Listing Two: Showing word differences

zsh% cat file1
this    is      a
zsh% cat -te file1
this    is^Ia ^I $
zsh% tr -s '\t ' '[\n*]' <file1
zsh% cat file2
this is
a test too
zsh% diff =(tr -s '\t ' '[\n*]' <file1) =(tr -s '\t ' '[\n*]' <file2)
> too

We're using Z shell process substitution to run tr on both file1 and file2, then pass the results to diff as two temporary files. (The article Catching Some ZZZs, available online at http://www.linux-mag.com/id/1579, introduces process substitution in the Z Shell. Bash process substitution is similar, but it doesn't have the zsh operator =() -- which we might need in case diff tries to seek backward in its input files.)

You could also run tr twice, saving its output in two temporary files, then compare those files:

$ tr -s '\t ' '[\n*]' <file1 > temp1
$ tr -s '\t ' '[\n*]' <file2 > temp2
$ diff temp[12]
> too
$ rm temp[12]
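If you do this often, the temporary-file version can be wrapped in a shell function. This is only a sketch: the name worddiff and the variables inside it are made up here, and it assumes a system with mktemp.

```shell
# worddiff FILE1 FILE2: compare two files word by word.
# (Hypothetical helper; wdiff is the full-featured tool for this job.)
worddiff() {
    tmp1=$(mktemp)
    tmp2=$(mktemp)

    # Put each word on its own line, as in the commands above.
    tr -s '\t ' '[\n*]' < "$1" > "$tmp1"
    tr -s '\t ' '[\n*]' < "$2" > "$tmp2"

    # Remember diff's exit status so the caller can still test it
    # after the temporary files are removed.
    status=0
    diff "$tmp1" "$tmp2" || status=$?

    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

You would run it as worddiff file1 file2. Like diff itself, it returns 0 when the word lists are the same and 1 when they differ.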

Custom comparisons

You're a judge in the International Obfuscated C Code Contest (http://www.ioccc.org/). You'd like to see the differences between two versions of a program that's written as "ASCII art," in the shape of a smiley face.

"Bits and pieces: Comparing binary data (and more)" showed the bdiff script. It compares binary files character-by-character using od to show visible representations, one character per line -- and diff to compare them. For text files, we don't need od.

Since whitespace generally doesn't matter in C programming, you use a little sed script to put each character on a separate line, then remove all spaces and TABs. Listing Three shows the script named sedscr. The first command spans two lines. The square brackets contain a space and a TAB. You run both files through sed, then use the unified-format diff option -u to show three lines of context around each hunk -- which, with one character per line, means three characters:

Listing Three: Un-obfuscated diff

$ cat sedscr
s/[     ]//g

$ sed -f sedscr old.c > old.tmp
$ sed -f sedscr new.c > new.tmp
$ diff -u old.tmp new.tmp
--- old.tmp  ...
+++ new.tmp  ...
@@ -731,6 +731,7 @@

Somewhere around the 734th non-whitespace character, the new version of the file has an added 0.

You might have done better with a sed script that breaks text into separate lines at semicolons (;) -- the C statement separator -- and also removes whitespace. Once you see the technique, though, you can tweak the sed script, or use a different method, to eliminate text that obfuscates diff -- and find the real differences.
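As one example of a different method, this sketch gets a similar one-character-per-line result with tr and fold instead of sed. It is not the sedscr script from Listing Three. The function name is made up, and the sketch assumes plain ASCII source text, since fold may count bytes rather than characters.

```shell
# one_char_per_line FILE: write FILE's non-whitespace characters,
# one per line, to standard output. (Hypothetical helper.)
one_char_per_line() {
    # tr deletes spaces and TABs, fold -w 1 breaks every line after one
    # column, and grep . drops the empty lines left by blank source lines.
    tr -d ' \t' < "$1" | fold -w 1 | grep .
}
```

To use it, run one_char_per_line old.c > old.tmp, do the same for new.c, and then run diff -u old.tmp new.tmp as before.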

Which function or section?

If a program file contains multiple functions, it can be useful to know which function has changed. The GNU diff option -p doesn't actually parse C. It keeps track of the most recent line that starts with a letter, an underscore, or a dollar sign, which in conventionally formatted C code is usually the first line of a function. When it finds a change, it outputs that line in the header of the hunk. For example (using -C 0 to suppress context output, to save printing space):

$ diff -C 0 -p prog[12].c
*** prog1.c     ...
--- prog2.c     ...
*************** int calc (int a) {
*** 48 ****
!       c *= c;
--- 48 ----
!       b *= c;
*************** void do_output() {
*** 120 ****
!       fprintf (stderr, "OUTPTU:");
--- 120 ----
!       fprintf (stderr, "OUTPUT:");

If you aren't comparing C code, the option -F regexp uses the most recent line matching regexp as the heading. For instance, in HTML code that has header lines tagged with <h1>, <h2>, and so on:

$ diff -C 0 -F '^<[hH][1-5]>' a.html b.html
*** a.html      ...
--- b.html      ...
*************** <h1>Summary</h1>
*** 19 ****
! <li>Yo-yos and pop gums,</li>
--- 19 ----
! <li>Yo-yos and pop guns,</li>
*************** <h3>Pop Guns</h3>
*** 47 ****
! Large grene $1.49
--- 47 ----
! Large green $1.49
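To experiment with these two options, here is a sketch that builds a tiny pair of C files and runs diff on them with -p and with -F. The file contents and variable names are invented for the example, and it assumes GNU diff. In both runs, the header of the hunk should carry the line int calc(int a).

```shell
# Scratch directory, removed when the shell exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

cat > "$work/prog1.c" <<'EOF'
int calc(int a)
{
    int c = a;
    c *= c;
    return c;
}
EOF

# prog2.c is the same file with one statement changed.
sed 's/c \*= c;/c *= a;/' "$work/prog1.c" > "$work/prog2.c"

# -p uses diff's built-in idea of a function heading ...
with_p=$(diff -C 0 -p "$work/prog1.c" "$work/prog2.c" || true)

# ... while -F lets you supply the heading pattern yourself.
with_F=$(diff -C 0 -F '^int ' "$work/prog1.c" "$work/prog2.c" || true)

printf '%s\n\n%s\n' "$with_p" "$with_F"
```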

Comments on "What’s the diff?"


I find that kdiff3 does a superb job of generating a side-by-side difference, with different highlighting for the characters that actually changed in a line. It is available for Windows, Linux/Unix, Cygwin, and related platforms.

If it guesses wrong on lines that should be associated, the two (or three) files can be synchronized by marking a “common” point in the file. This can be done multiple times if necessary. I found this a “lifesaver” when comparing code written in two different but similar languages.

from the website:
KDiff3 is a program that

compares or merges two or three text input files or directories,
shows the differences line by line and character by character (!),
provides an automatic merge-facility and
an integrated editor for comfortable solving of merge-conflicts,
supports Unicode, UTF-8 and other codecs, autodetection via byte-order-mark “BOM”
supports KIO on KDE (allows accessing ftp, sftp, fish, smb etc.),
Printing of differences,
Manual alignment of lines,
Automatic merging of version control history ($Log$),
and has an intuitive graphical user interface.
Windows-Explorer integration Diff-Ext-for-KDiff3 – shell extension included in installer (originally by Sergey Zorin: see also Diff Ext)
KDE-Konqueror service menu plugin
Simplified integration with IBM-Rational-Clearcase for Windows (Details).
Read what else is special in a short abstract (PDF).

Supported platforms:
GNU/Linux with KDE3,
Any Un*x that is supported by the Qt-libs from Trolltech,
Apple Mac OSX binary available.
In theory any platform for which Qt-libs work.
