Linux Magazine Text Markup Sample

HEAD: Power Tools

DECK: Magnificent Macro Magic: I<m4>, Part One

AUTHOR: Jerry Peek

A I<macro processor> scans input text for defined symbols — the
I<macros> — and replaces that text by other text, or possibly by other
symbols. For instance, a macro processor can convert one language into
another.

If you’re a I<C> programmer, you know I<cpp,> the C preprocessor, a
simple macro processor. I<m4> is a powerful macro processor that’s
been part of Unix for some 30 years, but it’s almost unknown — except
for special purposes, such as generating the I<sendmail.cf> file. It’s
worth knowing because you can do things with I<m4> that are hard to do
any other way.

The GNU version of I<m4> has some extensions from the original I<V7>
version. (You’ll see some of them.) As of this writing, the latest GNU
version was I<1.4.2,> released in August 2004. I<Version 2.0> is under
development.

You won’t become an I<m4> wizard in three pages (or in six, as
the discussion of I<m4> continues next month), but you can master the
basics. So, let’s dig in.

SUBHEAD: Simple Macro Processing

A simple way to do macro substitution is with tools like I<sed> and
I<cpp.> For instance, the command
C<sed 's/XPRESIDENTX/President Bush/g'> reads lines of text, changing
every occurrence of
C<XPRESIDENTX> to C<President Bush>. I<sed> can also test and branch,
for some rudimentary decision-making.
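
To try that substitution, here is a minimal sketch. The sample sentence is made up for illustration. Note the trailing C<g> flag: without it, I<sed> changes only the first match on each line.

```shell
# Feed one made-up line to sed and replace the placeholder everywhere.
# The sample sentence is an assumption for this demo.
out=$(printf '%s\n' 'XPRESIDENTX met XPRESIDENTX.' |
      sed 's/XPRESIDENTX/President Bush/g')
printf '%s\n' "$out"
```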

As another example, here’s a C program with a I<cpp> macro named
C<ABSDIFF()> that accepts two arguments, C<a> and C<b>.

C<
#define ABSDIFF(a, b) \
   ((a) > (b) ? (a)-(b) : (b)-(a))
>

Given that definition, I<cpp> will replace the code…

C<
diff = ABSDIFF(v1, v2);
>

… with

C<
diff = ((v1) > (v2) ? (v1)-(v2) : (v2)-(v1));
>

C<v1> replaces C<a> everywhere, and C<v2> replaces C<b>. C<ABSDIFF()>
saves typing — and the chance for error.

SUBHEAD: Introducing m4

Unlike I<sed> and other languages, I<m4> is designed specifically for
macro processing. I<m4> manipulates files, performs arithmetic, has
functions for handling strings, and can do much more.

I<m4> copies its input (from files or I<standard input>) to I<standard
output.> It checks each token (a name, a quoted string, or any single
character that’s not a part of either a name or a string) to see if
it’s the name of a macro. If so, the token is replaced by the macro’s
value, and then that text is pushed back onto the input to be
rescanned. (If you’re new to I<m4>, this repeated scanning may surprise
you, but it’s one key to I<m4>’s power.)  Quoting text, like
C<`I<text>'>, prevents expansion. (See the section on “Quoting.”)

I<m4> comes with a number of predefined macros, or you can write your
own macros by calling the C<define()> function. A macro can have
multiple arguments — up to 9 in original I<m4>, and an unlimited
number in GNU I<m4>. Macro arguments are substituted before the
resulting text is rescanned.

Here’s a simple example (saved in a file named I<foo.m4>):

C<
one
define(`one', `ONE')dnl
one
define(`ONE', `two')dnl
one ONE oneONE
`one'
>

The file defines two macros named I<one> and I<ONE>.  It also has four
lines of text. If you feed the file to I<m4> using C<m4 foo.m4>, I<m4> produces:

C<
one
ONE
two two oneONE
one
>

Here’s what’s happening:

* Line 1 of the input, which is simply the characters C<one> and a
newline, doesn’t match any macro (so far), so it’s copied to the
output as-is.

* Line 2 defines a macro named C<one()>. (The opening parenthesis
before the arguments must come just after C<define> with no whitespace
between.) From this point on, any input string C<one> will be replaced
with C<ONE>. (The C<dnl> is explained below.)

* Line 3, which is again the characters C<one> and a newline, is
affected by the just-defined macro C<one()>. So, the text C<one> is
converted to the text C<ONE> and a newline.

* Line 4 defines a new macro named C<ONE()>. Macro names are
case-sensitive.

* Line 5 has three space-separated tokens. The first two are C<one>
and C<ONE>.  The first is converted to C<ONE> by the macro named
C<one()>, then both are converted to C<two> by the macro named
C<ONE()>. Rescanning doesn’t find any additional matches (there’s no
macro named C<two()>), so the first two words are output as C<two
two>. The rest of line 5 (a space, C<oneONE>, and a newline) doesn’t
match a macro so it’s output as-is.  In other words, a macro name is
only recognized when it’s surrounded by non-alphanumerics.

* Line 6 contains the text C<one> inside a pair of quotes, then a
newline.  (As you’ve seen, the opening quote is a backquote or grave
accent; the closing quote is a single quote or acute accent.) Quoted
text doesn’t match any macros, so it’s output as-is: C<one>. Next
comes the final newline.

Input text is copied to the output as-is and that includes newlines.
The built-in C<dnl> function, which stands for “delete to new line,”
reads and discards all characters up to and including the next
newline. (One of its uses is to put comments into an I<m4> file.)
Without I<dnl>, the newline after each of our calls to I<define> would
be output as-is. We could demonstrate that by editing I<foo.m4> to
remove the two C<dnl>s. But, to stretch things a bit, let’s use I<sed>
to remove those two calls from the file and pipe the result to I<m4>:

C<
$ sed 's/dnl//' foo.m4 | m4
one

ONE

two two oneONE
one
>

If you compare this example to the previous one, you’ll see that there
are two extra newlines at the places where C<dnl> used to be.

Let’s summarize. You’ve seen that input is read from the first
character to the last. Macros affect input text only after they’re
defined. Input tokens are compared to macro names and, if they match,
replaced by the macro’s value. Any input modified by a macro is pushed
back onto the input and is I<rescanned> for possible modification.
Other text (that isn’t modified by a macro) is passed to the output
as-is.

SUBHEAD: Quoting

Any text surrounded by C<` '> (a grave accent and an acute accent)
isn’t expanded immediately. Whenever I<m4> evaluates something, it
strips off one level of quotes. When you define a macro, you’ll often
want to quote the arguments — but not always. I<Listing One> has a
demo. It uses I<m4> interactively, typing text to its standard input.

< BEGIN Listing One -- Quoting demonstration >

C<
$ m4
define(A, 100)dnl
define(B, A)dnl
define(C, `A')dnl
dumpdef(`A', `B', `C')dnl
A:      100
B:      100
C:      A
dumpdef(A, B, C)dnl
stdin:5: m4: Undefined name 100
stdin:5: m4: Undefined name 100
stdin:5: m4: Undefined name 100
A B C
100 100 100
CTRL-D
$
>

< END Listing One >

The listing starts by defining three macros I<A>, I<B>, and I<C>. I<A>
has the value 100. So does I<B>: because its argument C<A> isn’t
quoted, I<m4> replaces C<A> with C<100> before assigning that value to
I<B>. While defining I<C>, though, quoting the argument means that its
value becomes literal C<A>.

You can see the values of macros by calling the built-in function
C<dumpdef> with the names of the macros. As expected, I<A> and I<B>
have the value C<100>, but I<C> has C<A>.

In the second call to C<dumpdef>, the names are not quoted, so each
name is expanded to C<100> I<before> C<dumpdef> sees them. That
explains the error messages, because there’s no macro named C<100>. In
the same way, if we simply enter the macro names, the three tokens are
scanned repeatedly, and they all end up as C<100>.

You can change the quoting characters at any time by calling
C<changequote>.  For instance, in text containing lots of quote marks,
you could call C<changequote({,})dnl> to change the quoting characters
to curly braces. To restore the defaults, simply call C<changequote>
with no arguments.

In general, for safety, it’s a good idea to quote all input text that
isn’t a macro call. This avoids I<m4> interpreting a literal word as a
call to a macro.  Another way to avoid this problem is by using the
GNU I<m4> option C<--prefix-builtins> or C<-P>. It changes all
built-in macro names to be prefixed by C<m4_>. (The option doesn’t
affect user-defined macros.) So, under this option, you’d write
C<m4_dnl> and C<m4_define> instead of C<dnl> and C<define>,
respectively.

Keep quoting and rescanning in mind as you use I<m4.> Not to be
tedious, but remember that I<m4> I<does> rescan its input. For some
in-depth tips, see “Web Paging: Tips and Hints on I<m4> Quoting” by
R.K. Owen, Ph.D., at
http://owen.sj.ca.us/rkowen/howto/webpaging/m4tipsquote.html.

SUBHEAD: Decisions and Math

I<m4> can do arithmetic with its built-in functions C<eval>, C<incr>,
and C<decr>. I<m4> doesn’t support loops directly, but you can combine
recursion and the decision macro C<ifelse> to write loops.

Let’s start with an example adapted from the file
I</usr/share/doc/m4/examples/debug.m4> (on a Debian system). It
defines the macro C<countdown()>. Evaluating the macro with an
argument of 5 — as in C<countdown(5)> — outputs the text C<5, 4, 3, 2,
1, 0, Liftoff!>.

C<
$ cat countdown.m4
define(`countdown', `$1, ifelse(eval($1 > 0),
   1, `countdown(decr($1))', `Liftoff!')')dnl
countdown(5)
$ m4 countdown.m4
5, 4, 3, 2, 1, 0, Liftoff!
>

The C<countdown()> macro takes a single argument. Its definition (the
second argument to C<define>) is broken across two lines. That’s fine
in I<m4> because macro arguments are delimited by parentheses, which
don’t have to be on the same line. Here’s the definition without its
surrounding quotes:

C<
$1, ifelse(eval($1 > 0), 1,
   `countdown(decr($1))', `Liftoff!')
>

C<$1> expands to the macro’s first argument. When I<m4> evaluates that
C<countdown> macro with an argument of C<5>, the result is:

C<
5, ifelse(eval(5 > 0), 1,
   `countdown(decr(5))', `Liftoff!')
>

The leading “C<5, >” is plain text that’s output as-is as the first
number in the countdown. The rest of the argument is a call to
C<ifelse>. C<ifelse> compares its first two arguments. If they’re
equal, the third argument is evaluated; otherwise, the (optional)
fourth argument is evaluated.

Here, the first argument to C<ifelse>, C<eval(5 > 0)>, evaluates as
1 (logical “true”) if the test is true (if 5 is greater than 0). So
the first two arguments are equal, and I<m4> evaluates
C<countdown(decr(5))>. This starts the recursion by calling
C<countdown(4)>.

Once we reach the base condition of C<countdown(0)>, the test C<eval(0
> 0)> fails and the C<ifelse> call evaluates C<`Liftoff!'>. (If
recursion is new to you, you can read about it in books on computer
science and programming techniques.)

Note that, with more than four arguments, I<ifelse> can work like a
I<case> or I<switch> in other languages. For instance, in
C<ifelse(a,b,c,d,e,f,g)>, if C<a> matches C<b>, then C<c>; else if
C<d> matches C<e> then C<f>; else C<g>.

The I<m4> I<info> file shows more looping and decision techniques,
including a macro named C<forloop()> that implements a nestable
for-loop.

This section showed some basic math operations. (The I<info> file
shows more.)  You’ve seen that you can quote a single macro argument
that contains a completely separate string (in this case, a string
that prints a number, then runs C<ifelse> to do some more work). This
one-line example (broken onto two lines here) is a good hint of
I<m4>’s power. It’s a minimalist language, for sure, and you’d be
right to complain about its tricky evaluation in a global environment,
leaving lots of room for trouble if you aren’t careful. But you might
find this expressive little language to be challenging enough that
it’s addictive.

SUBHEAD: Building Web Pages

Let’s wrap up this I<m4> introduction with a typical use: feeding an
input file to a set of macros to generate an output file. Here, the
macro file I<html.m4> defines three macros: C<_startpage()>, C<_ul()>,
and C<_endpage()>. (The names start with underscore characters to help
prevent false matches with non-macro text. For instance, C<_ul()>
won’t match the HTML tag C<<ul>>.)  The C<_startpage()> macro
accepts one argument: the page title, which is also copied into a
level-1 heading that appears at the start of the page. The C<_ul()>
macro makes an HTML unordered list. Its arguments (an unlimited
number) become the list items. And C<_endpage()> makes the closing
HTML text, including a “last change” date taken from the Linux I<date>
utility.

I<Listing Two> shows the input file, and I<Listing Three> is the HTML
output. The I<m4> macros that do all the work are shown in I<Listing
Four.> (Both the input file and the macros are available online at
http://www.linux-mag.com/downloads/2005-02/power.)

< BEGIN Listing Two — I<webpage.m4h,> an “unexpanded” web page >

C<
_startpage(`Sample List')
_ul(`First item', `Second item',
   `Third item, longer than the first two')
_endpage
>

< END Listing Two >

< BEGIN Listing Three -- An I<m4>-generated web page >

C<
$ m4 html.m4 webpage.m4h > list.html
$ cat list.html
<html>
<head>
<title>Sample List</title>
</head>
<body>
<h1>Sample List</h1>
<ul>
<li>First item</li>
<li>Second item</li>
<li>Third item, longer than the first two</li>
</ul>

<p>Last change: Fri Jan 14 15:32:06 MST 2005
</p>
</body>
</html>
>

< END Listing Three >

In I<Listing Four,> both C<_startpage()> and C<_endpage()> are
straightforward. The C<esyscmd> macro is one of the many I<m4> macros
we haven’t covered — it runs a Linux command line, then uses the
command’s output as input to I<m4>. The C<_ul()> macro outputs opening
and closing HTML C<<ul>> tags, passing its arguments to the
C<_listitems()> macro via C<$@>, which expands into the quoted list of
arguments.

C<_listitems()> is similar to the C<countdown()> macro shown earlier:
C<_listitems()> makes a recursive loop. At the base condition (the end
of recursion), when C<$#> (the number of arguments) is 0, the empty
third argument means that C<ifelse> does nothing. Or, if there’s one
argument (C<$#> is 1), C<ifelse> simply outputs the last list item
inside a pair of C<<li>> tags. Otherwise, there’s more than one
argument, so the macro starts by outputting the first argument inside
C<<li>> tags, then calls C<_listitems()> recursively to output
the other list items. The argument to the recursive call is
C<shift($@)>. The I<m4> C<shift> macro returns its list of arguments
without its first argument — which, here, is all of the arguments we
haven’t processed yet.

Notice the nested quoting: some of the arguments inside the (quoted)
definition of C<_listitems()> are quoted themselves. This delays
interpretation until the macro is called. (I<m4> tracing, which we’ll
cover next month, can help you see what’s happening.)

< BEGIN Listing Four -- I<m4> macros to generate an HTML page from an input file >

C<
define(`_startpage', `<html>
<head>
<title>$1</title>
</head>
<body>
<h1>$1</h1>')dnl
dnl
define(`_endpage', `
<p>Last change: esyscmd(date)</p>
</body>
</html>')dnl
dnl
define(`_listitems', `ifelse($#, 0, ,
   $#, 1, `<li>$1</li>',
   `<li>$1</li>
_listitems(shift($@))')')dnl
define(`_ul', `<ul>
_listitems($@)
</ul>')dnl
>

< END Listing Four >

SUBHEAD: To be continued…

This month, you’ve seen some basics of I<m4:> scanning input text,
replacing any tokens that match macro names with the macro values.

Next month, we’ll dig deeper into I<m4:> diversions, included files,
frozen files, debugging and tracing, and other built-in macros.

If you’d like to do more in the meantime, the GNU I<m4> I<info> file (type
C<info m4>) has a lot of information and examples.

BIO: Jerry Peek is a freelance writer and instructor who has used Unix
and Linux for over 20 years. He’s happy to hear from readers; see
http://www.jpeek.com/contact.html. Sample files from this column can
be found online at http://www.linux-mag.com/downloads/2005-02/power.