Working with Language Extensions

Many of you are familiar with the C and C++ languages. You know the syntax and the semantics of the various operations and have a feel for what is allowed by the language according to its specification. However, you may (or may not) be surprised to discover that compilers for these languages deviate from the official specifications.

For the most part, compilers perform all of the necessary functions to get your programs running, but they often add features to the language that are not in the language specification. These additional features may be included for a number of reasons: to aid programmers in writing code, for example, or to fill in perceived deficiencies in the language.

This month, we’re going to explore some of the language extensions found in gcc and g++. We’ll also spend a little time looking at how and why they’re used.

One notable project that makes use of these extensions is the Linux kernel. If you’re doing any work at the kernel level, it’s definitely worth your time to familiarize yourself with these GCCisms.

That said, before diving in and making wide use of these language extensions, it’s a good idea to do a bit of advance planning. The moment you start using one of these features, you must be sure that the application you are developing will only be compiled with gcc or g++. The Linux kernel can make this assumption, but chances are that other applications cannot. If there is any chance that you will need to compile the program with a different compiler, it’s a good idea to stay away from compiler-specific extensions. Now that you’ve been warned, let’s take a look at some of these language extensions.

Statement Expressions

The most widely used gcc extension is probably the statement expression. This construct allows programmers to aggregate multiple statements into a single expression. Take a look at Figure One for an example of a statement expression in use.

Figure One: Statement Expression Example

int abs_foo_value = ({
    int y = foo ();
    int z;
    if (y > 0)
        z = y;
    else
        z = -y;
    z;
});

The syntax for statement expressions is simply a compound statement (a block of code in braces) enclosed in parentheses. The statement expression evaluates to the value of the last statement in the block, which should be an expression followed by a semicolon. Thus, in Figure One, the statement expression evaluates to the value of z at the end of the block. If the last statement in the block does not yield a value, the expression has type void (much like a void function that does not return a value).

As explained on the gcc info page, statement expressions are especially useful for making “safe” macros. When using macros, one must often worry about passing them arguments that have side effects. For example, notice that the min macro defined in Figure Two takes two numbers (whether they be ints or floats) and evaluates to the lesser of the two. Now imagine what would happen if you did this:

Figure Two: MIN Macro Definition

#define min(x,y) ((x) < (y) ? (x) : (y))

min_value = min(i++, j++);

This expands to:

min_value = ((i++) < (j++) ? (i++) : (j++));

which means that whichever of i and j holds the lesser value is incremented twice! This is where using a statement expression could really come in handy. Take a look at Figure Three for an example of a macro that uses statement expressions to solve this problem.

Figure Three: MIN Macro Definition Using a Statement Expression

#define min(x,y) ({ \
    int _x = (x); \
    int _y = (y); \
    _x < _y ? _x : _y; \
})

You’ll notice that by using temporary variables inside the statement expression, the arguments to the macro are only evaluated once (as they should be). Unfortunately, because we decided to use a statement expression, we actually sacrificed the generality of the macro. For example, the macro in Figure Two will work on any type of data that the < operator works on, but the macro in Figure Three will only work on integers. This little problem can be fixed by using the typeof operator.
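The difference between the macros in Figures Two and Three can be observed with a short sketch. The names MIN_UNSAFE, MIN_SAFE_INT, next_value, and the two counting functions are hypothetical, chosen here so that both macros can live in one file; the sketch assumes gcc (or a compatible compiler) with GNU extensions enabled.

```c
/* Figure Two style: each argument is substituted textually, so the
 * selected argument is evaluated a second time. */
#define MIN_UNSAFE(x, y) ((x) < (y) ? (x) : (y))

/* Figure Three style: a statement expression copies each argument into
 * a temporary, so each argument is evaluated exactly once. */
#define MIN_SAFE_INT(x, y) ({ \
    int _x = (x); \
    int _y = (y); \
    _x < _y ? _x : _y; \
})

/* Hypothetical helper: counts how many times it has been called. */
static int calls;

static int next_value(int v)
{
    calls++;
    return v;
}

/* Number of next_value() calls made by one use of the unsafe macro. */
int count_unsafe_calls(void)
{
    calls = 0;
    int m = MIN_UNSAFE(next_value(1), next_value(2));
    (void)m;
    return calls;
}

/* Number of next_value() calls made by one use of the safe macro. */
int count_safe_calls(void)
{
    calls = 0;
    int m = MIN_SAFE_INT(next_value(1), next_value(2));
    (void)m;
    return calls;
}
```

With the unsafe macro, both arguments are evaluated for the comparison and the winning argument is evaluated again for the result, so count_unsafe_calls() reports three calls where count_safe_calls() reports two.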

Tricks with Types

The typeof operator lets you refer to the type of an expression without spelling that type out yourself. Many of you are probably already familiar with the sizeof operator that, at compile time, fills in the size of the given type (or of the type of the expression passed to the operator). Like sizeof, typeof is a compile-time operator that can take either a type or an expression. However, rather than providing the size of a type, it yields the type of its argument, and this type information can be used to declare other variables.

Figure Four shows an example of how you can declare variables with the typeof operator. This code declares three variables: x, y, and z. In the end, they are all of type (int*), but to get there, three different techniques were used:

Figure Four: The typeof Operator

int* x;
typeof (int*) y;
typeof (x) z;

  • x is declared in the traditional sense.
  • y is declared using the typeof operator with a type as the argument.
  • z is declared using the typeof operator with a variable, x , as the argument.

Note that the typeof operator can be used in any place where a typedef name can be used. This includes variable declarations, casts, and the operand of sizeof. These are all places where type information is used at compile time.
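As a rough illustration of typeof standing in for a typedef name, the sketch below uses it to declare a temporary inside a swap macro and as the operand of sizeof. SWAP, swap_ints_demo, and size_via_typeof are hypothetical names introduced for this example. It uses the alternate spelling __typeof__, which gcc also accepts when compiling in a strict ISO mode.

```c
#include <stddef.h>

/* Swap two lvalues of the same type; the temporary takes its type
 * from the first argument. */
#define SWAP(a, b) do { \
    __typeof__(a) _tmp = (a); \
    (a) = (b); \
    (b) = _tmp; \
} while (0)

/* Swaps 1 and 2, then packs the result into one int for inspection. */
int swap_ints_demo(void)
{
    int p = 1, q = 2;
    SWAP(p, q);
    return p * 10 + q;
}

/* typeof used in a declaration and as the operand of sizeof. */
size_t size_via_typeof(void)
{
    int *x = NULL;
    __typeof__(x) z = x;            /* z is an int *, like x */
    return sizeof(__typeof__(z));   /* same as sizeof(int *) */
}
```

The same SWAP macro would work unchanged for doubles or pointers, since the temporary's type is never written out by hand.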

In Figure Five the typeof operator is used to generalize the min macro that we defined in Figure Three. Now min can be made to work on any arguments that can be compared with the < operator while simultaneously being safer than a normal macro.

Figure Five: An Improved MIN Macro that Uses the typeof Operator

#define min(x,y) ({ \
    typeof (x) _x = (x); \
    typeof (y) _y = (y); \
    _x < _y ? _x : _y; \
})

You might think that we have reintroduced the side-effect problem that was discussed earlier, since it looks like we repeat the evaluation of x and y in the macro. However, this is not the case, because the typeof operator does not actually evaluate the expression! The operator simply checks the type of the expression at compile-time and replaces the operator with that type. Thus, using statement expressions in combination with the typeof operator makes it much easier to implement safer macros.
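The following sketch is one way to check both claims: that the typeof-based macro works on non-integer types, and that typeof leaves its operand unevaluated. MIN_GENERIC and the three functions are hypothetical names for this example, and __typeof__ is used as the spelling that gcc accepts in every language mode.

```c
/* Generic, single-evaluation minimum in the style of Figure Five. */
#define MIN_GENERIC(x, y) ({ \
    __typeof__(x) _x = (x); \
    __typeof__(y) _y = (y); \
    _x < _y ? _x : _y; \
})

/* The temporaries are doubles here, so nothing is truncated. */
double min_of_doubles(double a, double b)
{
    return MIN_GENERIC(a, b);
}

/* typeof only inspects the type of i++; the increment never runs. */
int typeof_leaves_operand_alone(void)
{
    int i = 5;
    __typeof__(i++) copy = 0;   /* declares an int named copy */
    (void)copy;
    return i;
}

/* Each argument is incremented exactly once.
 * Afterwards m == 1, i == 2, j == 6, packed as 126. */
int min_with_side_effects(void)
{
    int i = 1, j = 5;
    int m = MIN_GENERIC(i++, j++);
    return m * 100 + i * 10 + j;
}
```

One limitation of this style is that the temporaries _x and _y can clash with caller variables of the same name, so the names inside such macros are usually chosen to be unlikely to appear elsewhere.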

Local Labels

Now that we’ve gotten macros straightened out, we can move on to non-standard ways of controlling the flow of a program using goto. For those of you who are fond of using labels and goto statements in your code (for breaking out of loops or even for driving your co-workers crazy), statement expressions introduce yet another complication. If your macros define labels in them, then you’ll likely realize pretty soon that if the macro is expanded more than once in a function, the compiler will give you errors about multiple labels with the same name.

gcc provides a solution to this problem by allowing you to declare labels that are local to a block, such as the block of a statement expression. This is done with a __label__ declaration. You can jump to these local labels with a goto just as you would jump to any other label. Take a look at Figure Six for an example of a statement expression that uses local labels.

Figure Six: Local Labels in Statement Expressions

#define search(array, target, max) \
({ \
    __label__ found; \
    int i, value; \
    for (i = 0; i < max; i++) \
        if (array[i] == target) \
            { value = i; goto found; } \
    value = -1; \
    found: \
    value; \
})

This macro simply searches for an element in an array and evaluates to its index if it is found or -1 if it is not. Note that local label declarations must come at the beginning of the block, before any ordinary declarations or statements in your statement expression.
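To see why the label has to be local, the sketch below expands a search macro twice inside one function. SEARCH, demo_search, and demo_search_missing are hypothetical names for this example; the macro follows the shape described above, with the arguments parenthesized and the temporaries prefixed with an underscore to reduce the chance of clashing with the caller's variables.

```c
/* Linear search; evaluates to the index of target, or -1 if absent.
 * Each expansion gets its own "found" label thanks to __label__. */
#define SEARCH(array, target, max) ({ \
    __label__ found; \
    int _i, _value; \
    for (_i = 0; _i < (max); _i++) \
        if ((array)[_i] == (target)) { \
            _value = _i; \
            goto found; \
        } \
    _value = -1; \
found: \
    _value; \
})

/* Two expansions in one function: without __label__ the second
 * "found:" would be reported as a duplicate label. */
int demo_search(void)
{
    static const int data[] = { 4, 8, 15, 16, 23, 42 };
    int a = SEARCH(data, 15, 6);    /* index 2 */
    int b = SEARCH(data, 42, 6);    /* index 5 */
    return a * 10 + b;
}

/* A value that is not in the array yields -1. */
int demo_search_missing(void)
{
    static const int data[] = { 4, 8, 15, 16, 23, 42 };
    return SEARCH(data, 7, 6);
}
```

Removing the __label__ line from the macro should make gcc reject demo_search, which is a quick way to confirm what the declaration is doing.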

Labels as Values

Speaking of labels, gcc also makes it possible for you to take the address of a label and save it in a pointer variable. By doing this, you can then dereference this pointer in a goto statement to jump to that label. The address of a label is found by using the unary operator, &&. For example, the following code takes the address of a label, foo, and stores it in ptr:

void* ptr;
ptr = &&foo;

You can now jump to the label foo in one of two ways:

goto foo;

is the traditional method, but you can also use the goto as in the following:

goto *ptr;

Any expression of type void* is allowed as the ptr in the above goto statement.

One might use this feature to set up an array of places in the code to which the program needs to jump. By storing a set of labels in an array, you can simply jump to the correct one by indexing the array, which is a fast O(1) operation. Contrast this with a switch statement, which in the worst case is compiled into a chain of comparisons that performs a linear O(n) search through the list of cases. (Compilers often turn a dense set of case values into a jump table of their own, but this is not guaranteed.) A linear search does not scale well as your list of cases grows long, and the inefficiency becomes even more apparent if the switch statement is executed frequently.

Another benefit of jump tables is that they can be modified during run-time — something switch will never be able to accommodate. Thus, creating an array of labels can solve a class of problems that do not fit a switch statement very well.
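A minimal sketch of such a jump table is shown below: a toy interpreter that dispatches on one-byte opcodes. The opcode names, run_program, and run_sample are all hypothetical and exist only for this illustration. The label addresses are kept in a table local to the function, in line with the advice about never letting them escape.

```c
/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs a program that must end with OP_HALT and contain only the
 * opcodes above; no bounds checking is done on the table index. */
int run_program(const unsigned char *pc)
{
    /* Table order must match the opcode values in the enum. */
    static void *const dispatch[] = {
        &&op_inc, &&op_dec, &&op_double, &&op_halt
    };
    int acc = 0;

    goto *dispatch[*pc++];      /* jump to the first instruction */

op_inc:
    acc++;
    goto *dispatch[*pc++];
op_dec:
    acc--;
    goto *dispatch[*pc++];
op_double:
    acc *= 2;
    goto *dispatch[*pc++];
op_halt:
    return acc;
}

/* 0 -> 1 -> 2 -> 4 -> 3 */
int run_sample(void)
{
    static const unsigned char program[] = {
        OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT
    };
    return run_program(program);
}
```

Each handler ends with its own indexed goto rather than returning to a central loop, which is the pattern this extension makes possible and a switch statement does not.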

As useful as label pointers may be, there are some serious caveats to consider. Note that you should never use this extension to jump to code in different functions. If you do this, completely unpredictable things may happen in your program. It’s best to avoid this by only assigning label addresses to local variables. This way, once the function finishes, the variables that contain the label addresses will no longer be accessible.

Also, be sure to never pass these addresses as arguments to other functions. Since they are simply (void*)s, they will work for many arguments to different functions, but this is a sure way to shoot yourself in the foot. If you find yourself tempted to do these things, it might be a good time to pause and reconsider your design.

Portability Versus Practicality

Most mainstream compilers provide some extensions to the base language specification to aid programmers in the development process. Of course, by providing these additional features, these compilers simultaneously introduce many portability issues. But if you are sure your code is only to be compiled by one compiler, you may find that these language extensions complement the base language in a way that is very productive.

For more information on C and C++ extensions, check out the gcc info page. You can get this by typing info gcc from a prompt in most standard Linux installations. Enjoy the extensions, and happy hacking!

Other Extensions

There are many other extensions to C and C++ provided by gcc and g++. For example:

  • C++-style comments are allowed in C code.
  • gcc allows the middle operand of the conditional operator (? :) to be omitted; the expression then evaluates to the first operand when that operand is non-zero.
  • g++ provides for the use of minimum and maximum operators, <? and >?, so that programmers don’t have to write these very common operations as macros in every single program. (These operators were later deprecated and have been removed from current g++ releases.)
  • gcc even provides an extension for complex number types that allows the traditional operations on complex numbers.
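Two of the extensions listed above can be sketched briefly. The function names are hypothetical. The complex type is written here as double _Complex, which has been standard since C99; the __real__ and __imag__ operators used to access its parts are the GNU-specific piece.

```c
/* value ?: fallback behaves like value ? value : fallback,
 * except that value is evaluated only once. */
int value_or(int value, int fallback)
{
    return value ?: fallback;
}

/* Builds a complex number from its parts using the GNU
 * __real__ and __imag__ operators as lvalues. */
static double _Complex make_complex(double re, double im)
{
    double _Complex z;
    __real__ z = re;
    __imag__ z = im;
    return z;
}

/* Ordinary arithmetic operators work on complex operands. */
double square_real(double re, double im)
{
    double _Complex z = make_complex(re, im);
    double _Complex sq = z * z;
    return __real__ sq;
}

double square_imag(double re, double im)
{
    double _Complex z = make_complex(re, im);
    double _Complex sq = z * z;
    return __imag__ sq;
}
```

For the input 3 + 4i, the square is -7 + 24i, so square_real(3.0, 4.0) and square_imag(3.0, 4.0) should give -7.0 and 24.0.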

Benjamin Chelf is an author and engineer at CodeSourcery, LLC. He can be reached at chelf@codesourcery.com.
