The C Preprocessor

Part of an ongoing series of essays tentatively entitled Don’t embarrass me, Don’t embarrass yourself: Notes on thinking in C and Unix.

As we saw in an earlier essay, the first thing the C compiler does is preprocess the source file. Preprocessing is mostly simple textual manipulation. The preprocessor does a variety of things.

  • The preprocessor strips out any comments in the file. Why? Because comments are generally for human beings [1].
  • The preprocessor replaces any line of the form #include <FILENAME> with the contents of the named file, using a standard list of directories to look for the file [2].
  • The preprocessor replaces any line of the form #include "PATH_TO_FILE" with the contents of the named file, treating the path as relative to the directory that contains the including file [3].
  • The preprocessor replaces any constants defined on the command line with -DCONSTANT=VALUE with the corresponding value.
  • The preprocessor replaces any constants defined in the file with #define CONSTANT VALUE with the corresponding value.
  • The preprocessor handles conditional sections, which we will cover a bit below.
  • The preprocessor handles macros, which we will cover in subsequent readings.
  • The preprocessor inserts line markers (lines that begin with #) that help the remaining compiler steps identify where in the original file they are so that they can provide appropriate error messages.
  • The preprocessor does a few more things that you don’t need to know right now.
  • The preprocessor does many more things that I either never knew about or forgot about.

Let’s look at the first few in turn.

First, let’s watch the preprocessor strip comments.

$ cat example1.c 
/**
 * example1.c
 *   An example to illustrate comment stripping.
 *
 * 
 */

int
main (int argc, char *argv[])
{
  return 0;
} // main

$ cc -E example1.c
# 1 "example1.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example1.c"







int
main (int argc, char *argv[])
{
  return 0;
}

Not very interesting, I know. You’ll note that even though the preprocessor has stripped the comments, it has left blank lines in their place so that line numbers in the output still match the original file.
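
Two details of comment stripping are easy to miss. The short sketch below is my addition rather than one of the original examples, and the names answer and not_a_comment are made up for illustration. It assumes a standard C compiler: each comment is replaced by a single space, and comment markers inside a string literal are left alone.

```c
/* Each comment is replaced by one space, so the next line is
   equivalent to `int answer = 42;`. */
int/* a comment acting as whitespace */answer = 42;

/* Inside a string literal, the characters that usually start and end
   a comment are ordinary text; the preprocessor leaves them alone. */
static const char not_a_comment[] = "/* still here */";
```

Running cc -E on this fragment should show a space where the first comment was and the string literal unchanged.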

Now, let’s try working with an included file.

$ cat example2.c 
#include "include2.c"

int
main (int argc, char *argv[])
{
  return result ();
} // main

$ cat include2.c 
int
result (void)
{
  return 0;
} // result

$ cc -E example2.c
# 1 "example2.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example2.c"
# 1 "include2.c" 1
int
result (void)
{
  return 0;
}
# 2 "example2.c" 2

int
main (int argc, char *argv[])
{
  return result ();
}

You may note that I’ve included a .c file, rather than a .h file, and that the included file contains code. The preprocessor doesn’t care what’s in the included file; it just pastes in the contents.
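
The lines beginning with # in the output above, such as # 2 "example2.c" 2, record which file and line the text that follows came from. Standard C exposes the same bookkeeping through the #line directive and the __LINE__ macro. The sketch below is an addition, not one of the original examples; the file name hypothetical.c and the enumerator names are invented.

```c
/* Pretend that the next source line is line 100 of hypothetical.c. */
#line 100 "hypothetical.c"
enum { FIRST_LINE = __LINE__ };      /* this is line 100 */
/* A comment that
   spans three
   lines. */
enum { AFTER_COMMENT = __LINE__ };   /* line 104: comment lines still count */
```

This is the same reason the preprocessor leaves blank lines where comments used to be: later stages count lines to report errors.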

What happens if we try to include the same file twice?

$ cat example3.c
#include "include3.c"
#include "include3.c"

int
main (int argc, char *argv[])
{
  return result3 ();
} // main

$ cat include3.c 
int
result3 (void)
{
  return 0;
} // result3

$ cc -E example3.c
# 1 "example3.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example3.c"
# 1 "include3.c" 1
int
result3 (void)
{
  return 0;
}
# 2 "example3.c" 2
# 1 "include3.c" 1
int
result3 (void)
{
  return 0;
}
# 3 "example3.c" 2

int
main (int argc, char *argv[])
{
  return result3 ();
}

As we can see in the preprocessed source, we get the contents of the file included twice. Is that a problem? Let’s see.

$ cc example3.c
In file included from example3.c:2:0:
include3.c:2:1: error: redefinition of ‘result3’
 result3 (void)
 ^
In file included from example3.c:1:0:
include3.c:2:1: note: previous definition of ‘result3’ was here
 result3 (void)
 ^

Yup, that’s definitely a problem. Now, you may be thinking to yourself, “Sam, no one would ever include the same file twice.” But it turns out that that’s not true. Sometimes, you will include two files, and each will include the same other file [4]. There are also a few times that programmers intentionally include the same file, but they then arrange to avoid overlaps [5]. How do we handle the inadvertent double include? We’ll get to that topic soon.
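
Part of the reason that double inclusion hurts here is that include3.c contains a definition. As a rough sketch (my addition; the function name answer is hypothetical), C lets a function be declared any number of times but defined only once, which is one reason headers conventionally hold declarations rather than code.

```c
/* Declarations (prototypes) may be repeated without complaint. */
int answer (void);
int answer (void);

/* A definition, by contrast, may appear only once per program. */
int
answer (void)
{
  return 42;
}
```

That does not make repeated inclusion harmless in general, since some other things that headers contain, such as struct definitions, may not be repeated either.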

Next up are constants. You may recall that one way to define constants is on the command line, with -DCONSTANT=VALUE [6]. Let’s explore that approach.

$ cat example4.c 
int
main (int argc, char *argv[])
{
  return ZERO;
} // example4

$ cc -E example4.c
# 1 "example4.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example4.c"
int
main (int argc, char *argv[])
{
  return ZERO;
}

$ cc -DZERO=0 -E example4.c
# 1 "example4.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example4.c"
int
main (int argc, char *argv[])
{
  return 0;
}

$ cc -DZERO=1 -E example4.c
# 1 "example4.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example4.c"
int
main (int argc, char *argv[])
{
  return 1;
}

Wasn’t that fun? As you can see, defining constants with command-line flags lets us quickly reconfigure our program [7]. However, we more often define the constants directly in the file, with #define.
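
One way to think about -D, sketched below under the assumption that you are using GCC or a compatible compiler: -DNAME=VALUE behaves as if #define NAME VALUE appeared before the first line of the file, and a bare -DNAME defines the name as 1. The name VERBOSE and the enumerator names are hypothetical.

```c
/* Roughly what `cc -DZERO=0 -DVERBOSE ...` sets up before the
   first line of the source file is read. */
#define ZERO 0      /* from -DZERO=0 */
#define VERBOSE 1   /* from -DVERBOSE (no value means 1) */

enum { zero_value = ZERO, verbose_value = VERBOSE };
```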

$ cat example5.c 
#define ZERO 0

int
main (int argc, char *argv[])
{
  return ZERO;
} // example5

$ cc -E example5.c
# 1 "example5.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example5.c"


int
main (int argc, char *argv[])
{
  return 0;
}

What happens if we try to redefine ZERO on the command line? Let’s see.

$ cc -DZERO=7 -E example5.c
# 1 "example5.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example5.c"
example5.c:1:0: warning: "ZERO" redefined
 #define ZERO 0
 ^
<command-line>:0:0: note: this is the location of the previous definition


int
main (int argc, char *argv[])
{
  return 0;
}

As one would hope, the preprocessor issues a warning. It uses the most recent definition: the -D definition is processed before the file is read, so the #define in the file comes later and wins.
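
If a redefinition is intentional, one conventional way to avoid the warning is to remove the old definition first with #undef. The following is a minimal sketch of that idea, not something from the original examples; the enumerator name is made up.

```c
#define ZERO 0

/* Drop whatever definition is currently in force ... */
#undef ZERO
/* ... so that this definition is not a redefinition. */
#define ZERO 7

enum { value_of_zero = ZERO };   /* 7 */
```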

Of course, it seems a bit silly to define ZERO as 0. But we will more frequently use constants which represent values that are constant throughout the program, but which we may want to vary when compiling the program. Here’s one such example.

$ cat example6.c 
#define ARRAY_LEN 16

int
main (int argc, char *argv[])
{
  int values[ARRAY_LEN];
  int sum = 0;
  int i;

  for (i = 0; i < ARRAY_LEN; i++)
    sum += values[i];

  return sum;
} // example6

$ cc -E example6.c
# 1 "example6.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example6.c"


int
main (int argc, char *argv[])
{
  int values[16];
  int sum = 0;
  int i;

  for (i = 0; i < 16; i++)
    sum += values[i];

  return sum;
}
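
Because the replacement is purely textual, a constant defined as an expression can interact with the code around it in surprising ways. The sketch below is an addition; BAD_LEN, GOOD_LEN, and the enumerator names are invented. It shows why such definitions are usually wrapped in parentheses.

```c
#define BAD_LEN  8 + 8     /* pasted in as-is, with no grouping */
#define GOOD_LEN (8 + 8)   /* parentheses keep the sum together */

enum {
  twice_bad  = 2 * BAD_LEN,    /* expands to 2 * 8 + 8, which is 24 */
  twice_good = 2 * GOOD_LEN    /* expands to 2 * (8 + 8), which is 32 */
};
```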

What are preprocessor conditionals? The simplest and most commonly used conditionals check whether or not a name is defined (using -D or #define) and make different choices depending on the answer. For those cases, you use #ifdef NAME at the beginning of a block of code and #endif at the end. If you want an else clause, you use #else.
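
In skeleton form, and with made-up names (FAST_MODE, BUFFER_SIZE, buffer) and made-up values, the pattern might look like the following. Treat it as a sketch of the directives, not as one of the essay’s own examples.

```c
/* FAST_MODE could be defined with -DFAST_MODE or with #define. */
#ifdef FAST_MODE
#define BUFFER_SIZE 4096   /* used when FAST_MODE is defined */
#else
#define BUFFER_SIZE 256    /* used otherwise */
#endif

static char buffer[BUFFER_SIZE];
```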

Let’s consider a simple example. While we are developing our program, we always want to seed the random number generator with the same value. However, when we deploy, we most likely want to seed the random number generator in a less predictable way [7]. In the example below, we simply print out a random number. Note that I have elided the cruft that comes from including <stdio.h>.

$ cat example7.c
#include <stdio.h>
int 
main (int argc, char *argv[])
{
#ifdef TESTING
  srand (0);
#else
  srand (time (0));
#endif
  printf ("%d\n", rand ());
  return 0;
} // example7

$ cc -DTESTING -E example7.c
# 1 "example7.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example7.c"
# 1 "/usr/include/stdio.h" 1 3 4
# ...
# 943 "/usr/include/stdio.h" 3 4

# 2 "example7.c" 2
int
main (int argc, char *argv[])
{

  srand (0);



  printf ("%d\n", rand ());
  return 0;
}

$ cc -DTESTING example7.c -o ex7test

$ ./ex7test
1804289383

$ ./ex7test
1804289383

$ ./ex7test
1804289383

$ cc -E example7.c
# 1 "example7.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example7.c"
# 1 "/usr/include/stdio.h" 1 3 4
# ...
# 943 "/usr/include/stdio.h" 3 4

# 2 "example7.c" 2
int
main (int argc, char *argv[])
{



  srand (time (0));

  printf ("%d\n", rand ());
  return 0;
}

$ cc example7.c -o ex7

$ ./ex7
128869637

$ ./ex7
1972234548

$ ./ex7
292607164

$ ./ex7
2130073865
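
example7.c calls srand, rand, and time without including the headers that declare them, which newer compilers may reject. A version of the seeding logic with those headers might look like the sketch below. The helper seed_generator and the SEED_IS_FIXED constant are my own inventions, not part of the original program.

```c
#include <stdlib.h>   /* srand */
#include <time.h>     /* time */

/* Record which branch the preprocessor kept, for illustration. */
#ifdef TESTING
enum { SEED_IS_FIXED = 1 };
#else
enum { SEED_IS_FIXED = 0 };
#endif

/* Seed the generator: predictably when TESTING is defined,
   from the clock otherwise. */
void
seed_generator (void)
{
#ifdef TESTING
  srand (0);
#else
  srand ((unsigned int) time (NULL));
#endif
}
```

A main function would call seed_generator () once before its first call to rand ().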

Conditionals are one way we normally avoid repeated includes. Here’s the earlier example, using a sensible #ifndef wrapper [8,9].

$ cat example8.c 
#include "include8.c"
#include "include8.c"

int
main (int argc, char *argv[])
{
  return result8 ();
} // main from example8

$ cat include8.c 
#ifndef _INCLUDE_8_
#define _INCLUDE_8_
int
result8 (void)
{
  return 0;
} // result8
#endif // _INCLUDE_8_

$ cc -E example8.c
# 1 "example8.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example8.c"
# 1 "include8.c" 1


int
result8 (void)
{
  return 0;
}
# 2 "example8.c" 2


int
main (int argc, char *argv[])
{
  return result8 ();
}

$ cc example8.c

Isn’t that much nicer? Now you know why most header files start with an #ifndef.
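
The wrapper matters even for headers that contain no code, because type definitions may not be repeated either. The sketch below simulates including a hypothetical header point.h twice by writing its contents out twice; the names POINT_H and struct point are assumptions for illustration.

```c
/* --- first copy of the (hypothetical) point.h --- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* --- second copy: POINT_H is already defined, so the body is
   skipped and struct point is not defined a second time --- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */
```

I chose POINT_H rather than a name like _POINT_H_ because identifiers that begin with an underscore followed by an uppercase letter are reserved for the implementation.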

We can also use a similar approach to define values only when they are not defined on the command line [10].

$ cat example9.c 
#ifndef ARRAY_LEN
#define ARRAY_LEN 16
#endif

int
main (int argc, char *argv[])
{
  int values[ARRAY_LEN];
  int sum = 0;
  int i;

  for (i = 0; i < ARRAY_LEN; i++)
    sum += values[i];

  return sum;
} // example9

$ cc -E example9.c
# 1 "example9.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example9.c"




int
main (int argc, char *argv[])
{
  int values[16];
  int sum = 0;
  int i;

  for (i = 0; i < 16; i++)
    sum += values[i];

  return sum;
}

$ cc -DARRAY_LEN=128 -E example9.c
# 1 "example9.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "example9.c"




int
main (int argc, char *argv[])
{
  int values[128];
  int sum = 0;
  int i;

  for (i = 0; i < 128; i++)
    sum += values[i];

  return sum;
}
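
When a value can come from the command line, it can be worth checking it. The following sketch extends the example9.c pattern with #if and #error, two directives this essay has not covered; the check, its message, and the file-scope array are my additions.

```c
#ifndef ARRAY_LEN
#define ARRAY_LEN 16   /* default when -DARRAY_LEN=... is not given */
#endif

/* Stop compilation with a readable message if the value is unusable. */
#if ARRAY_LEN <= 0
#error "ARRAY_LEN must be a positive integer"
#endif

static int values[ARRAY_LEN];   /* file-scope arrays start out zeroed */
```

With this in place, something like cc -DARRAY_LEN=0 should fail during preprocessing rather than produce a confusing error later.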

As I hope these examples suggest, the preprocessor makes it easy to customize your programs at compile time. Together with the wonder of macros, these aspects of the preprocessor will give you a great deal of power and flexibility.


[1] There are, of course, some exceptions. Some comments can serve as compiler hints in some languages.

[2] With GCC, the environment variables CPATH and C_INCLUDE_PATH add directories to the list. I also assume that /usr/include and /usr/local/include are on the list. You can add elements to the list with -Idir.

[3] Even if you execute the compilation command from another directory, the path is relative to the directory that contains the file doing the including. If the file is not found there, the compiler also searches relative to each directory in the standard include list.

[4] I’m too lazy to work out the example, but you know what I mean. If you don’t, drop me a message and I’ll write the example.

[5] That’s a subject for a future essay.

[6] At times, we may write -D'CONSTANT=VALUE' if we are worried about the shell doing something strange with the name or the value.

[7] I’m using time (0) as the seed. That’s not ideal. But it varies enough that I’m comfortable with it as an example.

[8] #ifndef is like the inverse of #ifdef. It means if not defined.

[9] I feel way too pleased that example 8 is a variant of example 3, given that 3 and 8 look similar. I’m almost as pleased that this endnote ended up as number 8.

[10] I’m also happy that the variant of number 6 is number 9 [11].

[11] Unfortunately, when I hear number 9, I think of the phrase turn me on dead man [12].

[12] If you did not immediately understand that comment, I’m sure that a Web search would help you. I’m not sure what will help me stop making stupid associations.


Version 1.0 released 2017-03-24.

Version 1.1 of 2021-04-29.