Separate Compilation

Keeping a large program (say more than 250 lines) in a single source file (e.g. a .cpp file) has several disadvantages:

1. A minor change requires recompilation of the entire program.

2. Reusing part of the program, a class for example, in another program requires a risky copy and paste operation. The class declaration, all member function implementations, and all other dependencies must be located, copied (don't press the cut button!), and pasted into another file.

3. Several programmers can't work on the program simultaneously.

For these reasons most programs are divided into several source files, each containing logically related declarations. Each source file is compiled separately, producing a file of machine language instructions called an object file (e.g., a .obj file in Windows and DOS).

It's the job of a program called a linker to link a project's object files together into a single executable file. The linker is responsible for associating all references to a name in one object file to the definition of the name, which might be in another object file. This process is called address resolution. Sometimes the definition of a name isn't in any of the project's object files. In this case the linker searches the standard C++ library (libcp.lib), the standard C library (libc.lib), and any special libraries specified by the programmer for the name's definition. (Only the definitions needed by the program are extracted from a library and linked into the program.)

If the name's definition still can't be found, the linker generates an unfriendly error message:

Linking...
main.obj : error LNK2001: unresolved external symbol "int __cdecl sine(int)" (?sine@@YAHH@Z)
Debug/Lab5b.exe : fatal error LNK1120: 1 unresolved externals
Error executing link.exe.
Lab5b.exe - 2 error(s), 0 warning(s)
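
The situation behind an error like this can be sketched as follows. This is only an illustration: the file names in the comments, the helper callSine, and the body of sine are assumptions, not taken from the project that produced the message above. Both "files" are shown in a single listing so that the sketch compiles on its own.

```cpp
#include <cassert>

// --- main.cpp (sketch) ---
// Pure declaration: tells the compiler the type of sine so the call below
// can be type-checked, but supplies no function body.
int sine(int degrees);

int callSine() {
    // The compiler leaves a reference to sine in the object file;
    // finding the matching definition is the linker's job.
    return sine(90);
}

// --- trig.cpp (hypothetical second source file) ---
// If this definition were absent from every object file and library,
// the link step would fail with an "unresolved external symbol" error.
int sine(int degrees) {
    // Placeholder body, assumed for illustration only.
    return degrees == 90 ? 1 : 0;
}

// Small usage check.
void demo() {
    assert(callSine() == 1);
}
```

Note that the compiler is satisfied by the declaration alone; only the linker notices whether a definition exists.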

What are header files for?

Multiple source file programs sound great, but there is a problem. While the compiler is willing to accept that a name can be used in a source file without being defined there (after all, it's the linker's job to find the definition), the compiler must at least know the type of the name in order to perform type checking, variant selection, and offset calculations. But the compiler only compiles one file at a time. It knows nothing of the project's other source files; it can't be expected to go searching for a name's type the same way the linker searches for a name's definition. Regrettably, it is the programmer's duty to supply, in each source file, a type declaration for every name that is used there but not defined there.

This job would be worse than it already is if it weren't for the #include directive. A preprocessor directive is a command beginning with a # symbol that can appear in a source or header file. When a file is compiled, it is first submitted to the C/C++ preprocessor, which executes all directives. (Consult online help for a list of all directives.) Preprocessor directives usually alter the source file in some way.

For example, the #include directive takes a file name for its argument. When executed by the preprocessor, it replaces itself with the entire contents of the file. (Be careful: if the included file contains the same directive, so that the file directly or indirectly includes itself, the preprocessor can recurse without end.)

How do we use #include to solve our problem? If a programmer creates a source file called util.cpp containing definitions that may be useful in other source files, the programmer also creates a header file called util.h containing the corresponding type declarations. A second source file, say main.cpp, that uses the definitions contained in util.cpp needs only the following directive in order to be compiled:

#include "util.h"

Normally, util.h would also include type declarations needed by util.cpp, so util.cpp also includes util.h. Of course util.obj will still need to be available to the linker to create an executable.
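
A minimal sketch of this arrangement follows. The contents of util.h and util.cpp are assumptions (the function square, the counter squareCalls, and the helper areaOfSquare are hypothetical), and the three files are shown in one listing, with the #include lines reduced to comments, so that the sketch compiles on its own.

```cpp
#include <cassert>

// ---- util.h (sketch): pure declarations only ----
double square(double x);      // function declaration (prototype)
extern int squareCalls;       // variable declaration (hypothetical counter)

// ---- util.cpp: would begin with  #include "util.h" ----
int squareCalls = 0;          // the definition of the counter
double square(double x) {     // the definition of the function
    ++squareCalls;
    return x * x;
}

// ---- main.cpp: would begin with  #include "util.h" ----
double areaOfSquare(double side) {
    return square(side);      // type-checks because util.h declared square
}

// Small usage check.
void demo() {
    assert(areaOfSquare(3.0) == 9.0);
    assert(squareCalls == 1);
}
```

In a real project each section would be its own file, util.cpp and main.cpp would be compiled separately, and the linker would connect the call in main.obj to the definition in util.obj.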

What goes into a header file?

C++ makes a distinction between definitions and declarations. Technically, a definition associates a name with a particular item such as a variable, function, or class, while a declaration merely associates a name with a type. We call items that can be named bindables, and an association between a name and a bindable a binding. Declarations are important to the compiler, while definitions are important to the linker. Here are some examples of definitions and declarations:

1. The definition:

int x;

associates the name x with an integer variable, while the declaration:

extern int x;

tells the compiler that x is the name of an integer variable, but not which one.

2. The definition:

double square(double x) { return x * x; }

associates the name square with a function that is parameterized by a double and that returns a double, while the declaration:

double square(double x); // a.k.a. prototype or header

tells the compiler that square is the name of a function that is parameterized by a double and that returns a double, but not which one.

3. The definition:

struct Date { int month, day, year; };

associates the name Date with a class of objects, each containing three integer member variables named month, day, and year, while the declaration:

struct Date; // forward reference

tells the compiler that Date is the name of a class, but not which one.

The terminology is a little confusing because associating a name to a bindable implicitly associates the name to a type, and therefore a definition is also a declaration. In other words, the definition:

int x;

is also a declaration, because it also tells the compiler that x is the name of an integer variable. However, the declaration:

extern int x;

is not a definition, because it doesn't bind the name x to any particular variable. In short, all definitions are declarations, but not vice versa. Let's call declarations that aren't definitions pure declarations.

The One Definition Rule

The One-Definition Rule (ODR) states that a definition may only occur once in a C++ program, but a pure declaration can occur multiple times as long as the occurrences don't contradict each other. Therefore, pure declarations are often placed in header files, which may then be included multiple times in a program, but definitions should not be placed in header files.
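
As a small illustration of the rule within a single source file, the sketch below repeats several pure declarations and then supplies exactly one definition for each name. The initial value of x and the helper demo are arbitrary choices made for the example.

```cpp
#include <cassert>

// Pure declarations may be repeated, provided they agree with each other.
extern int x;
extern int x;

double square(double x);
double square(double value);   // parameter names may differ; the types may not

struct Date;
struct Date;

// Each name then receives exactly one definition.
int x = 42;                                // initial value chosen arbitrarily
double square(double x) { return x * x; }
struct Date { int month, day, year; };

// Small usage check.
void demo() {
    Date d = {2, 29, 2000};
    assert(x == 42);
    assert(square(3.0) == 9.0);
    assert(d.year == 2000);
}
```

Adding a second definition of x or a second body for square would be rejected, whereas removing any of the repeated declarations changes nothing.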

Exceptions to the One Definition Rule

However, some types of definitions are important to the compiler, too, and so, like pure declarations, these must be placed in header files that get included in every source file where they are needed. To accommodate these types of definitions, there are three exceptions to the One Definition Rule: types, templates, and inline functions may be multiply defined in a C++ program[1], provided three conditions are met:

1. They don't occur multiple times in the same source file.

2. Their occurrences are token-for-token identical.

3. The meanings of their tokens remain the same in each occurrence.

Here are three ways of violating these conditions. (The examples are borrowed from [STR].) See if you can spot them:

Example 1:

// file1.cpp
struct Date { int d, m, y; };
struct Date { int d, m, y; };

Example 2:

// file1.cpp
struct Date { int d, m, y; };

// file2.cpp
struct Date { int d, m, year; };

Example 3:

// file1.cpp
typedef int INT;
struct Date { INT d, m, y; };

// file2.cpp
typedef char INT;
struct Date { INT d, m, y; };
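
By contrast, here is a sketch of a header that takes advantage of the exceptions without violating the conditions. It is an assumption-laden illustration: the helpers sameDate and larger are hypothetical, and the header and a client file are shown in one listing so that the sketch compiles on its own.

```cpp
#include <cassert>

// ---- date.h (sketch of a header shared by several source files) ----

// Type definition.
struct Date { int d, m, y; };

// Inline function definition (sameDate is a hypothetical helper).
inline bool sameDate(const Date& a, const Date& b) {
    return a.d == b.d && a.m == b.m && a.y == b.y;
}

// Template definition (larger is a hypothetical helper).
template <typename T>
T larger(T a, T b) {
    return a < b ? b : a;
}

// ---- main.cpp: would begin with  #include "date.h" ----
void demo() {
    Date a = {1, 2, 2000};
    Date b = {1, 2, 2000};
    assert(sameDate(a, b));
    assert(larger(3, 7) == 7);
}
```

Every source file that includes such a header sees the same tokens with the same meanings, so the second and third conditions hold automatically.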

Preventing Multiple Definitions in a Single File

The second and third conditions are easy to live with, but ensuring that a type definition will not occur multiple times in the same source file can be tricky. Suppose the definition of Date is contained in date.h, which is included in util.h:

// date.h
struct Date { int d, m, y; };
// etc.

// util.h
#include "date.h"
// etc.

Not knowing that util.h already includes date.h, the unwary author of main.cpp includes both files:

// main.cpp
#include "date.h"
#include "util.h"
// etc.

Unfortunately, this places two occurrences of the definition of Date in main.cpp, thus violating the condition that a type definition can't occur multiple times in the same source file.

To prevent this problem, the contents of a header file can be conditionally included. The usual condition is that a particular identifier hasn't yet been defined by the preprocessor. The #ifndef/#endif directives are used to bracket the declarations that are conditionally included in this way. The identifier is defined immediately after the #ifndef directive using the #define directive, so the bracketed declarations are included only the first time the header is processed in a given source file. To prevent naming conflicts, the name of the identifier is formed from the name of the header file, replacing the period by an underscore and replacing lower case letters by upper case letters. Here's the new, safe version of date.h:

// date.h
#ifndef DATE_H
#define DATE_H
struct Date { int d, m, y; };
// etc.
#endif
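
To see the guard at work, the sketch below writes out by hand what main.cpp would look like after the preprocessor has pasted in date.h twice, once directly and once by way of util.h. The function yearOf and the helper demo are hypothetical additions used only to show that Date ends up defined exactly once.

```cpp
#include <cassert>

// First copy of date.h (from  #include "date.h"  in main.cpp).
#ifndef DATE_H
#define DATE_H
struct Date { int d, m, y; };
#endif

// Second copy of date.h (pulled in indirectly through util.h).
// DATE_H is already defined, so everything up to #endif is skipped.
#ifndef DATE_H
#define DATE_H
struct Date { int d, m, y; };
#endif

// Rest of main.cpp: Date has been defined exactly once.
int yearOf(const Date& date) {
    return date.y;
}

// Small usage check.
void demo() {
    Date d = {1, 2, 2000};
    assert(yearOf(d) == 2000);
}
```

Without the #ifndef/#endif brackets, the second copy would redefine Date and the compiler would reject the file.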

Since conditional inclusion is possible, why not ignore the One Definition Rule and place all definitions in header files? Well, conditional inclusion only prevents a header file from being included multiple times in a single source file. If our program has multiple source files, then it's still possible that a definition placed in a header file will occur multiple times within the program, which will cause a linker error.

We could restrict ourselves to a single source file, but then we would lose a benefit of linking: a clever linker extracts and links into the executable file only those definitions that are actually used. If we place all definitions in a single source file, then they will appear in the executable whether they are used or not.

[1] Why are definitions of types, templates, and inline functions needed by the compiler?