Multi-File Projects

Real-world programming projects are rarely contained in a single source code file. There's a reason for this: they're big. When a project becomes more than a few hundred lines of code, it becomes difficult to control.

The first rule of dealing with large projects is easy: break large projects up into multiple files.

There are two reasons behind this rule: 1) to save programmer time and 2) to save project time.

When a source code file becomes large, a programmer can waste a lot of time just moving around in the file. There are often many functions and variables that simply do not apply to the specific problem that a programmer is tackling at the moment. A programmer will deal conceptually with a program at many different levels. Having to scroll or search through a lot of code is anywhere from annoying to confusing.

Even more important is saving project time. Building any program occurs in two phases, the compilation phase and the linking phase. Linking object code files into an executable file is fairly quick. But compiling source code into object code can be fairly involved. Compiling a large source code project from scratch can take several hours (many companies perform nightly builds on their projects) while linking the same project can take just a couple of minutes.

The big problem of saving time comes in code maintenance. If a bug occurs in a large monolithic source code file that you maintain, then you may have to wait a long time to build your program before being able to test it, even for a simple fix. (And your boss may not be too keen on you leaving work early. "Honest, boss, I'm building the project right now!")

When a project is split among several source code files, only the source files that change need to be recompiled. The resulting new object code files are then quickly relinked with the other object code files of the project to create the new executable.
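As a rough sketch of how this plays out with g++, assume a hypothetical project made up of area.cpp, report.cpp, and a main.cpp that is not shown. The file names and the functions rectangle_area and room_area are invented for illustration; the files are shown in one listing, separated by comments, and the build commands appear as comments at the end.

```cpp
// ---- area.h (hypothetical): declaration shared by the source files ----
double rectangle_area(double width, double height);

// ---- area.cpp (hypothetical): the definition, compiled into area.o ----
double rectangle_area(double width, double height)
{
    return width * height;
}

// ---- report.cpp (hypothetical): a caller, compiled into report.o ----
double room_area()
{
    return rectangle_area(4.0, 2.5);
}

// Building from scratch, one object file per source file:
//   g++ -c area.cpp
//   g++ -c report.cpp
//   g++ -c main.cpp
//   g++ -o app main.o area.o report.o
//
// After a fix that touches only report.cpp, the other object files
// are reused and only one source file is recompiled before relinking:
//   g++ -c report.cpp
//   g++ -o app main.o area.o report.o
```

The -c option tells g++ to stop after producing an object file rather than going on to link.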

Splitting a project up into several files also lets more than one person work effectively on the project. When two or more people work on their own copies of a single large file, it can be difficult to merge together all the results at the end even if they are using version control software. When the project is split up into pieces, each person can work on their own section, and the results can be fairly easily integrated at the end.

Splitting a Project

There are as many ways of splitting a project up as there are programmers, but some relatively standard approaches have emerged over time.

In object-oriented programming, a good rule of thumb is to set up a separate source code file to implement each class. This is not a hard and fast rule. Several small, related classes might be implemented in the same source code file, but this is the exception rather than the rule. Some finer details about splitting along class lines are given below.

When the compiler compiles a source code file, it does so in complete isolation from other source code files. Each source code file is independent and compiled separately. However, source files will frequently need access to programmer-defined data types such as classes or structs and will need to call functions and member functions that are defined in other source files or as part of the C++ standard library. In order for the project to build correctly, each source file must therefore be given the class definitions and function prototypes for everything it uses.

Header Files

While it is certainly possible to copy and paste a class definition or a function prototype into each source file that makes use of it, doing so is usually a recipe for consistency errors to creep into the project. If the class definition or function prototype needs to be changed, changing it in every file where it is used can be time-consuming and error prone. Header files are a way to address this issue.

A header file is a file of C++ code that typically ends with the extension .h. This is simply a convention, not a rule enforced by the compiler. C++ standard library header files are included by name without any extension.

A simple header file may contain a list of function prototypes. Those are the functions that are available to the world of the program. When a source code file includes a header file, the compiler knows the names and types of variables and functions, and can check for proper matches for function arguments. It can generate the proper code to call the function, even though it doesn't know what the function does.
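A minimal sketch of this idea follows. The names text_utils.h, count_vowels, repeat, and vowels_in_repeated_word are hypothetical, and the pieces that would normally live in separate files are shown in one listing, separated by comments.

```cpp
#include <string>

// ---- what a hypothetical text_utils.h would contain: prototypes only ----
int count_vowels(const std::string& text);
std::string repeat(const std::string& text, int times);

// ---- a source file that includes the header can now call the functions ----
// The compiler checks each call against the prototypes above, even though
// it has not yet seen the function bodies.
int vowels_in_repeated_word()
{
    // A call such as count_vowels(3, "moo") would be rejected here,
    // because no prototype accepts those arguments.
    return count_vowels(repeat("moo", 3));   // counts vowels in "moomoomoo"
}

// ---- what a hypothetical text_utils.cpp would contain: definitions ----
int count_vowels(const std::string& text)
{
    const std::string vowels{"aeiouAEIOU"};
    int count = 0;
    for (char ch : text)
    {
        if (vowels.find(ch) != std::string::npos)
            ++count;
    }
    return count;
}

std::string repeat(const std::string& text, int times)
{
    std::string result;
    for (int i = 0; i < times; ++i)
        result += text;
    return result;
}
```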

Header files are also wonderful for containing class definitions. A very common object-oriented technique places each class definition in its own header file. The implementations of the class member functions are placed in their own separate source code file. Anyone who wants to use the class can simply #include the header file to find out what is available.
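The split between a class definition and its implementation might look something like the sketch below. The Counter class, its file names, and the count_to function are hypothetical and exist only to illustrate the layout; in a real project each commented section would be its own file.

```cpp
// ---- Counter.h (hypothetical): the class definition ----
class Counter
{
private:
    int count{0};

public:
    Counter() = default;
    void increment();
    void reset();
    int value() const;
};

// ---- Counter.cpp (hypothetical): would begin with #include "Counter.h" ----
void Counter::increment()
{
    ++count;
}

void Counter::reset()
{
    count = 0;
}

int Counter::value() const
{
    return count;
}

// ---- any other source file that includes Counter.h can use the class ----
int count_to(int n)
{
    Counter c;
    for (int i = 0; i < n; ++i)
        c.increment();
    return c.value();
}
```

A source file that uses Counter only needs the definition from the header to compile; the member function bodies are supplied later, at link time, by the object file built from Counter.cpp.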

The #include Directive

The #include preprocessor directive copies the contents of a header file into a source file.

There are actually two ways to include header files. One places angle brackets around the file name. This is the style that you're probably used to; it is used to include header files that are part of the standard library. The other way includes the file name (potentially preceded by a directory path) in quotes. This second style should be used with any header file that you create. For example:

// Library header files
#include <iostream>
#include <iomanip>
#include <cstring>
#include <cmath>

// Header files created by the programmer
#include "Student.h"
#include "Course.h"

At the risk of stating the obvious, #include every header file that is needed by a source code file.

If you use a function or class from the standard library, make sure that you include its header file. If you're not sure what header file you need to include, use a search engine to search for "C++" and the function or class name. That should find a link to the documentation page for that function or class on reference sites like cplusplus.com or cppreference.com, which will list the necessary header file.

When it comes to the header files that you create, err on the side of including too much rather than too little. For example, suppose that Course.h contains the following:

#include "Student.h"

class Course
{
private:

    char name[31]{"None"};
    char course_number[11]{"None"};
    Student class_list[45];
    int num_students{0};

    void sort_class_list();

public:

    // Constructors
    Course() = default;
    Course(const char*, const char*);
    ...
};

The question now is, "In Course.cpp, I need to #include "Course.h", but do I need to #include "Student.h"? It's already included, right?"

If you know for certain that the Student class is essentially part of the Course class, then it is okay to include only the one file. In practice, it's often difficult to know (short of looking at the header files) what has to be included and what does not. When in doubt, include. The program should suffer no ill effects from including something again that has already been included.

Header Guards

Being able to include a file as often as needed does not occur automatically. An example will illustrate the problem.

The header files for two classes and a source file that uses them are shown below:

// Alpha.h
class Alpha
{
private:
    int x;
    
public:
    ...
};

// Beta.h

#include "Alpha.h"

class Beta
{
private:
    int y;
    Alpha a;

public:
    ...
};

// headerTest.cpp
#include "Alpha.h"
#include "Beta.h"

int main()
{
    Alpha a;
    Beta b;

    return 0;
}

Running g++ -Wall -Werror -std=c++11 -o headerTest headerTest.cpp results in the following error:

In file included from Beta.h:1,
from headerTest.cpp:2:
Alpha.h:2: error: redefinition of `class Alpha'
Alpha.h:2: error: previous definition of `class Alpha'

The class Alpha is being defined twice: once when Alpha.h is included directly in headerTest.cpp, and once when it is included in Beta.h, which is itself included in headerTest.cpp.

One way of solving this problem is to remove the line #include "Alpha.h" from headerTest.cpp. But this means that a programmer must be aware of the relationship between Alpha and Beta. If an instance of Alpha is to be used, Alpha.h should be included, except when a Beta instance is also used, in which case Alpha.h should not be included even though Alpha is used in the source code file. Not fun.

A better way to solve this problem is through the use of header guards. Header guards are little pieces of code that protect the contents of a header file from being included more than once.

Header guards are implemented through the use of preprocessor directives. The C/C++ preprocessor directives all start with the # character. You are already familiar with some (#include, #define). The preprocessor performs some simple textual replacements on a file before handing it off to the compiler.

Some of the preprocessor directives are conditional. The #ifdef SYMBOL directive is true when SYMBOL has been defined in the code seen so far. If the directive is true, then the statements that come between the #ifdef and an #endif directive later on will be used in the program. If the #ifdef is false, then the statements between it and the matching #endif will be ignored and not sent to the compiler.

(A quick way to comment out large sections of code is to put a #if 0 at the beginning of the code to be commented and a #endif at the end.)

Another useful preprocessor directive is #ifndef SYMBOL. This directive is true if the symbol has not been defined. Like other conditional directives, if the condition is true then the statements between the #ifndef and an #endif will be used in the program.
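The three conditional forms described above can be seen side by side in a short sketch. The symbols VERBOSE and QUIET and the functions log_level and is_quiet are hypothetical names chosen for this example.

```cpp
// Define VERBOSE with no value; only its existence matters to #ifdef.
#define VERBOSE

// #ifdef: the enclosed code is kept because VERBOSE is defined above.
#ifdef VERBOSE
const char* log_level()
{
    return "verbose";
}
#endif

// #ifndef: the enclosed code is kept because QUIET was never defined.
#ifndef QUIET
bool is_quiet()
{
    return false;
}
#endif

// #if 0: everything up to the matching #endif is skipped, so the
// line below is never handed to the compiler.
#if 0
this text is not valid C++ but is never compiled
#endif
```

If the #define VERBOSE line were removed, log_level would no longer exist as far as the compiler is concerned, and any call to it would fail to compile.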

Header guards are implemented by using three preprocessor directives in a header file. Two are placed at the beginning of the file, before any pertinent code. The last is placed at the end of the file. The first header guard line is of the form:

#ifndef MY_SYMBOL_H

It is followed immediately by a line of the form:

#define MY_SYMBOL_H

The last line, placed at the end of the file, is of the form:

#endif /* MY_SYMBOL_H */

The symbol used is not crucial, but it must be unique within the project. It is traditional to use all capital letters for the symbol. Only letters, digits, and the underscore character can be used in the symbol, and it cannot begin with a digit. No other punctuation is allowed. A very common choice is the name of the header file in capital letters, with the .h suffix converted to _H.

The purpose of this symbol is to serve as a marker. If the symbol is defined then this section of code has been seen before and should not be processed again. If the symbol has not been created, then the code it is associated with has not been seen.

// Alpha.h
#ifndef ALPHA_H
#define ALPHA_H

class Alpha
{
private:
    int x;

public:
    ...
};

#endif /* ALPHA_H */

// Beta.h
#ifndef BETA_H
#define BETA_H

#include "Alpha.h"

class Beta
{
private:
    int y;
    Alpha a;

public:
    ...
};

#endif /* BETA_H */

// headerTest.cpp
#include "Alpha.h"
#include "Beta.h"

int main()
{
    Alpha a;
    Beta b;

    return 0;
}

Some analysis is in order. The first time one of the unique symbols in a header guard is encountered, the #ifndef statement is true. The symbol is not defined. Because of that, all of the code between the #ifndef and #endif is included and sent to the compiler. If the symbol were defined, the code between the directives would be ignored.

After getting past the #ifndef, the header guard symbol is immediately defined. (No value is given. It's not needed.) This ensures that the first time through the #ifndef will be the only time that the symbol is undefined. The code being protected will only be seen once, no matter how many times the header file is included.

The C style comment after the #endif directive is not mandatory, but it is considered good style.
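The behavior analyzed above can be reproduced in a single file by pasting the contents of a guarded header twice, which is roughly what two #include lines for the same file amount to. The Point struct, the POINT_H symbol, and the sum_coordinates function are hypothetical names used only for this sketch.

```cpp
// ---- first "inclusion" of a hypothetical Point.h ----
// POINT_H is not yet defined, so the #ifndef is true and the
// struct definition is passed on to the compiler.
#ifndef POINT_H
#define POINT_H

struct Point
{
    int x;
    int y;
};

#endif /* POINT_H */

// ---- second "inclusion" of the same header text ----
// POINT_H is now defined, so the #ifndef is false and everything up
// to the #endif is skipped. Without the guard, this second copy would
// cause a redefinition error.
#ifndef POINT_H
#define POINT_H

struct Point
{
    int x;
    int y;
};

#endif /* POINT_H */

// Point has been defined exactly once, so it can be used normally.
int sum_coordinates(const Point& p)
{
    return p.x + p.y;
}
```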

Multi-File Projects and IDEs

All of the description above also applies to IDEs like Dev-C++ and Xcode. IDEs generally do not require you to write a makefile; instead, you typically need to create a new "project" and then add your C++ source files (but not your header files) to the project. You then build the project as a whole. The usual C++ build process takes place (possibly even using g++ to perform the work), but the details are obscured to some degree.

The exact details of creating a project and adding source files to it will be slightly different in every IDE. Consult your IDE's documentation or perform a web search for "how to create a project" followed by your IDE's name.