Using GCC

Here is a typical command to generate an executable from C source files using GCC:

$ gcc -g -Wall -I/usr/include/libxml2/libxml -lxml2 main.c aux.c -o tut_prog

This command tells GCC to compile the source files main.c and aux.c and to produce a binary called 'tut_prog' (short for 'tutorial program'). The switches have the following meanings:

-g: Tells GCC to include debugging information in the binary.
-Wall: "Warnings, all": enables a broad set of warnings about questionable constructs. Despite the name, it does not enable every warning GCC offers. This switch is used by the C compiler only.
-Idir: Look for included header files (as in #include <myheader.h>) in directory dir. This switch is used by the C preprocessor only.
-llib: Link with library lib, here libxml2. This switch is used by the linker only.

In fact, the gcc command does not do the compilation itself. It takes the arguments given by the user and calls other programs, passing each of them the arguments it needs along with some default ones, to carry out the four stages of compilation.

Figure 2-1. GCC compilation stages

2.1.1. Preprocessor
2.1.2. Compiler
2.1.3. Assembler
2.1.4. Linker

2.1.1. Preprocessor

Each C source file first goes through the preprocessor, named cpp. In this stage, all included files and macros are expanded to produce preprocessed C source code. The resulting .i file is rarely kept, but it can be interesting to look at it to see how complex macros are expanded. You can do this by calling cpp directly or by using the -E switch of GCC. The latter option is better because it runs the preprocessor with all the default options GCC would normally pass.
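
As an illustration of macro expansion, here is a small sketch. The file square.c and the SQUARE macro are made up for this example. The -P switch is added only to leave out the line markers the preprocessor normally emits, which makes the output easier to read.

```shell
# Work in a scratch directory; square.c is a hypothetical example file.
cd "$(mktemp -d)"
cat > square.c <<'EOF'
#define SQUARE(x) ((x) * (x))

int area(int side)
{
    return SQUARE(side + 1);
}
EOF

# -E stops after the preprocessor; -P omits the line markers.
gcc -E -P square.c -o square.i
cat square.i
```

In square.i the macro call is gone and the return statement contains the expanded expression ((side + 1) * (side + 1)).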

2.1.2. Compiler

In this stage, each preprocessed file is compiled into assembly code. Which compiler is used depends on the language of the source file, C here. It is a complex program because there is no one-to-one correspondence between C statements and assembly instructions. For example, requesting the fastest or the smallest program will generate different sequences of assembly instructions. Assembly language is a human-readable form of the machine language of your processor; the most common family on desktop computers is called x86. The output is a human-readable assembly source file ending with '.s'. Like the preprocessor output, the assembly code is not normally written to disk. You can stop the compilation at this stage to look at it by running GCC with the -S switch.

2.1.3. Assembler

In this stage, each file is assembled: the assembly code is translated into an object file with the .o extension. This is much easier than compilation because each assembly instruction corresponds to a unique machine code. The object file also contains additional information used for debugging or linking the program. It is a binary file, in a format called ELF on recent Linux machines, so you need a special program such as objdump to look inside it. It is possible to write assembly code directly and assemble it using the assembler, as, or using GCC if your source file has a .s extension. Object files are commonly written to disk because each one depends only on its corresponding C source file (and the files it includes). If you modify only one source file, you need to regenerate only the corresponding object file. You can stop at this stage, without linking, by using the -c switch of GCC.

2.1.4. Linker

This stage combines all the object files and libraries into one executable file. The result is a binary file in a format close to that of the object files; on Linux it is even the same format.
