Linking C++ applications

As in C, C++ applications typically start from a main() function with the same signatures that C already provides. Also as in C, the actual entry point of the code is the _start() function.

Unlike C, however, C++ is far more complicated and pulls in a lot more code, even for a simple example. To demonstrate this, let's look at a simple Hello World example:

#include <iostream>

int main(void)
{
    std::cout << "Hello World\n";
}

// > g++ scratchpad.cpp; ./a.out
// Hello World

First and foremost, the executable for the C++ example is slightly larger than the one for the equivalent C example from the previous section:

> gcc scratchpad.c -o c_example
> g++ scratchpad.cpp -o cpp_example
> stat -c "%s %n" *
8352 c_example
8768 cpp_example

If we look at the symbols in our example, we get the following:

> nm -gC cpp_example
U __cxa_atexit@@GLIBC_2.2.5
w __cxa_finalize@@GLIBC_2.2.5
00000000000008f4 T _fini
0000000000000688 T _init
00000000000007fa T main
00000000000006f0 T _start
U std::ios_base::Init::Init()@@GLIBCXX_3.4
U std::ios_base::Init::~Init()@@GLIBCXX_3.4
0000000000201020 B std::cout@@GLIBCXX_3.4
U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@@GLIBCXX_3.4

...

As previously stated, our program contains a main() function and a _start() function. The _start() function is the actual entry point of the application, while the main() function is called by the _start() function after initialization has completed.

The _init() and _fini() functions are responsible for global construction and destruction. In our example, the _init() function runs the code that the C++ library needs to support std::cout, while the _fini() function is responsible for destroying these global objects. To do this, the global objects register their destructors with the __cxa_atexit() function, and these destructors are eventually run by the __cxa_finalize() function.

The rest of the symbols make up the code for std::cout, including references to ios_base{} and basic_ostream{}.

The important thing to note here is that, as in C, there is a lot of code that executes both before and after the main() function, and using global objects in C++ only adds to the complexity of starting and stopping your application.

In the preceding example, we used the -C option to demangle our function names. Let's look at the same output without this option:

> nm -g cpp_example
U __cxa_atexit@@GLIBC_2.2.5
w __cxa_finalize@@GLIBC_2.2.5
00000000000008f4 T _fini
0000000000000688 T _init
00000000000007fa T main
00000000000006f0 T _start
U _ZNSt8ios_base4InitC1Ev@@GLIBCXX_3.4
U _ZNSt8ios_base4InitD1Ev@@GLIBCXX_3.4
0000000000201020 B _ZSt4cout@@GLIBCXX_3.4
U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@@GLIBCXX_3.4

...

As shown, some of these functions are still readable, while others are not. Specifically, the C++ specification dictates that certain support functions are linked using C linkage, preventing mangling. In our example, this includes the __cxa_xxx() functions, _init(), _fini(), main(), and _start().

The C++ library symbols that support std::cout, however, are mangled into an almost unreadable syntax. On most POSIX-compliant systems, these mangled names can be demangled using the c++filt command, as follows:

> c++filt _ZSt4cout
std::cout

These names are mangled because they encode the function's signature in the symbol name, including the parameter types and any template specializations. To demonstrate this, let's create two function overloads:

void test(void) {}
void test(bool b) {}

int main(void)
{
    test();
    test(false);
}

// > g++ scratchpad.cpp; ./a.out

In the previous example, we created two functions with the same name, but with different function signatures, a process known as function overloading, which is specific to C++.

Now let's look at the symbols in our test application:

> nm -g a.out
...

0000000000000601 T _Z4testb
00000000000005fa T _Z4testv

There are a couple of reasons why function names are mangled in C++:

  • Encoding function arguments in the function's name means functions can be overloaded, and the compiler and the linker can still tell them apart. Without name mangling, two functions with the same name but different arguments would look identical to the linker, and errors would occur.
  • By encoding this type of information in the function name, the linker is able to detect when a library was compiled with a different signature for a function than the caller expects. Without this information, the linker could resolve a call to a function with the same name but a different signature (and therefore a different implementation), which would lead to a hard-to-find error and likely corruption.

The biggest issue with C++ name mangling is that small changes to a public-facing API result in a library no longer being able to link with already-existing code.

There are many ways to overcome this problem, but, in general, it's simply important to understand that C++ encodes a lot of information about how you write your code in a function's name, making it imperative that public-facing APIs do not change unless a version change is expected.