Pointers and arrays

In school, I had a teacher who once told me:

"No matter how experienced you are, nobody truly understands pointers completely."

No statement could be truer. In standard C, a pointer is a variable whose value is the address of a location in memory. The problem with standard C is that this location in memory is not itself associated with a particular type. Instead, the pointer's type defines how the memory it points to is interpreted, as in the following example:

int main(void)
{
    int i;
    int *p = &i;
}

// > gcc scratchpad.c; ./a.out

In the previous example, we created an integer, and then created a pointer and pointed it at the previously-defined integer. We could, however, do the following:

int main(void)
{
    int i;
    void *p = &i;

    int *int_p = p;
    float *float_p = p;
}

// > gcc scratchpad.c; ./a.out

In this program, we create a pointer to an integer, but we define the pointer type as void *, which tells the compiler we are creating a pointer with no associated type. We then create two additional pointers: a pointer to an integer, and a pointer to a floating-point number. Both of these additional pointers are initialized using the void * pointer we created earlier.

The problem with this example is that the standard C compiler implicitly converts the void * into both an integer pointer and a floating-point pointer without complaint. Accessing the integer through float_p is undefined behavior in standard C, and if both of these pointers were used, corruption could occur in a couple of ways:

  • Depending on the architecture, a buffer overflow could occur, as an integer could be larger than a float or vice versa. This depends on the CPU being used, a topic that will be discussed in more detail in Chapter 3, System Types for C and C++.
  • Under the hood, an integer and a floating-point number are stored differently in the same memory, meaning any attempt to set one value would corrupt the other.

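Thankfully, modern C compilers have flags that are capable of detecting this type of conversion error (GCC's -Wc++-compat, for example, reports implicit conversions from void *). These warnings must be enabled explicitly, however; they are not on by default, which is why the gcc invocations shown previously compile without complaint.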

The obvious issue with pointers is not just that they can point to anything in memory and redefine that memory's meaning, but that they can also be null. In other words, pointers are effectively optional: they either contain a valid value and point to memory, or they are null.

For this reason, pointers should not be used until their value is determined to be valid, as follows:

#include <stdio.h>

int main(void)
{
    int i = 42;
    int *p = &i;

    if (p) {
        printf("The answer is: %d\n", *p);
    }
}

// > gcc scratchpad.c; ./a.out
// The answer is: 42

In the previous example, we create a pointer to an integer that is initialized with the location of a previously defined integer with an initial value of 42. We check to make sure p is not a null pointer, and then output the value it points to on stdout.

The addition of the if() statement is not only cumbersome; it also adds a branch that may have to be evaluated at runtime. For this reason, most programmers would leave out the if() statement, knowing that, in this example, p is never a null pointer.

The problem is that, at some point, the programmer could add code to this simple example that contradicts this assumption, while simultaneously forgetting to add the if() statement, resulting in code that has the potential to generate a hard-to-find segmentation fault.

As will be shown in the next section, the C++ standard addresses this issue by introducing the notion of a reference, which is a non-optional pointer, meaning it is a pointer that must always point to a valid, typed memory location. To address this issue in standard C, null pointer checks are usually (although not always) performed by public-facing APIs. Private APIs typically do not check for null pointers to improve performance, on the assumption that, so long as the public-facing API cannot accept a null pointer, the private API is unlikely to ever see an invalid pointer.

Standard C arrays are closely related to pointers. In most expressions, the name of an array is converted to a pointer to its first element, and the same indexing syntax can be applied to both an array and a pointer, as in the following example:

#include <stdio.h>

int main(void)
{
    int i[2] = {42, 43};
    int *p = i;

    if (p) {
        // method #1: index syntax
        printf("The answer is: %d and %d\n", i[0], p[0]);
        printf("The answer is: %d and %d\n", i[1], p[1]);

        // method #2: pointer arithmetic
        printf("The answer is: %d and %d\n", *(i + 0), *(p + 0));
        printf("The answer is: %d and %d\n", *(i + 1), *(p + 1));
    }
}

// > gcc scratchpad.c; ./a.out
// The answer is: 42 and 42
// The answer is: 43 and 43
// The answer is: 42 and 42
// The answer is: 43 and 43

In the previous example, we create an array of integers with two elements initialized to the values 42 and 43. We then create a pointer that points to the array. Note that the & is no longer needed. This is because, in this context, the array is converted to a pointer to its first element; thus, we are simply setting one pointer to the value of another (instead of having to extract a pointer from an existing memory location).

Finally, we print the value of each element twice: first using index syntax on both the array and the pointer, and then using pointer arithmetic on both.

As will be discussed in Chapter 4, C++, RAII, and the GSL Refresher, there is little difference between an array and a pointer when it comes to accessing elements. Both perform what is known as pointer arithmetic whenever an element is accessed.

With respect to system programming, pointers are used extensively. Examples include the following:

  • Since standard C doesn't contain the notion of a reference as C++ does, any argument that must be passed to a system API by reference, either because it is too large to be passed by value or because it must be modified by the API, must be passed by pointer, resulting in the heavy use of pointers when making system calls.
  • System programming often involves interpreting a location in memory through a pointer whose type defines the layout of that memory. Pointers provide a convenient way to accomplish this.