Character types

The most basic type in C and C++ is the character type, char, shown in the following example:

#include <iostream>

int main(void)
{
    char c = 0x42;
    std::cout << c << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// B

A char is an integer type that, on most platforms, is 8 bits in size, and that must be capable of taking on at least the value range [0, 255] when unsigned and [-127, 127] when signed. What sets a char apart from the other integer types is its special meaning: its values correspond to the American Standard Code for Information Interchange (ASCII). In the preceding example, the uppercase letter B is represented by the 8-bit value 0x42. It should be noted that although a char can be used simply to represent an 8-bit integer, its default meaning is that of a character type. For example, consider the following code:

#include <iostream>

int main(void)
{
    int i = 0x42;
    char c = 0x42;

    std::cout << i << '\n';
    std::cout << c << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// 66
// B

In the previous example, we represented the same value, 0x42, using both an int (to be explained later) and a char. These two values are, however, output to stdout in two different ways: using the same API, the int is output as an integer, while the char is output as its ASCII representation. In addition, arrays of char are treated as ASCII strings in both C and C++, which also carry a special meaning. The following code shows this:

#include <iostream>

int main(void)
{
    const char *str = "Hello World\n";
    std::cout << str;
}

// > g++ scratchpad.cpp; ./a.out
// Hello World

The preceding example shows the following: we define an ASCII string using a char pointer (an unbounded array type would also work here), and std::cout understands how to handle this type by default, because a char array has a special meaning. Changing the array type to int would not compile, as the compiler would not know how to convert the string literal to an array of integers, and std::cout would not know, by default, how to handle an array of integers, even though, on some platforms, an int and a char might actually be the same size.
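
If the goal is to output a char's numeric value rather than its ASCII representation, the conversion to an integer can be made explicit. The following is a minimal sketch of the two usual approaches (the unary + trick is also used later in this section):

#include <iostream>

int main(void)
{
    char c = 0x42;

    // Unary plus promotes the char to an int, so its
    // numeric value is printed instead of its character
    std::cout << +c << '\n';

    // An explicit cast achieves the same result
    std::cout << static_cast<int>(c) << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// 66
// 66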

Like bool and short int, the character type is not always the most efficient type to use when representing an 8-bit integer and, as alluded to in the preceding discussion, on some platforms it is possible for a char to actually be larger than 8 bits, a topic that will be discussed in further detail when we discuss integers.
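
The exact number of bits in a char on a given platform can be queried with the CHAR_BIT macro from <climits>; the output below assumes a typical platform where CHAR_BIT is 8:

#include <climits>
#include <iostream>

int main(void)
{
    // CHAR_BIT is the number of bits in a char; it is
    // guaranteed to be at least 8, but may be larger
    std::cout << "bits per char: " << CHAR_BIT << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// bits per char: 8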

To further investigate the char type, as well as the other types discussed in this section, let's leverage the std::numeric_limits{} class. This class, provided by the <limits> header, serves a role similar to the macros in the C header limits.h, providing a means to query, through a collection of static member functions, how a type is implemented on a given platform.

For example, consider the following code:

#include <limits>
#include <iostream>

int main(void)
{
    auto num_bytes_signed = sizeof(signed char);
    auto min_signed = std::numeric_limits<signed char>().min();
    auto max_signed = std::numeric_limits<signed char>().max();

    auto num_bytes_unsigned = sizeof(unsigned char);
    auto min_unsigned = std::numeric_limits<unsigned char>().min();
    auto max_unsigned = std::numeric_limits<unsigned char>().max();

    std::cout << "num bytes (signed): " << num_bytes_signed << '\n';
    std::cout << "min value (signed): " << +min_signed << '\n';
    std::cout << "max value (signed): " << +max_signed << '\n';

    std::cout << '\n';

    std::cout << "num bytes (unsigned): " << num_bytes_unsigned << '\n';
    std::cout << "min value (unsigned): " << +min_unsigned << '\n';
    std::cout << "max value (unsigned): " << +max_unsigned << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// num bytes (signed): 1
// min value (signed): -128
// max value (signed): 127

// num bytes (unsigned): 1
// min value (unsigned): 0
// max value (unsigned): 255

In the preceding example, we leverage std::numeric_limits{} to tell us the minimum and maximum values of both a signed and an unsigned char (it should be noted that all the examples in this book were executed on a standard Intel 64-bit CPU; they should also execute on other platforms, but the values returned might differ). The std::numeric_limits{} class can provide information about a type, including the following (demonstrated in the sketch after this list):

  • Signed or unsigned 
  • Conversion limits, such as rounding and the total number of digits needed to represent the type
  • Min and max information
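
The following sketch queries some of these properties for char; the output shown assumes a platform where char is signed, as it is on the Intel 64-bit system used throughout this book:

#include <limits>
#include <iostream>

int main(void)
{
    // digits reports the number of value (non-sign) bits,
    // and digits10 the number of base-10 digits the type
    // can represent without loss
    std::cout << "is signed: " << std::numeric_limits<char>().is_signed << '\n';
    std::cout << "binary digits: " << std::numeric_limits<char>().digits << '\n';
    std::cout << "decimal digits: " << std::numeric_limits<char>().digits10 << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// is signed: 1
// binary digits: 7
// decimal digits: 2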

As shown in the preceding example, a char on a 64-bit Intel CPU is 1 byte in size (that is, 8 bits), and takes on the values [0, 255] for an unsigned char and [-128, 127] for a signed char (the two's complement range, which exceeds the [-127, 127] minimum the specification requires). Let's look at a wide character, or wchar_t. Note that, unlike char, the wchar_t type cannot be qualified with signed or unsigned, so we instead query its signedness through std::numeric_limits{}:

#include <limits>
#include <iostream>

int main(void)
{
    auto num_bytes = sizeof(wchar_t);
    auto min = std::numeric_limits<wchar_t>().min();
    auto max = std::numeric_limits<wchar_t>().max();
    auto is_signed = std::numeric_limits<wchar_t>().is_signed;

    std::cout << "num bytes: " << num_bytes << '\n';
    std::cout << "min value: " << +min << '\n';
    std::cout << "max value: " << +max << '\n';
    std::cout << "is signed: " << is_signed << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// num bytes: 4
// min value: -2147483648
// max value: 2147483647
// is signed: 1

A wchar_t represents wide (typically Unicode) characters, and its size depends on the operating system. On most Unix-based systems, a wchar_t is 4 bytes and can represent a UTF-32 character type, as shown in the previous example, while on Windows, a wchar_t is 2 bytes in size and can represent a UTF-16 character type. Executing the previous example on these two operating systems will therefore produce different output.
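
If a fixed Unicode width is needed regardless of the operating system, C++11 added the char16_t and char32_t types for UTF-16 and UTF-32 code units. The following sketch prints their sizes; the output shown assumes a Unix-based system (on Windows, wchar_t would report 2):

#include <iostream>

int main(void)
{
    // char16_t and char32_t have the same size on common
    // Unix and Windows platforms, unlike wchar_t
    std::cout << "char16_t: " << sizeof(char16_t) << '\n';
    std::cout << "char32_t: " << sizeof(char32_t) << '\n';
    std::cout << "wchar_t: " << sizeof(wchar_t) << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// char16_t: 2
// char32_t: 4
// wchar_t: 4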

This is extremely important, and this issue defines the fundamental theme of this entire chapter: the default types that C and C++ provide differ depending on the CPU architecture, the operating system, and, in some cases, whether the application is running in user space or in the kernel (for example, when a 32-bit application is executing on a 64-bit kernel). Never assume, while system programming, that your application's definition of a specific type is the same as the definition the system-call API assumes. Quite often, this assumption will prove to be invalid.
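
One way to defend against such an assumption is to state it in the source itself, so that compilation fails on any platform where it does not hold; a minimal sketch:

int main(void)
{
    // If this code is ever compiled on a platform where a
    // wchar_t is not 4 bytes (for example, Windows), the
    // build fails immediately instead of misbehaving at
    // runtime
    static_assert(sizeof(wchar_t) == 4, "expected a 4-byte wchar_t");
}

// > g++ scratchpad.cpp; ./a.out
// (no output; the program compiles only where the assertion holds)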