Floating-point numbers

Floating-point numbers are rarely used in system programming, but we will briefly discuss them here for reference. Floating-point numbers extend the range of values that can be stored by giving up accuracy. For example, a double can store a value as large as 1.79769e+308, which is simply not possible with an integer type, even a long long int. The cost is granularity: at that magnitude, subtracting 1 from the value produces no visible change, because a floating-point number cannot represent such a large value while still maintaining the same granularity as an integer. Another benefit of floating-point numbers is their ability to represent sub-integer (fractional) values, which is useful for more complicated mathematical calculations. Such calculations are rarely needed in system programming: most kernels avoid floating-point arithmetic to prevent floating-point errors from occurring within the kernel, which ultimately results in a lack of system calls that take floating-point values.

There are three main floating-point types: float, double, and long double. For example, consider the following code, which examines the float type:

#include <iostream>
#include <limits>

int main(void)
{
    // Size in bytes, smallest positive normalized value, largest finite value
    auto num_bytes = sizeof(float);
    auto min = std::numeric_limits<float>::min();
    auto max = std::numeric_limits<float>::max();

    std::cout << "num bytes: " << num_bytes << '\n';
    std::cout << "min value: " << min << '\n';
    std::cout << "max value: " << max << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// num bytes: 4
// min value: 1.17549e-38
// max value: 3.40282e+38

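In the previous example, we leverage std::numeric_limits to examine the float type, which on a 64-bit Intel CPU is a 4-byte value. Note that for floating-point types, min() returns the smallest positive normalized value rather than the most negative value (which is what lowest() returns). The double is as follows: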

#include <iostream>
#include <limits>

int main(void)
{
    // Size in bytes, smallest positive normalized value, largest finite value
    auto num_bytes = sizeof(double);
    auto min = std::numeric_limits<double>::min();
    auto max = std::numeric_limits<double>::max();

    std::cout << "num bytes: " << num_bytes << '\n';
    std::cout << "min value: " << min << '\n';
    std::cout << "max value: " << max << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// num bytes: 8
// min value: 2.22507e-308
// max value: 1.79769e+308

With the long double, the code is as follows:

#include <iostream>
#include <limits>

int main(void)
{
    // Size in bytes, smallest positive normalized value, largest finite value
    auto num_bytes = sizeof(long double);
    auto min = std::numeric_limits<long double>::min();
    auto max = std::numeric_limits<long double>::max();

    std::cout << "num bytes: " << num_bytes << '\n';
    std::cout << "min value: " << min << '\n';
    std::cout << "max value: " << max << '\n';
}

// > g++ scratchpad.cpp; ./a.out
// num bytes: 16
// min value: 3.3621e-4932
// max value: 1.18973e+4932

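As shown in the previous code, on a 64-bit Intel CPU, the long double occupies 16 bytes (128 bits) of storage and can store an absolutely massive number. With GCC on x86-64, however, only 80 of those bits hold the value (the x87 extended-precision format); the remaining bytes are padding.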