CS 70

Numeric Types—Fundamentals

Reminder: All Data Is Bits

Inside the computer, memory just stores binary data (zeros and ones). If your computer has 16 GiB of RAM, it has 17,179,869,184 (\( 16 \times 2^{30} \)) distinct memory locations, each holding one byte of data (8 bits). When a program runs, it has some fraction of the computer's memory to work with, and it gets to decide how to use that memory. If we have 1000 bytes of RAM, it might hold:

  • A 10 px × 25 px color (RGBA) bitmap image.
  • A Python list holding 24 numbers.
  • A C array containing 250 32-bit integers.
  • The first 118 words read from the dictionary.

What the contents of memory actually represent is up to the program running on the machine using that memory (and the language that program is written in).

  • Duck speaking

    So if all values are a bunch of zeros and ones, then all numbers must be stored the same way in C++, right?

  • LHS Cow speaking

    No, not quite...

Number Types in C++

C++ defines a bunch of different types for variables that store numbers. Those types differ based on

  • How many bits they use.
  • What they use those bits to represent.

Fundamental Integer Types in C and C++

C++'s integer types are based on its C heritage.

Type            Minimum Size (Bits)   Minimum Size (Bytes)
char            8                     1
short int       16                    2
int             16                    2
long int        32                    4
long long int   64                    8
  • RHS Cow speaking

    You're allowed to omit the word int in the long and short variations, so you can just say short to mean short int.

Notice that these sizes are only minimums. C and C++ allow any given computer system to adopt larger sizes, so long as they follow the rule that, when it comes to the sizes of these types,

  • sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long)

In practice, there are only four “memory models” that have caught on. On modern 64-bit systems, you're likely to only see LP64 (Mac and Linux) or LLP64 (Windows). The table below shows how many bits each type uses in these different models:

Type            C++ Standard   LP32   ILP32   LLP64   LP64
char            at least 8     8      8       8       8
short int       at least 16    16     16      16      16
int             at least 16    16     32      32      32
long int        at least 32    32     32      32      64
long long int   at least 64    64     64      64      64
  • Cat speaking

    Am I missing something? ILP32 and LLP64 are the same!

  • LHS Cow speaking

    Yes, they do have the same sizes for integers, but the underlying system is a 32-bit or 64-bit system (which influences other things).

  • RHS Cow speaking

    It also tells you why Microsoft picked LLP64 for 64-bit code on Windows, to change as little as possible!

  • Goat speaking

    Seems like a lot of different types to keep track of, and lots of variation between systems.

  • LHS Cow speaking

    Yes. It's true. And people do sometimes pick the wrong one, or make incorrect assumptions (e.g., assume that because a long is 64-bit on Linux, it must be the same on Windows).

  • Goat speaking

    So why do it!??! When I learned to code in Python, we just used numbers and never worried about there being different kinds.

  • Python speaking

    Actually, Python's numbers aren't perfect. In Python, 12345678901234567890 + 1 - 1 == 12345678901234567890 returns True, but 12345678901234567890 + 0.1 - 0.1 == 12345678901234567890 returns False!

  • LHS Cow speaking

    And, in Python, the number 12345678901234567890 requires 36 bytes, more than four times the 8 bytes that a 64-bit value would take in C or C++.

  • Python speaking

    Actually, characters in Python are even worse—they need to be stored as single-character strings and require 50 bytes each, 50 times more than a char in C or C++.

  • LHS Cow speaking

    We can see C++'s choice as being about efficiency. You can save memory by choosing a type that is sized appropriately for the range of values it will store.

  • RHS Cow speaking

    And designers for a particular system can choose sizes that are optimal for their particular hardware. A C-development system for embedded systems (that might run something like a TV remote control or your toaster) probably doesn't need 64-bit integers.

  • Duck speaking

    Will there ever be a long long long int?

  • LHS Cow speaking

    The C++ standards people seem to have promised there won't be.

  • RHS Cow speaking

    But some compilers provide a type __int128_t which basically is exactly that (but it's nonstandard).

Which takes up more memory, a short int or a long int?

Signed and Unsigned Integers

So far, we've only talked about how many bits the fundamental types might have, not the range of values they can store. When we have \( n \) bits, we have \( 2^n \) distinct bit patterns, but we have a choice for how to use them.

  • unsigned: We could use all of them for non-negative values; or
  • signed: We could use half of them for non-negative values and half of them for negative values.

To be more specific, let's imagine using a 16-bit short int. We have two choices:

  • unsigned short int: Represent \( 2^{16} \) distinct non-negative values from \( 0\ldots{}65535 \).
  • signed short int: Represent \( 2^{15} \) distinct non-negative values from \( 0\ldots{}32767 \) and \( 2^{15} \) distinct negative values, from \( -32768\ldots{}\!\!-1 \), for a total range of \( -32768\ldots{}32767 \).

For all the int types (i.e., everything except char), the default is signed, so it's redundant to say signed. If we want the unsigned option, we have to ask for it.

  • Duck speaking

    You said except char—what about char? Is it signed or unsigned?

  • LHS Cow speaking

    Gah. signed char is signed, unsigned char is unsigned, and char is a distinct type that might either be signed or unsigned, depending on the system.

  • RHS Cow speaking

    But if you just use char for characters, not tiny integers, you'll be fine.

Deeper Dive: Number Representations

If you get the general idea, you can skip this deeper dive, but if it all seems a bit odd, or if you want a slightly deeper understanding, keep reading.

As a smaller example, let's imagine a computer has 4 bits it can use to store an integer, an unsigned integer, or a floating-point number (we'll discuss floating point in more detail in the next section, but it helps to include it in the table here). Here's a plausible way it could assign bit patterns to values:

Bit Pattern   Unsigned Integer   Signed Integer   Floating Point
0000          0                  0                0
0001          1                  1                0.5
0010          2                  2                1
0011          3                  3                2
0100          4                  4                0.25
0101          5                  5                0.75
0110          6                  6                1.5
0111          7                  7                3
1000          8                  -8               -0
1001          9                  -7               -0.5
1010          10                 -6               -1
1011          11                 -5               -2
1100          12                 -4               -0.25
1101          13                 -3               -0.75
1110          14                 -2               -1.5
1111          15                 -1               -3

We can make a few observations from this table:

  • Each type can represent some values that the other ones can't. For example,

    • Unsigned ints can't represent negative numbers or decimals.
    • In signed integers, making space for negative numbers takes away space from positive numbers.
    • Floats can represent decimals, but there are some integer values they can't represent (given that there are only 16 bit patterns here, making space for some decimal numbers means we can't have 16 integer values as well).
    • Our signed numbers can represent a negative number (-8) that doesn't have a positive equivalent!
  • Floating-point numbers seem to be ordered strangely and don't give all numbers the same number of decimal places, only about the same number of significant digits.

    • You don't need to know this, but for the curious, our 4-bit floats have 2 significand bits, and 2 exponent bits. You'll learn more about floating point if/when you take CS 105.

These properties are true of our simple 4-bit example, but they're also true in general. These properties catch programmers out a lot, leading to various bugs.

  • Hedgehog speaking

    Uh oh. I can put -32768 in a 16-bit signed short int but not +32768?

  • LHS Cow speaking

    That's right. And if you had -32768 and negated it, it'd overflow!

  • Goat speaking

    Ugh! Why?? Why do that?

  • LHS Cow speaking

    It's because we need to have zero in there somewhere. That's why we said non-negative rather than positive.

  • RHS Cow speaking

    In the early days of computers, people tried to work out different ways of dealing with signed numbers and zero. There are other options, like a positive and negative zero to try to make things symmetric. But each approach has downsides.

  • LHS Cow speaking

    And for a very long time C and C++ did not take a side on the right answer. But in the end, finally, last year, after about 50 years, the C++20 standard decided to mandate the option just about everyone had picked from the beginning: the asymmetric approach above (known as two's complement).

Which of these two has a lower minimum value?

Which of these types has a larger maximum value?

Other Integer Types

C++ has some other integer types that can be useful. These include:

  • bool—can only store 0 or 1 (but it's unspecified how large it is!)
  • size_t—an unsigned integer type of some sort with enough bits to represent the size of any object
  • ptrdiff_t—a signed integer type of some sort with enough bits to represent the difference between any two pointers into the same object (e.g., the offset of one thing compared to another).

And

  • int8_t—a signed int guaranteed to be 8 bits
  • int16_t—a signed int guaranteed to be 16 bits
  • int32_t—a signed int guaranteed to be 32 bits
  • int64_t—a signed int guaranteed to be 64 bits
  • uint8_t—an unsigned int guaranteed to be 8 bits
  • uint16_t—an unsigned int guaranteed to be 16 bits
  • uint32_t—an unsigned int guaranteed to be 32 bits
  • uint64_t—an unsigned int guaranteed to be 64 bits

With the exception of bool, these types are just alternative names for some other type on the system (e.g., on the (imaginary) Trend-Tastic system, int32_t is another name for int, but on the (fictitious) History-O-Matic system, it's another name for long).

  • RHS Cow speaking

    To use size_t and ptrdiff_t you need to have #include <cstddef> at the top of the file.

  • LHS Cow speaking

    And to use those specific-size integer types, you need #include <cstdint> at the top of the file.

Floating-Point Numbers

Integer types have no way of storing, say, 4.2 or 70.7 or 3.1415926. Floating-point types can store fractional values like these. They do so using a computer equivalent of scientific notation.

For example, if we asked most people to calculate the number of ways to shuffle a deck of cards (i.e., \( 52! = 52 \times 51 \times \cdots \times 2 \times 1 \)), they're more likely to say \( 8.0658175170943877 \times 10^{67} \), rather than the exact value, which is 80,658,175,170,943,877,224,137,984,000,000,000,000,000,000,000,000,000,000,000.

The scientific notation variant is clearly less exact, but in many contexts having the answer to a reasonable number of significant digits is usually good enough. And, of course, some numbers, like the results of sines, cosines, and logarithms can be irrational numbers with no finite numerical representation anyway.

In scientific notation, there are three parts:

  1. The Base (the 10 in \( 8.0658175170943877 \times 10^{67} \)), also known as the radix.
  2. The Significand (the 8.0658175170943877 part).
  3. The Exponent (the 67 part).

For floating point in a computer, the radix is usually base two (a.k.a. "binary floating point"), but the same concepts apply. In a binary floating-point number with \( n \) bits, some bits are used for the exponent and the rest are used for the significand.

Like integers, C and C++ support different floating point types that use different numbers of bits, and there is a similar size hierarchy to the one we saw for integer types:

  • sizeof(float) ≤ sizeof(double) ≤ sizeof(long double)

The C and C++ standards don't make many other firm promises about floating-point types, but they recommend that float and double follow the IEEE-754/IEC-559 standard for floating point. When that is the case (which it usually is), we can say the following:

IEEE 754 Type         C++ Type   Total Bits   Significand Bits   Exponent Bits
Half Precision        —          16           11                 5
Single Precision      float      32           24                 8
Double Precision      double     64           53                 11
Quadruple Precision   —          128          113                15
  • Duck speaking

    What about long double? Is it IEEE-754 Quadruple Precision?

  • LHS Cow speaking

    That'd be the gold standard, and on some systems, yes. On others it's identical to a double. And on still other machines it's something else entirely. Whee.

  • Cat speaking

    Are there unsigned floating point numbers?

  • LHS Cow speaking

    No. Thankfully not.

  • RHS Cow speaking

    But there is plenty of other weirdness, including positive and negative zero.

  • LHS Cow speaking

    Gah.

Deeper Dive: Floating-Point Strangeness!

Floating point types are used a lot, but they're a bit strange. You'll learn more about them if you take CS 105, but since not everyone takes CS 105, and almost every programmer works with floating point at some point, here's a little bit more.

If you look back at the 4-bit example in the table in the previous “deeper dive” section, think about adding \( 1 \) (i.e., \( 1 \times 2^0 \)) to \( 0.75 \) (i.e., \( 1.5 \times 2^{-1} \) ). You might think the answer should be \( 1.75 \) (i.e., \( 1.75 \times 2^0 \) ), but you won't find \( 1.75 \) in that table. There isn't enough precision in our representation for \( 1.75 \); it can only represent \( 1.5 \) (i.e., \( 1.5 \times 2^0 \) ) or \( 2 \) (i.e., \( 1 \times 2^1 \) ). Thus adding one and subtracting one may not get you back where you started!

Similarly, our 4-bit float representation didn't have exact values for \( 1/3 \) or \( 1/5 \): we can only pick a closest approximation. The same is usually true for 32-bit and 64-bit floats, because these numbers have no exact representation in binary; they have recurring digits that we can only approximate with a fixed number of binary digits.

In real programs, numerous coding errors come from not really grasping the strangeness of floating-point numbers, but you'll dive deeper into that if/when you take CS 105.

Finally, a fun fact that is true of our 4-bit representation and of most floating-point types: there are the same number of unique values in the range \( [0.0, 1.0 ) \) as there are values greater than or equal to 1.

Review

Consider the following statements about numeric types in C++:
