CS 70

Out-of-Bounds Access

Indexing Past the End of Arrays

  • Duck speaking

    What happens if you try to use an index that's bigger than the size of your array?

  • LHS Cow speaking

    What do you think happens?

  • Cat speaking

    Well, in languages like Python and Java, you'd get an error…

  • LHS Cow speaking

    True! But remember C++'s rallying cries: Power to the People! and Zero Overhead!

  • Hedgehog speaking

    I've got a bad feeling about this…

Real-World Undefined Behavior

To understand what happens when you access an array out of bounds, we need to talk about undefined behavior, but to get a sense of what that means, let's start with a real-world example.

Suppose you're driving a car, and you've been given this map, with the strict instructions to “stay on the road”:

Map of a road with a loop but a possible off-road short-cut

You're driving along, and you see a short-cut that looks like it will save you some time. You're tempted to take it, but you remember the instructions: “stay on the road”. So you keep driving and arrive safely at your destination. But what would have happened if you'd tried to take a short-cut? You don't know! No one has said what will happen, so the consequences are not defined.

In languages like Python or Java, they like to define what happens when your program breaks the rules. So, in this situation they might say “we have installed guardrails on the road so if you try to drive off the road, you'll just bounce back onto the road”. Or they might have installed guardrails such that if your car touches the rails it will shut off and you'll need to be towed.

Map of a road with a loop and guardrails

But installing guardrails is time consuming, and expensive. Most roads don't have guardrails, you're just expected to drive properly and stay on the road. If you don't, you'll face the consequences, whatever they are. That's what it means to have undefined behavior. And that's the approach taken by C++—it expects you to follow the rules, and if you don't, it doesn't promise to protect you from the consequences, or even say what the consequences will be.

Let's imagine someone drives straight, rather than taking the U-shaped path on the map—either deliberately or because they've fallen asleep at the wheel. Describe three possible outcomes...

An outcome where things work out more-or-less okay despite breaking the rules:

An outcome where things work out badly but not catastrophically:

An outcome where things work out catastrophically:

With no additional information, can you know which of these outcomes will happen?

Perhaps the reason the road curves is that there is a lake in the way…

Map of a road with a loop and a lake

Or perhaps there's a cliff… But even if there was a dirt road through private property, perhaps a farm, and you drove through it fine three times, the fourth time you might get shot at by the farmer, or hit a closed gate, or get stuck in the mud.

Breaking the rules and getting away with it once doesn't mean you'll get away with it again. And that's what it means to have undefined behavior.

Breaking Array-Access Rules

In Java and Python, if you try to access an array at an index that doesn't exist, there is a predictable outcome (e.g., in Java, an ArrayIndexOutOfBoundsException will be thrown). For that predictable outcome to happen, there has to be some code (that you didn't write!) that double checks whether the index is in bounds and then either gets the item or throws the exception accordingly.

  • LHS Cow speaking

    That little bit of time spent checking the bounds of the array is overhead!

  • RHS Cow speaking

    It's an extra cost that the programmer didn't ask for and probably doesn't need most of the time.

  • LHS Cow speaking

    And you know by now how overhead is treated in C++'s design.

  • Duck speaking

    So how does C++ avoid checking the bounds of the array?

  • LHS Cow speaking

    C++'s attitude is that it's not going to babysit you an make sure you're doing everything right. You have rules to follow, like “stay inside the bounds of the array”, and if you break them, who knows what the consequences will be, so don't break the rules.

The C++ standard simply says “all array accesses must be in bounds”. If you break that rule, the consequences are not defined.

Most implementations of C++ make no special effort to check whether you're breaking the rules because that would take valuable time. They just do whatever is easiest and fastest, and just assuming that the program is following the rules. And, like driving off the road, if you break the rules, maybe sometimes you'll get away with it, sometimes subtle damage will be done but you won't notice right now, and sometimes something catastrophic will happen.

Undefined behavior means basically “no rules” because your program is “out of control”; there are no limits on what might happen—from nothing at all, to subtle data corruption that's hard to detect, to something really catastrophic. Generally speaking, systems don't go out of their way to cause catastrophic behavior. We could say that losing control of a car is also undefined behavior: maybe you'll just spin out safely at the side of the road, but maybe something far worse will happen.

So it's important to recognize that you can't know what will happen when behavior is undefined. The practical upshot is that you have to write code that avoids situations that could cause undefined behavior, and you should certainly never write a program that knowingly performs undefined behavior under the assumption that will be benign because “nothing bad seemed to happen” when you tried it a few times.

Consider this code snippet:

int x = 444;
int a[3]{1, 2, 3};
int y = 555;
cout << a[3] << endl;

Undefined Behavior?

What will definitely happen on the last line?

  • Goat speaking

    Meh, can it really be the case that anything could happen?

  • LHS Cow speaking

    Yes! Let's demonstrate.

Typical Behavior

  • Hedgehog speaking

    So if I write a program for this class and it has a bug, it might erase everything on my hard drive??

  • LHS Cow speaking

    Certainly no one deliberately writes a compiler that says “If the programmer makes a silly mistake, erase their hard drive!”.

  • RHS Cow speaking

    So for seriously bad things to happen, you need some kind of chain of cause and effect that isn't likely to arise in the kinds of code you write for this class.

  • LHS Cow speaking

    Although “undefined behavior” means that any behavior could technically occur, we can usually make educated guesses about what might happen on a typical system—usually by thinking about what would plausibly happen if there were no guardrails at runtime to actively prevent rule-breaking by your program.

  • RHS Cow speaking

    Although you shouldn't rely on any particular outcome, knowing the likely outcomes of indexing errors could help you diagnose those errors in your own code.

Here's our snippet again:

int x = 444;
int a[3]{1, 2, 3};
int y = 555;
cout << a[3] << endl;

If there are no guardrails to prevent out-of-bounds access to arrays at runtime, what do you think this code might print?

  • Bjarne speaking

    A reason it might be 444 rather than 555 is that on this machine, the stack grows downwards from high memory addresses to low ones.

  • LHS Cow speaking

    Ah. Makes sense. In CS 70's model, we'll stick to counting upwards though!

  • RHS Cow speaking

    And, as we said, on a real machine, there's no guarantee that the compiler lays out a function's variables in any particular order.

Key Ideas:

  • Remember that a[3] means *(a +3), the memory address of the fourth element, if it exited.
  • So the least-effort outcome is to interpret whatever bits are at that address as an int and return that value!
  • Remember that variables in memory on a real machine need not be arranged in the same order as our memory model!

What will probably happen on the last line?

Segmentation Faults

  • Duck speaking

    So… does this mean that I can just access memory outside of my array?

  • LHS Cow speaking

    Yeah, usually! The standard doesn't guarantee you can, but, in practice, that's the typical behavior.

  • Hedgehog speaking

    But… What if I do something horrible?? What if I mess something up for another program? Or the operating system???

  • LHS Cow speaking

    Ah! Good news: you don't need to worry about that. On a modern computer operating systtem like Windows, macOS or Linux, each program is confined to its own assigned region of memory.

If your program tries to access memory outside of its allocated region, the operating system will immediately shut the program down. This error is called a segmentation fault, or “segfault” for short.

A common cause for segmentation faults is accessing an array wayyyyyy out of bounds.

Segmentation fault errors can be very cryptic. Your program just stops (“crashes”) and you get a message that basically just says “Segmentation fault”—no line number, no context, nothing. We'll eventually introduce you to a tool that can help to give you more information for debugging these errors.

At the very least you now know that if you get a segmentation fault, you should be suspicious of your array indices!

Here's a slightly different snippet:

int x = 444;
int a[3]{1, 2, 3};
int y = 555;
cout << a[1000000000] << endl;

What will probably happen on the last line?

Summary

  • Accessing an array out of bounds is “undefined behavior”.
    • Literally no outcome are ruled out, as far as the C++ standard is concerned.
    • You should never write a program that relies on a particular outcome when the behavior is undefined.
  • Typical C++ compilers produce code that does things the fastest and easiest way, without any special safety checks while the program is running.
    • Thus we shouldn't be surprised if it finds the place in memory where the item would have been if the array had been that big, and interpreting whatever bits are there as the appropriate type.
    • Accessing an array out of bounds can cause strange behavior because the compiler is assuming that you're following the rules.
  • On our systems, accessing memory that is way out of bounds can cause your program to crash.
    • A modern operating system will not allow your program to access memory outside of its allotted region.
    • This error is called a “Segmentation fault” (segfault for short).

(When logged in, completion status appears here.)