Out-of-Bounds Access
Indexing Past the End of Arrays
What happens if you try to use an index that's bigger than the size of your array?
What do you think happens?
Well, in languages like Python and Java, you'd get an error…
True! But remember C++'s rallying cries: Power to the People! and Zero Overhead!
I've got a bad feeling about this…
Real-World Undefined Behavior
To understand what happens when you access an array out of bounds, we need to talk about undefined behavior, but to get a sense of what that means, let's start with a real-world example.
Suppose you're driving a car, and you've been given this map, with the strict instructions to “stay on the road”:
You're driving along, and you see a short-cut that looks like it will save you some time. You're tempted to take it, but you remember the instructions: “stay on the road”. So you keep driving and arrive safely at your destination. But what would have happened if you'd tried to take a short-cut? You don't know! No one has said what will happen, so the consequences are not defined.
In languages like Python or Java, they like to define what happens when your program breaks the rules. So, in this situation they might say “we have installed guardrails on the road so if you try to drive off the road, you'll just bounce back onto the road”. Or they might have installed guardrails such that if your car touches the rails it will shut off and you'll need to be towed.
But installing guardrails is time consuming, and expensive. Most roads don't have guardrails, you're just expected to drive properly and stay on the road. If you don't, you'll face the consequences, whatever they are. That's what it means to have undefined behavior. And that's the approach taken by C++—it expects you to follow the rules, and if you don't, it doesn't promise to protect you from the consequences, or even say what the consequences will be.
Perhaps the reason the road curves is that there is a lake in the way…
Or perhaps there's a cliff… But even if there was a dirt road through private property, perhaps a farm, and you drove through it fine three times, the fourth time you might get shot at by the farmer, or hit a closed gate, or get stuck in the mud.
Breaking the rules and getting away with it once doesn't mean you'll get away with it again. And that's what it means to have undefined behavior.
Breaking Array-Access Rules
In Java and Python, if you try to access an array at an index that doesn't exist, there is a predictable outcome (e.g., in Java, an ArrayIndexOutOfBoundsException
will be thrown). For that predictable outcome to happen, there has to be some code (that you didn't write!) that double checks whether the index is in bounds and then either gets the item or throws the exception accordingly.
That little bit of time spent checking the bounds of the array is overhead!
It's an extra cost that the programmer didn't ask for and probably doesn't need most of the time.
And you know by now how overhead is treated in C++'s design.
So how does C++ avoid checking the bounds of the array?
C++'s attitude is that it's not going to babysit you an make sure you're doing everything right. You have rules to follow, like “stay inside the bounds of the array”, and if you break them, who knows what the consequences will be, so don't break the rules.
The C++ standard simply says “all array accesses must be in bounds”. If you break that rule, the consequences are not defined.
Most implementations of C++ make no special effort to check whether you're breaking the rules because that would take valuable time. They just do whatever is easiest and fastest, and just assuming that the program is following the rules. And, like driving off the road, if you break the rules, maybe sometimes you'll get away with it, sometimes subtle damage will be done but you won't notice right now, and sometimes something catastrophic will happen.
Undefined behavior means basically “no rules” because your program is “out of control”; there are no limits on what might happen—from nothing at all, to subtle data corruption that's hard to detect, to something really catastrophic. Generally speaking, systems don't go out of their way to cause catastrophic behavior. We could say that losing control of a car is also undefined behavior: maybe you'll just spin out safely at the side of the road, but maybe something far worse will happen.
So it's important to recognize that you can't know what will happen when behavior is undefined. The practical upshot is that you have to write code that avoids situations that could cause undefined behavior, and you should certainly never write a program that knowingly performs undefined behavior under the assumption that will be benign because “nothing bad seemed to happen” when you tried it a few times.
Consider this code snippet:
int x = 444;
int a[3]{1, 2, 3};
int y = 555;
cout << a[3] << endl;
Meh, can it really be the case that anything could happen?
Yes! Let's demonstrate.
Typical Behavior
So if I write a program for this class and it has a bug, it might erase everything on my hard drive??
Certainly no one deliberately writes a compiler that says “If the programmer makes a silly mistake, erase their hard drive!”.
So for seriously bad things to happen, you need some kind of chain of cause and effect that isn't likely to arise in the kinds of code you write for this class.
Although “undefined behavior” means that any behavior could technically occur, we can usually make educated guesses about what might happen on a typical system—usually by thinking about what would plausibly happen if there were no guardrails at runtime to actively prevent rule-breaking by your program.
Although you shouldn't rely on any particular outcome, knowing the likely outcomes of indexing errors could help you diagnose those errors in your own code.
Here's our snippet again:
int x = 444;
int a[3]{1, 2, 3};
int y = 555;
cout << a[3] << endl;
A reason it might be
444
rather than555
is that on this machine, the stack grows downwards from high memory addresses to low ones.Ah. Makes sense. In CS 70's model, we'll stick to counting upwards though!
And, as we said, on a real machine, there's no guarantee that the compiler lays out a function's variables in any particular order.
Key Ideas:
- Remember that
a[3]
means*(a +
3
)
, the memory address of the fourth element, if it exited. - So the least-effort outcome is to interpret whatever bits are at that address as an
int
and return that value! - Remember that variables in memory on a real machine need not be arranged in the same order as our memory model!
Segmentation Faults
So… does this mean that I can just access memory outside of my array?
Yeah, usually! The standard doesn't guarantee you can, but, in practice, that's the typical behavior.
But… What if I do something horrible?? What if I mess something up for another program? Or the operating system???
Ah! Good news: you don't need to worry about that. On a modern computer operating systtem like Windows, macOS or Linux, each program is confined to its own assigned region of memory.
If your program tries to access memory outside of its allocated region, the operating system will immediately shut the program down. This error is called a segmentation fault, or “segfault” for short.
A common cause for segmentation faults is accessing an array wayyyyyy out of bounds.
Segmentation fault errors can be very cryptic. Your program just stops (“crashes”) and you get a message that basically just says “Segmentation fault”—no line number, no context, nothing. We'll eventually introduce you to a tool that can help to give you more information for debugging these errors.
At the very least you now know that if you get a segmentation fault, you should be suspicious of your array indices!
Here's a slightly different snippet:
int x = 444;
int a[3]{1, 2, 3};
int y = 555;
cout << a[1000000000] << endl;
Summary
- Accessing an array out of bounds is “undefined behavior”.
- Literally no outcome are ruled out, as far as the C++ standard is concerned.
- You should never write a program that relies on a particular outcome when the behavior is undefined.
- Typical C++ compilers produce code that does things the fastest and easiest way, without any special safety checks while the program is running.
- Thus we shouldn't be surprised if it finds the place in memory where the item would have been if the array had been that big, and interpreting whatever bits are there as the appropriate type.
- Accessing an array out of bounds can cause strange behavior because the compiler is assuming that you're following the rules.
- On our systems, accessing memory that is way out of bounds can cause your program to crash.
- A modern operating system will not allow your program to access memory outside of its allotted region.
- This error is called a “Segmentation fault” (segfault for short).
(When logged in, completion status appears here.)