C++ Implementation: insert()
Here's our encoding:
class IntTree {
public: ...
private:
// Encoding for a tree node
struct Node {
int key_;
Node* left_ = nullptr; // if not specified, default to nullptr
Node* right_ = nullptr; // if not specified, default to nullptr
};
// The tree object owns all the nodes, and has a pointer to the root node.
Node* root_;
};
Wait. I don't see why you needed a
Node
struct
at all. Couldn't you just have put all theNode
data members in theIntTree
class directly?No. For example, if we said every
IntTree
has akey_
, we wouldn't be able to deal with emptyIntTree
s.Overall, everything would be worse and more confusing. Don't try to avoid having a distinct type for your tree nodes.
insert()
We'll have a public member function that inserts a given key into the structure:
void insert(int k);
How will we implement this operation using our encoding?
Remember, here is our BST-INSERT pseudocode:
BST-INSERT(treenode, x): if there isn't a treenode (i.e., this subtree is empty): // base case replace empty subtree with new tree node containing x as its value (and empty subtrees) else if x < treenode.value: BST-INSERT(treenode.leftSubtree, x) // recurse on smaller case (left subtree) else: // x >= treenode.value BST-INSERT(treenode.rightSubtree, x) // recurse on smaller case (right subtree)
I'm confused. I don't think I can use this pseudocode to write the
insert()
member function.Ooooh. Good point!
Our BST-INSERT pseudocode was written for our conceptual encoding of the tree data structure, where we only thought about nodes. Our C++ encoding is a little different. Let's look back at the diagram we when we introduced the encoding:
In one sentence, the issue is
BST-INSERT is all about the stuff in blue circles in our diagram, not the brown square, but the brown square is the IntTree
object on which the insert()
member function operates.
The interface given by our BST-INSERT pseudocode was written in a language-neutral way, but it was also an interface that was well-suited to recursion. It took a treenode argument, and recursive calls could pass in smaller subtrees for that argument until we hit the base case.
In contrast, in our C++ encoding, our tree's (private) Node
s are wrapped up inside an IntTree
class.
Our class needs an insert()
function, but that function itself can't be recursive—there's only one IntTree
object to work with, so there are no other smaller IntTree
s to recurse on. So this insert()
function can't be BST-INSERT.
We do still implement a version of BST-INSERT, but as a separate (private) helper function—one that works on Node
s. Its interface will look just like the interface of BST-INSERT, but instead of being called directly by the user, this function will be called by our insert()
function, passing in the root_
of the tree as a starting point.
So, whereas our BST-INSERT pseudocode was a recursive function and was also an appropriate interface for inserting into the tree, in our C++ version, these aspects are separate and different—the interface that users call is insert
, and it hands off the work to a separate private helper function whose interface is well suited to recursing over our Node
s, but only the insert()
function ever sees that interface, users don't.
We can declare insert()
and insertHelper()
in our class definition as follows:
class IntTree {
public:
void insert(int k);
...
private:
struct Node {
...
};
Node* root_;
void insertHelper(Node*& tree, int k);
};
Whoah! Whoah! What is that
Node*&
thing?You can use our inside-out rule for type-reading to find out! It's a reference to a pointer to a
Node
.Why would you do this to meeeee??
It's all because of the part of the pseudocode where we said “replace empty subtree with new tree node”—we need to be able to update trees and pointers. Let's go over it…
For us, a Node*
represents a subtree (by pointing to its root node).
In the base case, where the subtree is empty, we need to replace an empty subtree with a new tree consisting of a single node. So we need to change a pointer that was previously nullptr
to one that points to our new node.
But we want the original pointer (wherever that is) to get updated. If we passed a Node*
and then assigned that to a new value, we'd just be changing a pointer variable in our function and nowhere else. So we need to pass a reference, a Node*&
, so we can reassign the pointer to a new node.
Probably some code will help make everything make sense…
void IntTree::insert(int k) {
if (!exists(k)) { // Don't insert keys already in the tree
insertHelper(root_, k);
}
}
void IntTree::insertHelper(Node*& tree, int k) {
if (tree == nullptr) {
tree = new Node{k, nullptr, nullptr};
// ^-- because tree is a reference to a pointer, this will
// change the value of the pointer that was passed in!
}
else if (k < tree->key_) {
insertHelper(tree->left_, k);
}
else {
insertHelper(tree->right_, k);
}
}
Because we pass tree
by reference, when we run tree = new Node{k, nullptr, nullptr}
, we will actually be changing either tree->left_
, tree->right_
, or root_
, depending on where insertHelper
was called. If we'd declared tree as just a Node*
, that assignment wouldn't have any lasting effect.
Here's a video example of walking through this code using our memory model. (For simplicity, we'll leave out that call to exists
.)
A Couple of Tips
First, if, by chance, you happen to find yourself implementing a BST in C++ in the near future, we recommend that you draw lots of pictures!
Visualization is key to keeping your pointers straight. Use the tools we've given you!
Second, try not to get too thrown off by that
Node*&
in our example.For
insert()
it makes sense, because we are passing in a pointer to the tree root and we might need to change the value of the pointer!For other helper functions, we don't have to do that. They just take a
Node*
for their subtree.
(When logged in, completion status appears here.)