CS 70

C++ Implementation: insert()

Here's our encoding:

class IntTree {
 public: ...
 private:
    // Encoding for a tree node
    struct Node {
        int key_;
        Node* left_  = nullptr;     // if not specified, default to nullptr
        Node* right_ = nullptr;     // if not specified, default to nullptr
    };

    // The tree object owns all the nodes, and has a pointer to the root node.
    Node* root_;
};
  • Duck speaking

    Wait. I don't see why you needed a Node struct at all. Couldn't you just have put all the Node data members in the IntTree class directly?

  • LHS Cow speaking

    No. For example, if we said every IntTree has a key_, we wouldn't be able to deal with empty IntTrees.

  • RHS Cow speaking

    Overall, everything would be worse and more confusing. Don't try to avoid having a distinct type for your tree nodes.

insert()

We'll have a public member function that inserts a given key into the structure:

void insert(int k);

How will we implement this operation using our encoding?

Remember, here is our BST-INSERT pseudocode:

BST-INSERT(treenode, x):
    if there isn't a treenode (i.e., this subtree is empty):    // base case
        replace empty subtree with new tree node containing x as its value (and empty subtrees)
    else if x < treenode.value:
        BST-INSERT(treenode.leftSubtree, x)      // recurse on smaller case (left subtree)
    else:    // x >= treenode.value
        BST-INSERT(treenode.rightSubtree, x)     // recurse on smaller case (right subtree)
  • Cat speaking

    I'm confused. I don't think I can use this pseudocode to write the insert() member function.

  • LHS Cow speaking

    Ooooh. Good point!

Why can't we simply adapt this pseudocode to define the insert function for the IntTree class?

Our BST-INSERT pseudocode was written for our conceptual encoding of the tree data structure, where we only thought about nodes. Our C++ encoding is a little different. Let's look back at the diagram we when we introduced the encoding:

Tree Encoding

In one sentence, the issue is

BST-INSERT is all about the stuff in blue circles in our diagram, not the brown square, but the brown square is the IntTree object on which the insert() member function operates.

The interface given by our BST-INSERT pseudocode was written in a language-neutral way, but it was also an interface that was well-suited to recursion. It took a treenode argument, and recursive calls could pass in smaller subtrees for that argument until we hit the base case.

In contrast, in our C++ encoding, our tree's (private) Nodes are wrapped up inside an IntTree class.

Our class needs an insert() function, but that function itself can't be recursive—there's only one IntTree object to work with, so there are no other smaller IntTrees to recurse on. So this insert() function can't be BST-INSERT.

We do still implement a version of BST-INSERT, but as a separate (private) helper function—one that works on Nodes. Its interface will look just like the interface of BST-INSERT, but instead of being called directly by the user, this function will be called by our insert() function, passing in the root_ of the tree as a starting point.

So, whereas our BST-INSERT pseudocode was a recursive function and was also an appropriate interface for inserting into the tree, in our C++ version, these aspects are separate and different—the interface that users call is insert, and it hands off the work to a separate private helper function whose interface is well suited to recursing over our Nodes, but only the insert() function ever sees that interface, users don't.

We can declare insert() and insertHelper() in our class definition as follows:

class IntTree {
  public:
    void insert(int k);
    ...

  private:
    struct Node {
      ...
    };
    Node* root_;

    void insertHelper(Node*& tree, int k);
};
  • Horse speaking

    Whoah! Whoah! What is that Node*& thing?

  • LHS Cow speaking

    You can use our inside-out rule for type-reading to find out! It's a reference to a pointer to a Node.

  • Hedgehog speaking

    Why would you do this to meeeee??

  • LHS Cow speaking

    It's all because of the part of the pseudocode where we said “replace empty subtree with new tree node”—we need to be able to update trees and pointers. Let's go over it…

For us, a Node* represents a subtree (by pointing to its root node).

In the base case, where the subtree is empty, we need to replace an empty subtree with a new tree consisting of a single node. So we need to change a pointer that was previously nullptr to one that points to our new node.

But we want the original pointer (wherever that is) to get updated. If we passed a Node* and then assigned that to a new value, we'd just be changing a pointer variable in our function and nowhere else. So we need to pass a reference, a Node*&, so we can reassign the pointer to a new node.

Probably some code will help make everything make sense…

void IntTree::insert(int k) {
    if (!exists(k)) { // Don't insert keys already in the tree
        insertHelper(root_, k);
    }
}

void IntTree::insertHelper(Node*& tree, int k) {
    if (tree == nullptr) {
        tree = new Node{k, nullptr, nullptr};
        // ^-- because tree is a reference to a pointer, this will 
        //     change the value of the pointer that was passed in!
    }
    else if (k < tree->key_) {
        insertHelper(tree->left_, k);
    }
    else {
        insertHelper(tree->right_, k);
    }
}

Because we pass tree by reference, when we run tree = new Node{k, nullptr, nullptr}, we will actually be changing either tree->left_, tree->right_, or root_, depending on where insertHelper was called. If we'd declared tree as just a Node*, that assignment wouldn't have any lasting effect.

Here's a video example of walking through this code using our memory model. (For simplicity, we'll leave out that call to exists.)

A Couple of Tips

  • RHS Cow speaking

    First, if, by chance, you happen to find yourself implementing a BST in C++ in the near future, we recommend that you draw lots of pictures!

  • LHS Cow speaking

    Visualization is key to keeping your pointers straight. Use the tools we've given you!

  • RHS Cow speaking

    Second, try not to get too thrown off by that Node*& in our example.

  • LHS Cow speaking

    For insert() it makes sense, because we are passing in a pointer to the tree root and we might need to change the value of the pointer!

  • RHS Cow speaking

    For other helper functions, we don't have to do that. They just take a Node* for their subtree.

(When logged in, completion status appears here.)