CS 70

Using Your Search Tree (and Written Questions)

To round out our assignment, we'll use our TreeStringSet as an actual dictionary of English-language words.

We've provided a source file, minispell.cpp, which uses TreeStringSet to provide a proof-of-concept for a rudimentary spelling checker. It reads in a dictionary file containing English-language words, and then another file of words, and looks up all the words in the second file against the dictionary and reports how many matches it found. This functionality is very rudimentary (it doesn't ignore punctuation, offer corrections, or ignore repeated misspellings), but it's sufficient to exercise much of the functionality of our TreeStringSet-based dictionary.

You aren't required to read over the code for this program, but you can if you like.

Let's Go!

Adapt Your Makefile to Build minispell and Build It

Adjust your Makefile so that you build the program minispell (the executable will link both minispell.o and treestringset.o), and create this executable.

It will help if your Makefile has an OPTFLAGS= variable, as you can then set that variable from the command-line to easily have make build everything with optimization, as, for example,

make clean
make OPTFLAGS="-O3 -flto" minispell

Run minispell

If you run the program with the --help argument, it will produce a usage message:

DOCKER > ./minispell --help
Usage: ./minispell [options] [file-to-check ...]
Options:
  -h, --help             Print this message and exit.
  -f, --file-order       Insert words in the order they appear (default).
  -s, --shuffled-order   Insert words in a random order.
  -b, --balanced-order   Insert words in a balanced order
  -n, --num-dict-words   Number of words to read from the dictionary.
  -m, --num-check-words  Number of words to check for spelling.
  -d, --dict-file        Use a different dictionary file.

Default dictionary file: /home/student/data/smalldict.words
Default file to check:   /home/student/data/ispell.words

If you run it without arguments, you should see output similar to

DOCKER > ./minispell
Reading words from /home/student/data/smalldict.words... done!
Inserting into dictionary (in order read)... done!
 - insertion took 0.00075025 seconds
 - 341 nodes, height 340, average depth 170
 - median word in dictionary: 'long'

Reading words from /home/student/data/ispell.words... done!
Looking up these words in the dictionary... done!
 - looking up took 0.0177939 seconds
 - 34831 words read, 325 in dictionary

Exploration

The file smalldict.words contains the dictionary words in alphabetical order, which might seem convenient, but that turns out to be problematic for our tree class.

Now look at what changes when you run

  • minispell -s, which reads the same words, but shuffles them into a random order before inserting them into the tree.
  • minispell -b, which reads the same words, but figures out an insertion order that will create a very well-balanced tree.

Next, rerun minispell -s several times. See what changes between runs—pay particular attention to the height of the tree and the average depth.

Do the same with minispell -b.

The ./minispell executable has some additional options. If you give it a filename, it will check that file against the dictionary. You can also use a -d option to load a different dictionary file.

Now try

  • minispell -s -d ../data/ispell.words ../data/smalldict.words
  • minispell -b -d ../data/ispell.words ../data/smalldict.words
  • minispell -d ../data/ispell.words ../data/smalldict.words (you may need to be patient on this one!)
  • Rabbit speaking

    You can press Control-C to quit if you just want to give up waiting for the results on the last one.

Questions

On Gradescope, you'll find some questions asking you to run this program and think about the results.

(When logged in, completion status appears here.)