Using Your Search Tree (and Written Questions)
To round out our assignment, we'll use our TreeStringSet
as an actual dictionary of English-language words.
We've provided a source file, minispell.cpp
, which uses TreeStringSet
to provide a proof-of-concept for a rudimentary spelling checker. It reads in a dictionary file containing English-language words, and then another file of words, and looks up all the words in the second file against the dictionary and reports how many matches it found. This functionality is very rudimentary (it doesn't ignore punctuation, offer corrections, or ignore repeated misspellings), but it's sufficient to exercise much of the functionality of our TreeStringSet
-based dictionary.
You aren't required to read over the code for this program, but you can if you like.
Let's Go!
Adapt Your Makefile
to Build minispell
and Build It
Adjust your Makefile
so that you build the program minispell
(the executable will link both minispell.o
and treestringset.o
), and create this executable.
It will help if your Makefile
has an OPTFLAGS=
variable, as you can then set that variable from the command-line to easily have make
build everything with optimization, as, for example,
make clean
make OPTFLAGS="-O3 -flto" minispell
Run minispell
If you run the program with the --help
argument, it will produce a usage message:
> ./minispell --help
Usage: ./minispell [options] [file-to-check ...]
Options:
-h, --help Print this message and exit.
-f, --file-order Insert words in the order they appear (default).
-s, --shuffled-order Insert words in a random order.
-b, --balanced-order Insert words in a balanced order
-n, --num-dict-words Number of words to read from the dictionary.
-m, --num-check-words Number of words to check for spelling.
-d, --dict-file Use a different dictionary file.
Default dictionary file: ../..//data/smalldict.words
Default file to check: ../../data/ispell.words
For your convenience, the program assumes that you are running it from within your repoistory in the cs70
directory in your home directory on the server. If you run it from there without arguments, you should see output similar to the below. (If you are running it from a different place in the file system, you may need to use the -d
option to specify the path to the dictionary file.
> ./minispell
Reading words from /home/student/data/smalldict.words... done!
Inserting into dictionary (in order read)... done!
- insertion took 0.00075025 seconds
- 341 nodes, height 340, average depth 170
- median word in dictionary: 'long'
Reading words from /home/student/data/ispell.words... done!
Looking up these words in the dictionary... done!
- looking up took 0.0177939 seconds
- 34831 words read, 325 in dictionary
Exploration
The file smalldict.words
contains the dictionary words in alphabetical order, which might seem convenient, but that turns out to be problematic for our tree class.
Now look at what changes when you run
minispell -s
, which reads the same words, but shuffles them into a random order before inserting them into the tree.minispell -b
, which reads the same words, but figures out an insertion order that will create a very well-balanced tree.
Next, rerun minispell -s
several times. See what changes between runs—pay particular attention to the height of the tree and the average depth.
Do the same with minispell -b
.
The ./minispell
executable has some additional options. If you give it a filename, it will check that file against the dictionary. You can also use a -d
option to load a different dictionary file.
Now try
minispell -s -d ../../data/ispell.words ../../data/smalldict.words
minispell -b -d ../../data/ispell.words ../../data/smalldict.words
minispell -d ../../data/ispell.words ../../data/smalldict.words
(you may need to be patient on this one!)
You can press Control-C to quit if you just want to give up waiting for the results on the last one.
Questions
On Gradescope, you'll find some questions asking you to run this program and think about the results.
(When logged in, completion status appears here.)