CS70, Fall 2002

Homework Assignment #11

This assignment is due at 10 P.M. on Thursday, December 12th, 2002. Exception: the README file is due at 1 A.M. on the following day (Friday). Refer to the homework policies page for submission instructions and general homework guidelines.

The primary purposes of this assignment are to gain experience with the internals of binary trees and to learn about binary (non-character) I/O.

Overview

For years, the DMV (Department of Motor Vehicles) for the state of Multifloria has been using a database running on a DEC PDP-11. They have only the executable binary for the code. (The Cobol source was lost long ago.) The code was not Y2K compatible. And, worse, since HP took over Compaq after Compaq took over DEC, they've multiplied by 10 the (already high) price for maintaining PDP-11's.

So the database code must be reimplemented. No living person remembers exactly how the Cobol code worked. But they do dimly remember that the system used a binary tree to hold the data. So the state bureaucrats wrote it into your contract that you have to use a binary tree. Fortunately for you, however, they didn't require that your binary tree be balanced and they've never even heard of iterators. However, your binary tree must be "closed" (see Weiss) and must not use parent pointers.
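To make the "closed" requirement concrete, here is a minimal sketch of such a tree. It is only one possible layout, not the required design: the names `License`, `DriverTree`, `contains`, and `findLink` are hypothetical, and the payload is reduced to a name. Removal works through a pointer to the link that holds each node, so no parent pointers are needed.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical key: license prefix plus number, compared prefix-first.
struct License {
    char prefix;
    unsigned int number;
};

bool operator<(const License& a, const License& b)
{
    return a.prefix != b.prefix ? a.prefix < b.prefix : a.number < b.number;
}

// "Closed" tree: clients see only DriverTree; Node stays private.
class DriverTree {
public:
    DriverTree() : root_(0) {}
    ~DriverTree() { destroy(root_); }

    // Returns false, changing nothing, if the license is already present.
    bool insert(const License& key, const std::string& name)
    {
        Node** link = findLink(key);
        if (*link != 0)
            return false;
        *link = new Node(key, name);
        return true;
    }

    bool contains(const License& key) { return *findLink(key) != 0; }

    // Returns false if the license is absent.
    bool remove(const License& key)
    {
        Node** link = findLink(key);
        Node* victim = *link;
        if (victim == 0)
            return false;
        if (victim->left != 0 && victim->right != 0) {
            // Two children: unhook the in-order successor and move it here.
            Node** succLink = &victim->right;
            while ((*succLink)->left != 0)
                succLink = &(*succLink)->left;
            Node* succ = *succLink;
            assert(succ->left == 0);
            *succLink = succ->right;
            succ->left = victim->left;
            succ->right = victim->right;
            *link = succ;
        } else {
            *link = (victim->left != 0) ? victim->left : victim->right;
        }
        delete victim;
        return true;
    }

    // An in-order traversal visits the drivers sorted by license.
    void print(std::ostream& out) const { printSubtree(root_, out); }

private:
    struct Node {
        Node(const License& k, const std::string& n)
            : key(k), name(n), left(0), right(0) {}
        License key;
        std::string name;
        Node* left;
        Node* right;
    };

    DriverTree(const DriverTree&);              // copying is disabled
    DriverTree& operator=(const DriverTree&);

    // Returns the address of the pointer that holds, or would hold, key.
    Node** findLink(const License& key)
    {
        Node** link = &root_;
        while (*link != 0 && (key < (*link)->key || (*link)->key < key))
            link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
        return link;
    }

    static void printSubtree(const Node* n, std::ostream& out)
    {
        if (n == 0)
            return;
        printSubtree(n->left, out);
        out << n->key.prefix << n->key.number << ' ' << n->name << '\n';
        printSubtree(n->right, out);
    }

    static void destroy(Node* n)
    {
        if (n != 0) { destroy(n->left); destroy(n->right); delete n; }
    }

    Node* root_;
};
```

A caller would write something like `tree.insert(key, "Sam Student")` followed by `tree.print(std::cout)`. The sketch prints the license number without any padding; whether padding is needed depends on the required output format.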

The database stores the following information on each driver:

As the database program runs, it must process a series of requests issued by DMV and its friends:

These requests are generated in a random pattern, in real time. For example, if a judge revokes someone's license, the police would like to be able to send a car to stake out the front of the court building and arrest them if they try to drive themselves away when they leave.

DMV officials, police, and the like type their requests into specially configured PalmPilots. These PalmPilots transmit the requests to a routing program, which gives them to your database program as an input file.

The request system used to run on a PDP-11-based teletype system. It was totally rewritten to use the PalmPilots. Unfortunately, since they didn't rewrite the database program at the same time, the new request system had to keep its output in the same format as the old request system. This is a kludgey and unfriendly binary format, which you are now stuck with as your input file format.

Finally, although the Multifloria DMV already charges an outrageous amount for license renewals, it has trouble balancing its budget. So, over the past few years, it has made extra money by selling its list of licensed drivers to various telemarketing firms. Therefore, your interface must also be able to generate a printout of all drivers in the database, sorted by license number, to send to the telemarketers.

Details

Your program must store a database of licensed drivers, and must keep the database in a "closed" binary search tree (i.e., there must be an externally visible Tree class that holds the root, as well as an internal Node that holds the data and the child pointers). You must write the binary-tree class. The tree does not have to be balanced (though balancing is allowed), and it does not have to be a templated class (though templating is allowed if you're brave). The tree may not use parent pointers to simplify your algorithms. For each driver, you must store the following information, and it must be stored in the given format:

The program must process an input file that is provided to it on standard input (cin) and is encoded in binary format; complete details are given below. The input file contains a stream of commands. Most commands have parameters associated with them. In response to each command, the program will produce output on cout.

Each command is indicated by a single character. Some commands take parameters; the exact encoding of the parameters is given below. Note that some parameters (the license number and dates) are actually encoded as multiple fields.

The commands are:

'i'
Insert: add a new driver to the database. In order, the parameters are the license number, full name, and date of birth.
'r'
Remove: remove a driver from the database. The only parameter is the license number.
'f'
Find: locate a driver's information in the database. The only parameter is the license number.
'a'
Age: ask whether a driver is of legal drinking age. In order, the parameters are the license number and the date of the query.
'p'
Print: print the entire database on cout. There are no parameters.

As usual, the output format is very tightly specified, since the output of your program will be graded automatically.

Input Format

Your program will read commands from the standard input (cin).

Since the input is encoded, we have provided some test files for you to use when debugging your program. Note that you cannot create your own test files with an editor. However, we have also provided a conversion program named inputMaker that you can use to generate test files.

Reading in Binary

Unlike previous assignments, the input for this assignment is in binary format. That means that it is not a sequence of printable characters, so it must be processed differently. You will not be able to use the >> operator to read your data, nor will you be able to use cin.get. Instead, you must use the istream member function read. This function accepts the address of a variable or array, plus a number of bytes to read, which is usually specified using sizeof. For example:

    char command;
    cin.read(&command, sizeof command);
    if (!cin) {
        // Error or EOF encountered
    }
    // command now contains the next "char" from the standard input

In g++ version 2, you could use the above code on any data type. In g++3, you have to explicitly typecast everything to char*. So to read an integer:

    int nextInt;
    cin.read((char*)&nextInt, sizeof nextInt);
    if (!cin) {
        // Error or EOF encountered
    }
    // nextInt now contains the "int" that followed "command" on cin

You can also read a specified number of characters. You will need code somewhat like the following when you are reading in the name of a driver:

    unsigned int length;                  // Length of following string
    cin.read((char*)&length, sizeof length);
    if (!cin) {
        // Handle error/EOF
    }
    char* buffer = new char[length];
    cin.read(buffer, length);

If you want to deal with C++ strings instead of char*, you need to create a string after doing the read, and remember to clean up the memory you allocated:

    unsigned int length;                  // Length of following string
    cin.read((char*)&length, sizeof length);
    if (!cin) {
        // Handle error/EOF
    }
    char* buffer = new char[length];
    cin.read(buffer, length);
    string result(buffer);                // Relies on the trailing NULL byte
    delete[] buffer;

Finally, although you won't need to do so for this assignment, you can even read arrays of more complex data items:

    int arraySize;
    cin.read((char*)&arraySize, sizeof arraySize);
    if (!cin) {
        // Handle error/EOF
    }
    double* array = new double[arraySize];
    cin.read((char*)array, arraySize * sizeof(double));

Note that in this last case, you must multiply the number of array elements by the size of each element, sizeof(double). The parentheses are needed because double is a type name, not a variable name.

Why Binary?

You may wonder why one would go to the trouble of using the binary format when the >> operator is so convenient. The answer is usually efficiency. Binary formats are usually smaller and faster to process than the human-readable equivalent. On the other hand, binary formats are frequently non-portable. For example, you can't use the same binary input file for this problem on both Turing and a Windows box, because the two machines store the bytes of a multi-byte integer in opposite orders.
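The portability problem can be seen by interpreting the same bytes in both byte orders. The following is a small sketch, not part of the assignment; the function names `fromBigEndian` and `fromLittleEndian` are hypothetical. Sparc machines store the high-order byte first, and Intel-compatible machines store the low-order byte first.

```cpp
#include <cassert>
#include <cstddef>

// Interpret `count` bytes as an unsigned integer whose high-order byte
// comes first (the order a Sparc uses in memory and in its binary files).
unsigned int fromBigEndian(const unsigned char* bytes, std::size_t count)
{
    assert(count <= sizeof(unsigned int));
    unsigned int value = 0;
    for (std::size_t i = 0; i < count; ++i)
        value = (value << 8) | bytes[i];
    return value;
}

// Interpret the same bytes with the low-order byte first (the order an
// Intel-compatible machine uses).
unsigned int fromLittleEndian(const unsigned char* bytes, std::size_t count)
{
    assert(count <= sizeof(unsigned int));
    unsigned int value = 0;
    for (std::size_t i = count; i > 0; --i)
        value = (value << 8) | bytes[i - 1];
    return value;
}
```

Given the four bytes 00 12 D6 87, `fromBigEndian` yields 1234567, while `fromLittleEndian` yields a completely different number. That is why each architecture needs its own test files.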

The Input Commands

The commands in the input are designed to be easy for a program to process, rather than being easy for a human to read. Each command begins with a single char that specifies an action to take. The actions are encoded as follows:

'i'
Insert
'r'
Remove
'f'
Find
'a'
Age
'p'
Print

Following the command character will be the arguments, if any. Each argument has its own format. The particular arguments to each command are as follows:

'i'
License prefix (char), license number (unsigned int), length of name string including the trailing NULL byte (unsigned int), name as a sequence of chars, month of birth (unsigned char), day of birth (unsigned char), and year of birth (unsigned short).
'r'
License prefix (char) and license number (unsigned int).
'f'
License prefix (char) and license number (unsigned int).
'a'
License prefix (char), license number (unsigned int), month of query (unsigned char), day of query (unsigned char), and year of query (unsigned short).
'p'
No arguments.

Note that the name in the 'i' command is given as an unsigned integer length followed by a string. The code given above can be used to read it. Also note that many small integers (e.g., month) are fed to your program as unsigned chars or unsigned shorts. You must read these directly into a variable of the same type. You can then copy the value into an integer or, if you prefer, manipulate it directly in its original form (remember that a char is just a small integer big enough to hold the ASCII value of a single symbol).
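As an illustration of reading each field into a variable of its own type, here is a sketch that reads the arguments of one 'i' command. Everything named here (`InsertArgs`, `readRaw`, `writeRaw`, `readInsertArgs`, `encodeInsertArgs`) is hypothetical, and the sketch assumes the fields follow one another with no padding. The encoder exists only to exercise the reader; it is not a description of how inputMaker works.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the arguments of one 'i' command.
struct InsertArgs {
    char prefix;
    unsigned int number;
    std::string name;
    unsigned char month;
    unsigned char day;
    unsigned short year;
};

// Read one value in this machine's native binary representation.
template <typename T>
bool readRaw(std::istream& in, T& value)
{
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return !in.fail();
}

// Write one value in this machine's native binary representation.
template <typename T>
void writeRaw(std::ostream& out, const T& value)
{
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Read the arguments that follow an 'i' command character, in binary order.
bool readInsertArgs(std::istream& in, InsertArgs& args)
{
    unsigned int length = 0;            // includes the trailing NULL byte
    if (!readRaw(in, args.prefix) || !readRaw(in, args.number)
        || !readRaw(in, length) || length == 0)
        return false;
    std::vector<char> buffer(length);
    in.read(&buffer[0], length);
    if (in.fail())
        return false;
    assert(in.gcount() == static_cast<std::streamsize>(length));
    args.name.assign(&buffer[0], length - 1);   // drop the trailing NULL
    return readRaw(in, args.month) && readRaw(in, args.day)
        && readRaw(in, args.year);
}

// Test helper only (an assumption, not inputMaker): encode the same
// arguments in this machine's native format.
std::string encodeInsertArgs(const InsertArgs& args)
{
    std::ostringstream out;
    unsigned int length = static_cast<unsigned int>(args.name.size()) + 1;
    writeRaw(out, args.prefix);
    writeRaw(out, args.number);
    writeRaw(out, length);
    out.write(args.name.c_str(), length);
    writeRaw(out, args.month);
    writeRaw(out, args.day);
    writeRaw(out, args.year);
    return out.str();
}
```

Using a `std::vector<char>` for the buffer is a design choice that avoids a matching `delete[]`; the `new char[length]` approach shown earlier works equally well.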

You do not need to worry about error-checking the dates and the license numbers in the input file. The PalmPilot application that generates the binary input file is constructed so that it will only generate legal dates. On the other hand, you do have to deal with the possibility that a command might try to process a nonexistent license, or that it might try to insert a duplicate driver.

Test Files

We have provided some test files for your use, one for each major binary architecture. These files must be downloaded IN BINARY using your browser. Cutting and pasting will not work. The test files are packed into a tar archive and a zip archive. Use files with "sparc" in their names on Turing and Macs; use files with "x86" in their names on Intel-compatible machines. Here is the complete list of the archive contents:

simplesparc.bin
Sparc/Mac-format input file for a simple test case with only a few drivers.
simplex86.bin
Intel-format input file for a simple test case with only a few drivers.
simpleoutput.txt
The output of the program for the simple test case.
complexsparc.bin
Sparc/Mac-format input file for a more reasonably sized test case.
complexx86.bin
Intel-format input file for a more reasonably sized test case.
complexoutput.txt
The output of the program for the complex test case.
inputMaker.sparc
Sparc executable that can generate a Sparc/Mac test file from an ASCII description.
inputMaker.linux
Linux executable that can generate an Intel-format test file from an ASCII description.

Conversion Program

Since you can't directly create a test file with an editor, we have also provided a conversion program (inputMaker) that can convert an ASCII description of the input into a valid test file. The archive contains two versions of this program, one that runs on a Sparc and generates Sparc/Mac-format test files, and one that runs on Linux/x86 and generates Intel-format test files.

The conversion program reads ASCII from standard input and writes a binary test file to standard output. The input to the conversion program is almost exactly an ASCII equivalent of the binary input format. Whitespace is ignored (except within driver names), and you can even include comments by beginning a line with a pound sign (#). All command parameters are separated by whitespace. Here is a short sample input file:

# Insert a driver, born May 3rd, 1978.  Note that the full name goes LAST:
i M 1234567 5 3 1978 Sam Student
# Remove a (nonexistent) driver
r N 7654321
# Find the driver we inserted
f M 1234567
# See if the driver we inserted is of drinking age on New Year's Day, 2000:
a M 1234567 1 1 2000
# print the database
p

WARNING: The order of parameters in the ASCII input file is not the same as the order in the binary file. Refer to the list of input commands for a description of the order of parameters in the binary file.

Analyzing Binary Files

When you are dealing with an unknown binary format, it is often very helpful to be able to examine a binary file. However, normal tools like cat and editors aren't very helpful. Instead, you must use some special tools to "dump" the input files in a somewhat more readable format.
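If no dump tool is at hand, a few lines of C++ can do a similar job. The following is only a sketch of the idea, with a hypothetical function name (`hexDump`) and an output layout chosen for illustration; it is not one of the tools referred to above.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Return a hexadecimal listing of every byte remaining on `in`, with
// `bytesPerLine` bytes per line, each line preceded by its starting offset.
std::string hexDump(std::istream& in, std::size_t bytesPerLine = 16)
{
    assert(bytesPerLine > 0);
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    std::size_t offset = 0;
    char byte;
    while (in.read(&byte, 1)) {
        if (offset % bytesPerLine == 0) {
            if (offset != 0)
                out << '\n';
            out << std::setw(8) << offset << ':';
        }
        // Convert through unsigned char so bytes above 0x7f print correctly.
        out << ' ' << std::setw(2)
            << static_cast<unsigned int>(static_cast<unsigned char>(byte));
        ++offset;
    }
    if (offset != 0)
        out << '\n';
    return out.str();
}
```

A main program that contains only `std::cout << hexDump(std::cin);` could then be run with a test file redirected to its standard input.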

Output Format

When you try to print your driver database, you may have a problem printing the month and day of birth. The reason is that you are using unsigned chars to store these values, and by default C++ will interpret them as ASCII data when you print them. The solution is simple: before you output these two fields, you must typecast them to unsigned int:

    cout << (unsigned int)driver.birth_month << '/'
      << (unsigned int)driver.birth_day << '/'
      << driver.birth_year;

Your program must produce very specific output in response to the commands, as follows:

'i'
  1. If the driver already exists, produce a message like the following on cout:
    Attempt to re-insert existing driver: A1200894 Matt Norton 5/12/1960
    
  2. Otherwise, there is no output.
'r'
  1. If the driver does not exist, produce a message like the following on cout:
    Attempt to remove nonexistent driver X9244210
    
  2. Otherwise, there is no output.
'f'
  1. If the driver is found, produce a message like the following on cout:
    Found: U8478857 Lily Ross 8/15/1981
    
  2. If the driver does not exist, produce a message like the following on cout:
    Not found: S5731705
    
'a'
  1. If the driver exists and is of legal drinking age (21 or older), produce a message like the following on cout:
    Legal on 10/27/1999: Y9292492 Patrick Cunningham 4/14/1952
    
    Note that a driver is considered to be of legal age on their 21st birthday.
  2. If the driver exists, but is not of legal drinking age, produce a message like the following on cout:
    Not legal on 6/25/1995: L2502725 Cathy Connell 3/21/1979
    
  3. If the driver does not exist, produce a message like the following on cout:
    Not found: S5731705
    
'p'
Produce a complete listing of the driver database, sorted by license number (prefix and digits), on cout. Before the listing, write a line containing "Current database:" and after it write a line with only five dashes. A small listing might look like this:
Current database:
B2818217 Davita Yang 1/14/1947
L2502725 Cathy Connell 3/21/1981
Y4695835 Sharon Wlodarski 1/16/1994
Y9292492 Patrick Cunningham 4/14/1952
-----

As usual, your program should not produce extraneous blanks at the end of its output lines.
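The age rule for the 'a' command (legal on the 21st birthday itself) comes down to comparing year, then month, then day. Here is a sketch of one way to write that comparison; the `Date` struct and the function name `isAtLeast` are hypothetical. Because the input is guaranteed to contain only legal dates, the sketch merely asserts that the months are in range.

```cpp
#include <cassert>

// Hypothetical date holder using the same field types as the input file.
struct Date {
    unsigned char month;
    unsigned char day;
    unsigned short year;
};

// True if someone born on `birth` has reached the given age by `query`.
// The birthday itself counts as having reached that age.
bool isAtLeast(const Date& birth, const Date& query, unsigned int years)
{
    assert(birth.month >= 1 && birth.month <= 12);
    assert(query.month >= 1 && query.month <= 12);
    unsigned int birthdayYear = birth.year + years;
    if (query.year != birthdayYear)
        return query.year > birthdayYear;
    if (query.month != birth.month)
        return query.month > birth.month;
    return query.day >= birth.day;
}
```

With this comparison, a driver born on February 29th is treated as reaching the age on March 1st of the birthday year; that is a consequence of this particular sketch, not something the assignment specifies.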

Compilation

The code you submit will be compiled with the g++ options -Wall, -W, and -pedantic. Your program should produce no errors or warning messages when compiled with these options on Turing. If you absolutely cannot get rid of a warning, even with the help of the professor or the graders, document it in the README file along with the names of anyone who helped you try to understand the problem.

Submitting

Your submission should consist of a number of files:

Makefile
A "make file" containing instructions on how to compile your program with the make utility.

The makefile you provide must produce an executable named assign_11.

assign_11.cc
The C++ code for your main program for the assignment.
*.hh, *.cc, *.icc
Header and source files containing the classes you implement. Some of these will be lifted directly from previous assignments, or will be extended versions of classes in previous assignments. It is up to you to choose the names for these files.
README
A documentation file, as specified in the homework policies page. Note that this file is not due until 3 hours after the other files in the assignment.

If you wish, you can create other files to help you develop this assignment, but it is not necessary.

When you are ready to submit your program, cd into the proper directory (e.g., cs70/hw11) and run the cs70submitall command. This command will prompt you for the assignment number (see the top of this Web page) and will then capture all of the source files, Makefiles, and README files in your directory, so BE CERTAIN that you don't have anything in your directory besides your assignment.

If you discover a mistake in your program, you can resubmit it using the same command. You can submit as many times as you like; only the last version will be used.

Since the README file is due later than the rest of the assignment, you may choose to submit it separately. You can do this with the cs70submit command:

    cs70submit README

If you already submitted your code separately, DO NOT use cs70submitall to submit the README file, or it will appear that you missed the deadline even though you were really on time.

Tricky Stuff

As usual, there are parts of this assignment that contain traps. Here are a few:


There is more information on using C++ on Turing available in the departmental quick-reference guide and the C++ quick reference guide. You can find information about debugging in the gdb quick reference guide.


© 2002, Geoff Kuenning

This page is maintained by Geoff Kuenning.