This assignment is due at 9 P.M. on Wednesday, April 28th, 2004. As usual, the README file is due at midnight the same day (i.e., the moment that Thursday starts). Refer to the homework policies page for submission instructions and general homework guidelines.
The primary purposes of this assignment are to gain experience with the internals of binary trees and to learn about binary (non-character) I/O.
For years, the DMV (Department of Motor Vehicles) for the state of Multifloria has been using a database running on a DEC PDP-11. They have only the executable binary for the code. (The Cobol source was lost long ago.) The code was not Y2K compatible. And, worse, ever since HP took over Compaq after Compaq took over DEC, they've multiplied by 10 the (already high) price for maintaining PDP-11's.
So the database code must be reimplemented. No living person remembers exactly how the Cobol code worked. But they do dimly remember that the system used a binary tree to hold the data. So the state bureaucrats wrote it into your contract that you have to use a binary tree. Fortunately for you, however, they didn't require that your binary tree be balanced and they've never even heard of iterators. However, your binary tree must be "closed" (see Weiss) and must not use parent pointers.
The database stores the following information on each driver:
As the database program runs, it must process a series of requests issued by DMV and its friends:
These requests are generated in a random pattern, in real time. For example, if a judge revokes someone's license, the police would like to be able to send a car to stake out the front of the court building and arrest them if they try to drive themselves away when they leave.
DMV officials, police, and the like type their requests into specially configured PalmPilots. These PalmPilots transmit the requests to a routing program, which gives them to your database program as an input file.
The request system used to run on a PDP-11-based teletype system. It was totally rewritten to use the PalmPilots. Unfortunately, since they didn't rewrite the database program at the same time, the new request system had to keep its output in the same format as the old request system. This is a kludgey and unfriendly binary format, which you are now stuck with as your input file format.
Finally, although the Multifloria DMV already charges an outrageous amount for license renewals, it has trouble balancing its budget. So, over the past few years, it has made extra money by selling its list of licensed drivers to various telemarketing firms. Therefore, your interface must also be able to generate a printout of all drivers in the database, sorted by license number, to send to the telemarketers.
Your program must store a database of licensed drivers, and must keep the database in a "closed" binary search tree (i.e., there must be an externally visible Tree class that holds the root, as well as an internal Node that holds the data and the child pointers). You must write the binary-tree class. The tree does not have to be balanced (though balancing is allowed), and it does not have to be a templated class (though templating is allowed if you're brave). The tree may not use parent pointers to simplify your algorithms. For each driver, you must store the following information, and it must be stored in the given format:
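license number prefix (char), license number digits (unsigned int), name, month of birth (unsigned char), day of birth (unsigned char), and year of birth (unsigned short).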
The key for the binary tree is the license number prefix and digits, considered together. Only those two fields are considered when comparing drivers.
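As a rough illustration of the record and its key, here is a minimal sketch. The struct name, the field names, and the use of std::string for the name are assumptions made for the example; only the field types for the prefix, digits, and birth date come from the requirements above.

```cpp
#include <string>

// Hypothetical record type. The field names are illustrative; the
// types of the license and birth-date fields follow the assignment.
struct Driver {
    char           prefix;       // license number prefix
    unsigned int   digits;       // license number digits
    std::string    name;         // assumption: name kept as a C++ string
    unsigned char  birth_month;
    unsigned char  birth_day;
    unsigned short birth_year;
};

// Orders two drivers by key: prefix first, then digits.
// The name and birth date are deliberately ignored.
bool keyLess(const Driver& a, const Driver& b)
{
    if (a.prefix != b.prefix)
        return a.prefix < b.prefix;
    return a.digits < b.digits;
}

// Two drivers have the same key when both prefix and digits match.
bool keyEqual(const Driver& a, const Driver& b)
{
    return a.prefix == b.prefix && a.digits == b.digits;
}
```

With helpers like these, two records that differ only in name or birth date still compare as the same driver, which is what duplicate detection needs.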
The program must process an input file that is provided to it on standard input (cin) and is encoded in binary format; complete details are given below. The input file contains a stream of commands. Most commands have parameters associated with them. In response to each command, the program will produce output on cout.
Each command is indicated by a single character. Some commands take parameters; the exact encoding of the parameters is given below. Note that some parameters (the license number and dates) are actually encoded as multiple fields.
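'i': Insert a driver.
'r': Remove a driver.
'f': Find a driver.
'a': Check whether a driver is of legal age on a given date.
'p': Print the database to cout. There are no parameters.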
As usual, the output format is very tightly specified, since the output of your program will be graded automatically.
Your program will read commands from the standard input (cin). Since the input is encoded, we have provided some test files for you to use when debugging your program. Note that you cannot create your own test files with an editor. However, we have also provided a conversion program named inputMaker that you can use to generate test files.
Unlike previous assignments, the input for this assignment is in binary format. That means that it is not a sequence of printable characters, so it must be processed differently. You will not be able to use the >> operator to read your data, nor will you be able to use cin.get. Instead, you must use the istream member function read. This function accepts the address of a variable or array, plus a number of bytes to read, which is usually specified using sizeof. For example:
    char command;
    cin.read((char*)&command, sizeof command);
    if (!cin) {
        // Error or EOF encountered
    }
    // command now contains the next "char" from the standard input
Similarly, to read an integer, you would do:
    int nextInt;
    cin.read((char*)&nextInt, sizeof nextInt);
    if (!cin) {
        // Error or EOF encountered
    }
    // nextInt now contains the "int" that followed "command" on cin

You can also read a specified number of characters. You will need code somewhat like the following when you are reading in the name of a driver:
    unsigned int length;    // Length of following string
    cin.read((char*)&length, sizeof length);
    if (!cin) {
        // Handle error/EOF
    }
    char* buffer = new char[length];
    cin.read(buffer, length);
If you want to deal with C++ strings instead of char*, you need to create a string after doing the read, and remember to clean up the memory you allocated:
    unsigned int length;    // Length of following string
    cin.read((char*)&length, sizeof length);
    if (!cin) {
        // Handle error/EOF
    }
    char* buffer = new char[length];
    cin.read(buffer, length);
    string result(buffer);  // Relies on the trailing null byte in the input
    delete[] buffer;
Finally, although you won't need to do so for this assignment, you can even read arrays of more complex data items:
    int arraySize;
    cin.read((char*)&arraySize, sizeof arraySize);
    if (!cin) {
        // Handle error/EOF
    }
    double* array = new double[arraySize];
    cin.read((char*)array, arraySize * sizeof(double));

Note that in this last case, you must multiply the number of array elements by the size of each element, sizeof(double). The parentheses are needed because double is a type name, not a variable name.
You may wonder why one would go to the trouble of using the binary format when the >> operator is so convenient. The answer is usually efficiency. Binary formats are usually smaller and faster than the human-readable equivalent. On the other hand, binary formats are frequently non-portable. For example, you can't use the same binary input file for this problem on both Turing and a Windows box.
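One common cause of this non-portability is byte order: Sparc machines store the most significant byte of an integer first, while x86 machines store the least significant byte first. The sketch below shows one way to check which kind of machine you are on and what the same unsigned int looks like with its bytes reversed. The function names are made up for this example, and you should not need any byte swapping in your solution as long as you use the test files that match your machine.

```cpp
#include <cstddef>
#include <cstring>

// Returns true if this machine stores the least significant byte of an
// integer first (as x86 does), false if it stores the most significant
// byte first (as Sparc does).
bool isLittleEndian()
{
    unsigned int probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);   // look at the lowest-addressed byte
    return first == 1;
}

// Returns the value with its bytes in the opposite order, which is how
// the other kind of machine would interpret the same bytes in a file.
unsigned int swapBytes(unsigned int value)
{
    unsigned int result = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        result = (result << 8) | (value & 0xFFu);
        value >>= 8;
    }
    return result;
}
```

Swapping twice gives back the original value, which is a handy sanity check if you ever do need such a routine.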
The commands in the input are designed to be easy for a program to process, rather than being easy for a human to read. Each command begins with a single char that specifies an action to take. The actions are encoded as follows:
'i': insert
'r': remove
'f': find
'a': age check
'p': print
After the char that encodes the command, the arguments, if any, will appear. Each argument has its own format. The particular arguments to each command are as follows:
'i': license prefix (char), license number (unsigned int), length of name string including the trailing null byte (unsigned int), name as a sequence of chars, month of birth (unsigned char), day of birth (unsigned char), and year of birth (unsigned short).
'r': license prefix (char) and license number (unsigned int).
'f': license prefix (char) and license number (unsigned int).
'a': license prefix (char), license number (unsigned int), month of query (unsigned char), day of query (unsigned char), and year of query (unsigned short).
'p': no arguments.
Note that the name in the 'i' command is given as an unsigned integer length followed by a string. The code given above can be used to read it.

Also note that many small integers (e.g., month) are fed to your program as unsigned chars or unsigned shorts. You must read these directly into a variable of the same type. You can then copy the value into an integer or, if you prefer, manipulate it directly in its original form (remember that a char is just a small integer big enough to hold the ASCII value of a single symbol).
You do not need to worry about error-checking the dates and the license numbers in the input file. The PalmPilot application that generates the binary input file is constructed so that it will only generate legal dates. On the other hand, you do have to deal with the possibility that a command might try to process a nonexistent license, or that it might try to insert a duplicate driver.
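One way to handle duplicates and nonexistent licenses is to have the tree operations report what happened and let the caller decide what to print. The following is only a sketch of a "closed" tree along those lines, using a cut-down hypothetical Record type: insert returns false for a duplicate key, find returns NULL when the license is absent, and an in-order walk yields the records sorted by license number. Removal is left out, and all names here are illustrative rather than required.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Record {                  // hypothetical, cut-down driver record
    char         prefix;
    unsigned int digits;
    std::string  name;
};

class Tree {
public:
    Tree() : root_(NULL) {}
    ~Tree() { destroy(root_); }

    // Returns false (and changes nothing) if the key is already present.
    bool insert(const Record& r)
    {
        Node** link = &root_;            // the pointer we may have to fill in
        while (*link != NULL) {
            int c = compare(r, (*link)->data);
            if (c == 0)
                return false;
            link = (c < 0) ? &(*link)->left : &(*link)->right;
        }
        *link = new Node(r);
        return true;
    }

    // Returns NULL if no record has this license number.
    const Record* find(char prefix, unsigned int digits) const
    {
        Record key;
        key.prefix = prefix;
        key.digits = digits;
        const Node* n = root_;
        while (n != NULL) {
            int c = compare(key, n->data);
            if (c == 0)
                return &n->data;
            n = (c < 0) ? n->left : n->right;
        }
        return NULL;
    }

    // Appends every record to out in ascending key order.
    void inOrder(std::vector<Record>& out) const { inOrder(root_, out); }

private:
    struct Node {                        // internal: no parent pointer
        Record data;
        Node*  left;
        Node*  right;
        explicit Node(const Record& r) : data(r), left(NULL), right(NULL) {}
    };

    Node* root_;

    Tree(const Tree&);                   // copying disabled in this sketch
    Tree& operator=(const Tree&);

    static int compare(const Record& a, const Record& b)
    {
        if (a.prefix != b.prefix) return a.prefix < b.prefix ? -1 : 1;
        if (a.digits != b.digits) return a.digits < b.digits ? -1 : 1;
        return 0;
    }

    static void inOrder(const Node* n, std::vector<Record>& out)
    {
        if (n == NULL) return;
        inOrder(n->left, out);
        out.push_back(n->data);
        inOrder(n->right, out);
    }

    static void destroy(Node* n)
    {
        if (n == NULL) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }
};
```

Walking down with a pointer to the link being followed (Node**) is one way to insert without parent pointers; a recursive helper that takes a Node*& does the same job.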
We have provided some test files for your use, one for each major binary architecture. These files must be downloaded IN BINARY using your browser. To do this, either shift-click on the link (in non-Windows browsers) or right-click and select "Save link as..." or "Save target as...", depending on your browser. The test files are packed into a tar archive and a zip archive. There are several ways to unpack a tar archive. The command "gunzip < hw11files.tgz | tar xvf -" will work on any machine. On Turing, you can also use "gtar xvzf hw11files.tgz". On Linux boxes, you can use "tar xvzf hw11files.tgz".
When testing, use files with "sparc" in their names on Turing and Macs; use files with "x86" in their names on Intel-compatible machines. Here is the complete list of the archive contents:
Since you can't directly create a test file with an editor, we have also provided a conversion program (inputMaker) that can convert an ASCII description of the input into a valid test file. The archive contains two versions of this program, one that runs on a Sparc and generates Sparc/Mac-format test files, and one that runs on Linux/x86 and generates Intel-format test files.
The conversion program reads ASCII from standard input and writes a binary test file to standard output. The input to the conversion program is almost exactly an ASCII equivalent of the binary input format. Whitespace is ignored (except within driver names), and you can even include comments by beginning a line with a pound sign (#). All command parameters are separated by whitespace. Here is a short sample input file:
    # Insert a driver, born May 3rd, 1978. Note that the full name goes LAST:
    i M 1234567 5 3 1978 Sam Student
    # Remove a (nonexistent) driver
    r N 7654321
    # Find the driver we inserted
    f M 1234567
    # See if the driver we inserted is of drinking age on New Year's Day, 2000:
    a M 1234567 1 1 2000
    # print the database
    p
WARNING: The order of parameters in the ASCII input file is not the same as the order in the binary file. Refer to the list of input commands for a description of the order of parameters in the binary file.
When you are dealing with an unknown binary format, it is often very helpful to be able to examine a binary file. However, normal tools like cat and editors aren't very helpful. Instead, you must use some special tools to "dump" the input files in a somewhat more readable format.
When you try to print your driver database, you may have a problem printing the month and day of birth. The reason is that you are using unsigned chars to store these values, and by default C++ will interpret them as ASCII data when you print them. The solution is simple: before you output these two fields, you must typecast them to unsigned int:
    cout << (unsigned int)driver.birth_month << '/'
         << (unsigned int)driver.birth_day << '/'
         << driver.birth_year;
Your program must produce very specific output in response to the commands, as follows:
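'i': If the driver is already in the database, write a message of the following form to cout:
    Attempt to re-insert existing driver: A1200894 Matt Norton 5/12/1960
'r': If the driver is not in the database, write a message of the following form to cout:
    Attempt to remove nonexistent driver X9244210
'f': If the driver is found, write a message of the following form to cout:
    Found: U8478857 Lily Ross 8/15/1981
If the driver is not found, write a message of the following form to cout:
    Not found: S5731705
'a': If the driver is of legal age on the query date, write a message of the following form to cout:
    Legal on 10/27/1999: Y9292492 Patrick Cunningham 4/14/1952
Note that a driver is considered to be of legal age on their 21st birthday. If the driver is not of legal age on the query date, write a message of the following form to cout:
    Not legal on 6/25/1995: L2502725 Cathy Connell 3/21/1979
If the driver is not found, write a message of the following form to cout:
    Not found: S5731705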
'p': Print the database to cout. Before the listing, write a line containing "Current database:" and after it write a line with only five dashes. A small listing might look like this:
    Current database:
    B2818217 Davita Yang 1/14/1947
    L2502725 Cathy Connell 3/21/1981
    Y4695835 Sharon Wlodarski 1/16/1994
    Y9292492 Patrick Cunningham 4/14/1952
    -----
As usual, your program should not produce extraneous blanks at the end of its output lines.
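The legal-age rule used by the 'a' command (a driver is legal starting on their 21st birthday) comes down to comparing the query date with the birth date moved forward 21 years. Here is a minimal sketch; the function name and parameter order are made up, and the treatment of February 29 birthdays (legal on March 1 in a non-leap year) is an assumption of this sketch, not something the assignment specifies.

```cpp
// Returns true if someone born on birthMonth/birthDay/birthYear has
// reached their 21st birthday by queryMonth/queryDay/queryYear.
bool isLegalAge(unsigned int birthMonth, unsigned int birthDay,
                unsigned int birthYear, unsigned int queryMonth,
                unsigned int queryDay, unsigned int queryYear)
{
    unsigned int legalYear = birthYear + 21;   // year of the 21st birthday
    if (queryYear != legalYear)
        return queryYear > legalYear;
    if (queryMonth != birthMonth)
        return queryMonth > birthMonth;
    return queryDay >= birthDay;               // the birthday itself counts
}
```

Since the dates arrive as unsigned char and unsigned short values, passing them to a function with unsigned int parameters converts them without any explicit casts.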
The code you submit will be compiled with the g++ options -Wall, -W, and -pedantic. Your program should produce no errors or warning messages when compiled with these options on Turing. If you absolutely cannot get rid of a warning, even with the help of the professor or the graders, document it in the README file along with the names of anyone who helped you try to understand the problem.
As usual, you must check out your assignment before beginning by using "cs70checkout hw11". This is true even though you will be writing 100% of the program yourself.
Your submission should consist of a number of files:
Makefile: input for the make utility. The makefile you provide must produce an executable named assign_11.
assign_11.cc
README
If you wish, you can create other files to help you develop this assignment, but it is not necessary.
When you have a working solution, you must submit your files with cs70submit. If you create any new files, you need to tell the submission system about them by mentioning them once on a cs70submit command line.

For convenience, we have provided dummy versions of README, Makefile, and assign_11.cc so that they will be sure to get submitted.
As usual, there are parts of this assignment that contain traps. Here are a few:
Watch out for unsigned chars, and remember to typecast appropriately.
There is more information on using C++ on Turing available in the departmental quick-reference guide and the C++ quick reference guide. You can find information about debugging in the gdb quick reference guide.
© 2004, Geoff Kuenning
This page is maintained by Geoff Kuenning.