This assignment is due at 9 P.M. on Wednesday, April 14th, 2004. As usual, the README file is due at midnight the same day (i.e., the moment that Thursday starts). Refer to the homework policies page for general homework guidelines.
The primary purpose of this assignment is to see how lists can be used to build more complex data structures.
Your assignment is to implement a simple encryption program. To make the assignment both interesting and challenging, you will be required to use certain data structures and new techniques.
The cryptosystem that your program will implement is called a Vigenère cipher. It is named after Blaise de Vigenère, although it is really a corruption of a much more secure cipher he invented in 1585. The Vigenère cipher is based on the earlier Caesar cipher, which rotates letters through the alphabet. For example, a Caesar rotation of 1 letter would replace "A" by "B", "B" by "C", and so forth, wrapping around to replace "Z" by "A". A rotation of 2 would replace "A" by "C", etc.
Caesar used a constant rotation for his encoding, which made decryption quite simple. The variation named after Vigenère modified the scheme by varying the rotation for each successive letter of the message. For example, the first letter might be rotated by 14 positions, the second by 5, and so forth. The pattern of rotation is controlled by a key, which is expressed as a simple word or phrase. The key is repeated when necessary to make it match up with the message.
An example may help explain this further. Suppose our key is "ALPHA" and we wish to encrypt the message "NOW IS THE TIME FOR CS FUN." Ignoring spaces, we can write the key and the message lined up as follows:
    ALPHAALPHAALPHAALPHA
    NOWISTHETIMEFORCSFUN

We will let "A" represent rotation by zero positions, "B" mean 1, and so forth. Then the encryption of the above message can be written directly below it:

    ALPHAALPHAALPHAALPHA
    NOWISTHETIMEFORCSFUN
    --------------------
    NZLPSTSTAIMPUVRCDUBN
It turns out that cryptographers traditionally break messages into 5-character groups to make things a bit easier to work with, so the pencil-and-paper method of doing the above would generate:
    ALPHA ALPHA ALPHA ALPHA
    NOWIS THETI MEFOR CSFUN
    -----------------------
    NZLPS TSTAI MPUVR CDUBN
When the message is decrypted, it is up to the recipient to figure out where the blanks and punctuation should have been.
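The pencil-and-paper procedure above can be sketched in a few lines of C++. This is only an illustration of the classic 26-letter rotation, not the assignment's required design: it uses std::string rather than the chunky string class, and the names rotateLetter and encryptClassic are hypothetical. It assumes an ASCII-like encoding, a message and key made only of uppercase letters, and a non-empty key.

```cpp
#include <cstddef>
#include <string>

// Rotate one uppercase letter forward by the amount named by a key letter
// ('A' = 0, 'B' = 1, ..., 'Z' = 25), wrapping from 'Z' back to 'A'.
char rotateLetter(char letter, char keyLetter)
{
    int shifted = (letter - 'A') + (keyLetter - 'A');
    return static_cast<char>('A' + shifted % 26);
}

// Encrypt a message, repeating the key as often as needed.
// Assumption: both strings contain only 'A'..'Z' and the key is not empty.
std::string encryptClassic(const std::string& message, const std::string& key)
{
    std::string result;
    for (std::size_t i = 0; i < message.size(); ++i)
        result += rotateLetter(message[i], key[i % key.size()]);
    return result;
}
```

Calling encryptClassic("NOWISTHETIMEFORCSFUN", "ALPHA") reproduces the ciphertext worked out by hand above.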
We will take advantage of the computer's flexibility by implementing a slight variation on the Vigenère cipher. Instead of encrypting a 26-letter alphabet, we will add support for blanks and encrypt 27 symbols: A through Z, plus blanks. However, we will keep the 5-character grouping feature, so we can't produce blanks in the encrypted output. Instead, our output alphabet will use a period to represent the 27th character. The message above, when encrypted through the sample solution with the key "ALPHA" and using the 27-symbol alphabet, then becomes:
    ALPHAALPHAALPHAALPHAALPHAAL
    NOW.IS.THE.TIME.FOR.CS.FUN.
    ---------------------------
    NZKGISKHOE.DXTE.QCY.CCOMUNK

where the blanks between words are represented by periods in the original message ("plaintext") and the trailing period (encrypted to "K") is generated by the newline that appears at the end of any well-formed Unix file. A pencil-and-paper cryptographer would write the final message as "NZKGI SKHOE .DXTE .QCY. CCOMU NK".
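As a rough sketch of the 27-symbol variant, the code below maps letters to 0 through 25 and every other character to 26, adds the key's value modulo 27, and prints 26 as a period. The helper names (toCode, toOutputSymbol, encrypt27) are made up for illustration, std::string stands in for the chunky string the assignment requires, and the period is assumed to follow "Z", which is the choice that matches the sample output above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

const int ALPHABET_SIZE = 27;   // 'A'..'Z' plus the blank

// Map an input character to 0..26: letters (either case) become 0..25,
// and everything else (blanks, digits, punctuation, newlines) becomes 26.
int toCode(char ch)
{
    unsigned char uch = static_cast<unsigned char>(ch);
    if (std::isalpha(uch))
        return std::toupper(uch) - 'A';
    return 26;
}

// Map 0..26 to the output alphabet, using '.' for the 27th symbol.
// Assumption: the period follows 'Z' rather than preceding 'A'.
char toOutputSymbol(int code)
{
    return code == 26 ? '.' : static_cast<char>('A' + code);
}

// Encrypt plaintext with a repeating key. Assumes the key is not empty.
std::string encrypt27(const std::string& plaintext, const std::string& key)
{
    std::string result;
    for (std::size_t i = 0; i < plaintext.size(); ++i) {
        int sum = toCode(plaintext[i]) + toCode(key[i % key.size()]);
        result += toOutputSymbol(sum % ALPHABET_SIZE);
    }
    return result;
}
```

With the trailing newline included, encrypt27("NOW IS THE TIME FOR CS FUN\n", "ALPHA") yields the 27-character ciphertext shown above.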
Incidentally, the Vigenère cipher is not a secure method of encryption, except in one special case. Modern cryptographers can decode it with very little effort, so you should not try to use it to protect anything important. (The special case is that if the key is at least as long as the message, the key is a truly random string of characters rather than an English word or phrase, and the key is never ever used again anywhere in the world, then the Vigenère method reduces to something called a "one-time pad", which is the only provably secure encryption method.)
Your encryption program will prompt the user for a password, and then encode or decode a message contained in a file. There are some very specific requirements for how the program operates, designed so that your functions could conceivably be moved into a different program (e.g., a mail reader) someday if you wished.
Your program will follow a fairly standard Unix interface style. There will be two ways to invoke your program, depending on whether you are encrypting or decrypting information.
To encrypt, one can use:
    ./assign_09 -e file

or

    ./assign_09 -e -g 5 file

or

    ./assign_09 -g 5 -e file

The -e indicates that you want to encrypt file and write the result to cout. The optional -g switch specifies a grouping factor (i.e., generate code groups of 5 characters at a time). Your program should not assume that switches appear in any particular order. There is more information below on how to process command-line arguments.
It turns out (try running "ps auxw" or "ps -ef", depending on the machine you are using) that it is not a good idea to give passwords on the command line. Instead, your program should prompt for a password on cerr, using the string "Password: " (with a trailing blank, and no newline), and read a password from cin. The password should be a word or phrase, terminated by a newline character.
In a real encryption program, you would turn off character echoing on the terminal so that nobody could look over the user's shoulder and get the password, but that's a bit of a pain for a CS70 assignment, so your program should ignore that detail. The actual password will be somewhat modified from what the user types by converting lowercase to uppercase and changing non-alphabetic characters to blanks (the same rules will be applied to the message to be encrypted).
If the -g switch is not given, the output of the program should be a single line (followed by the usual newline). If -g appears, the output should be divided into groups of the specified number of characters, with each group separated by a blank. When groups are being generated, your program should never generate an output line longer than 70 characters (unless the argument to -g is itself greater than 70). There should be no blanks at the end of the output lines.
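One possible way to meet the grouping rules is sketched below. The function name formatGroups and the MAX_LINE constant are hypothetical, and the sketch takes and returns std::string purely for illustration (the assignment itself requires the chunky string class and asks that grouping live in the output routine). A group size of 0 is used here to mean that -g was not given.

```cpp
#include <cstddef>
#include <string>

const std::size_t MAX_LINE = 70;    // longest output line allowed

// Break ciphertext into groups of groupSize characters separated by single
// blanks. A new line is started whenever adding the next group (plus its
// separating blank) would push the line past MAX_LINE characters, so no
// line ever ends with a blank. A groupSize of 0 means "no grouping".
std::string formatGroups(const std::string& cipher, std::size_t groupSize)
{
    if (groupSize == 0)
        return cipher + "\n";

    std::string result;
    std::size_t lineLength = 0;
    for (std::size_t start = 0; start < cipher.size(); start += groupSize) {
        std::string group = cipher.substr(start, groupSize);
        if (lineLength > 0) {
            if (lineLength + 1 + group.size() > MAX_LINE) {
                result += '\n';         // group will not fit: wrap
                lineLength = 0;
            } else {
                result += ' ';          // group fits: separate with a blank
                ++lineLength;
            }
        }
        result += group;
        lineLength += group.size();
    }
    return result + "\n";
}
```

A group longer than 70 characters is still written whole on its own line, which matches the exception allowed for large -g arguments.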
Since we are working with a 27-character alphabet ("A" through "Z" plus a blank), your program will not be able to deal nicely with lowercase characters and punctuation. Therefore, lowercase characters should be converted to uppercase, and all non-alphabetic characters should be treated as blanks (the same rules will apply to the password). For example, the string "CS 70 is really, really FUN!" would be encrypted exactly the same as if it read as follows (the line of dots is there to help you see all the blanks):
    CS    IS REALLY  REALLY FUN 
    ............................
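The conversion rule can be written as a small helper. The sketch below uses a hypothetical function named normalize and works on std::string only to keep the example short; the same character-by-character logic would apply to whatever string class the program actually uses.

```cpp
#include <cctype>
#include <string>

// Apply the assignment's input rules: lowercase letters become uppercase,
// and every non-alphabetic character becomes a blank.
std::string normalize(const std::string& text)
{
    std::string result;
    for (std::string::const_iterator it = text.begin(); it != text.end(); ++it) {
        unsigned char ch = static_cast<unsigned char>(*it);
        result += std::isalpha(ch) ? static_cast<char>(std::toupper(ch)) : ' ';
    }
    return result;
}
```

Note that the digits "70", the comma, and the exclamation point each turn into one blank, so the output is exactly as long as the input.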
Because blanks are used for grouping, they cannot be part of your output alphabet. Instead, the meaningful part of the output of an encryption run should be selected from "A" through "Z" and the period. The period should be considered to either precede "A" or follow "Z" (the choice is yours, and the results should be equivalent in either case).
Decryption is simpler than encryption. There is only one way to decrypt:
    ./assign_09 -d file

The specified file should be some previous output of your program. As with encryption, you should prompt the user for the password. The decrypted message should be written to cout. Since all formatting and punctuation have been lost, you should write it as a single long line, and let the user worry about figuring it out.
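Decryption reverses the rotation by subtracting the key's value instead of adding it. The sketch below is one way that might look; decrypt27 is a hypothetical name, std::string is used only for brevity, the key is assumed to be already converted to uppercase letters and blanks, and the period is assumed to stand for symbol 26. Blanks and newlines in the input are treated as grouping separators and skipped without consuming a key letter.

```cpp
#include <cstddef>
#include <string>

const int SYMBOLS = 27;     // 'A'..'Z' plus the blank

// Decrypt text produced by the encryption step. Assumes the key is not
// empty and contains only 'A'..'Z' and blanks.
std::string decrypt27(const std::string& cipherText, const std::string& key)
{
    std::string result;
    std::size_t keyPos = 0;
    for (std::size_t i = 0; i < cipherText.size(); ++i) {
        char ch = cipherText[i];
        int code;
        if (ch >= 'A' && ch <= 'Z')
            code = ch - 'A';
        else if (ch == '.')
            code = 26;
        else
            continue;       // grouping blank or newline: not part of the message

        char keyCh = key[keyPos % key.size()];
        int shift = (keyCh >= 'A' && keyCh <= 'Z') ? keyCh - 'A' : 26;
        ++keyPos;

        // Adding SYMBOLS before taking the remainder keeps the value non-negative.
        int plain = (code - shift + SYMBOLS) % SYMBOLS;
        result += (plain == 26) ? ' ' : static_cast<char>('A' + plain);
    }
    return result;
}
```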
When a C or C++ program is invoked under Unix, you can give it one or more arguments (parameters) on the command line. The system preprocesses these arguments for you and makes them available to your main program as function parameters. You should declare your main program like this:

    int main(int argc, char* argv[])
    {
        // ...
    }

The parameter argc gives the number of arguments that appeared on the command line, including the name of the program itself. So an invocation like "./assign_09" will produce an argc of 1, while "./assign_09 x y z" will set argc to 4.
The parameter argv is an array of pointers to character, i.e., an array of C-style strings. argv[0] is always the name of the program as you invoked it, e.g., "./assign_09". Similarly, argv[1] is the first argument, expressed as a C string; argv[2] is the second argument, and so forth. For convenience, argv[argc] is guaranteed to be a NULL pointer.
It is illegal to refer to argv[i] when i is greater than argc.
In most cases, Unix programs handle their command-line arguments in two phases. In the first phase, the options are extracted and recorded by setting various variables (often Boolean values). In the second phase, remaining non-option arguments are processed in the manner specified by the options.
As mentioned above, the first phase of argument handling involves processing the options, which typically begin with a dash. Some options (but not all) also require a following parameter. Because options are usually allowed to appear in any order, the option processing is normally done in a loop similar to the following:
    while (there are more arguments left)
        if (the next argument begins with a dash)
            process that argument
        else
            break
The "process that argument" section is the interesting part of the code. There are two typical approaches: either use a switch statement based on the second character of the option, or use an if/else if sequence to detect which option has been specified and to handle it.
When an option takes a parameter, there are a couple of tricky aspects to processing it. Perhaps the sneakiest involves the way that the parameter is swallowed up. Since it is a separate argument, you need to get rid of it as part of processing the option itself. You also have to make sure that it's actually there. The common approach is to increment the loop index inside the option-processing code. For example:
    for (int argNo = 1; argNo < argc; argNo++) {
        // see above
        // ...
        // processing for option "-g":
        ++argNo;
        if (argNo >= argc) {
            // Parameter is missing: issue usage error
        }
        // parameter for "-g" is now in argv[argNo]
        // ..since we incremented argNo just now, and will
        // ..increment it again in the "for" statement, the
        // ..parameter for "-g" will not be examined to see if
        // ..it looks like an option.
    }
The other tricky aspect involves converting the parameter into a usable form. When main begins, all command-line arguments are expressed as C-style strings (char*). This might not be the best way to deal with them internally. In particular, for this assignment you'll want to convert the grouping factor from a string to an integer. Fortunately, there's a handy library routine to do just that for you. To use it, you should first #include <cstdlib>. The function is named strtol (convert a C-style string to a long). You can use it like this:
    int usefulThing = 0;            // Default value is zero
    char* firstInvalidCharacter;
    // ...
    if (some useful decision) {
        usefulThing = strtol(argv[argNo], &firstInvalidCharacter, 0);
        if (argv[argNo][0] == '\0' || *firstInvalidCharacter != '\0') {
            // error in argument, issue usage message
        }
    }
The strtol function converts the C string given as its first argument into an integer and returns the value. The third argument gives the number base to use for conversion; if it's zero, the base is determined according to C++ syntax rules. The second argument is a bit weird: the function will fill it in with a pointer to the first non-numeric character in the string. If the string is all numeric, this will be the '\0' at the end. Thus, if firstInvalidCharacter points to anything other than '\0', you had an argument that wasn't an integer and you should issue an error message. (The first clause in the "if" statement handles the case where the argument is a completely empty string.)
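The conversion and its two error checks can be wrapped in a small helper, roughly as follows. The name parseGroupSize is hypothetical, and rejecting zero and negative values is an extra assumption of this sketch (the text above only requires that the argument be an integer).

```cpp
#include <cstdlib>

// Convert text to a grouping factor. Returns true and stores the value in
// result on success; returns false if text is empty, contains characters
// that are not part of a number, or (by assumption) is not positive.
bool parseGroupSize(const char* text, long& result)
{
    char* firstInvalidCharacter = 0;
    long value = std::strtol(text, &firstInvalidCharacter, 0);
    if (text[0] == '\0' || *firstInvalidCharacter != '\0')
        return false;       // empty string, or trailing junk such as "5x"
    if (value <= 0)
        return false;       // assumption: a group must hold at least 1 character
    result = value;
    return true;
}
```

Because the base argument is 0, a string such as "0x10" is accepted and read as hexadecimal 16, following C++ syntax rules.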
This may all sound complicated, but it's really very easy to write. You have already seen examples in the processOptions functions of assignments 5, 7, and 8.
The second phase of argument processing involves handling the so-called positional arguments, which are those whose purpose is identified by their position on the command line. For example, the cp (copy) command in Unix accepts two positional arguments: the file to copy from and the file to copy into. In the command:

    cp -p foo bar

you are asking the program to copy foo to bar using the -p (preserve attributes) option.
For this assignment, there is only one positional argument, the file to be encrypted or decrypted.
If you choose to have a separate option-processing function, it probably makes more sense to process the positional arguments inside main, not inside the option function.
Your program should verify that its arguments are correct. This includes ensuring that exactly one of -d and -e is specified, making sure that -g is not given with -d, making sure that -g has an argument, ensuring that a file to be encrypted or decrypted is given on the command line, and making sure that no illegal switches are given. If any of these rules are violated, you should print a usage message similar to the following:

    Usage: ./assign_09 {-e [-g n]|-d} file

All of the argument validations except one should be done inside the option-processing function (if you have one). The exception is checking to be sure there is a filename; it makes more sense to verify that detail inside main.
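Putting the option loop and the validation rules together might look something like the sketch below. The Options struct, its field names, and this particular processOptions signature are assumptions made for illustration, not the interface used in earlier assignments. As suggested above, the filename check is left to main, which can compare firstPositional against argc.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical record of what the command line asked for.
struct Options {
    bool encrypt;
    bool decrypt;
    long groupSize;         // 0 means "-g was not given"
    int firstPositional;    // index in argv of the first non-option argument
};

// Returns true if the switches are valid. On failure the caller should
// print the usage message and exit.
bool processOptions(int argc, char* argv[], Options& opts)
{
    opts.encrypt = opts.decrypt = false;
    opts.groupSize = 0;
    bool sawGroup = false;

    int argNo = 1;
    for ( ; argNo < argc && argv[argNo][0] == '-'; ++argNo) {
        if (std::strcmp(argv[argNo], "-e") == 0)
            opts.encrypt = true;
        else if (std::strcmp(argv[argNo], "-d") == 0)
            opts.decrypt = true;
        else if (std::strcmp(argv[argNo], "-g") == 0) {
            ++argNo;                    // swallow the parameter
            if (argNo >= argc)
                return false;           // parameter is missing
            char* end = 0;
            opts.groupSize = std::strtol(argv[argNo], &end, 0);
            if (argv[argNo][0] == '\0' || *end != '\0' || opts.groupSize <= 0)
                return false;           // not a usable integer
            sawGroup = true;
        }
        else
            return false;               // illegal switch
    }
    opts.firstPositional = argNo;

    if (opts.encrypt == opts.decrypt)
        return false;                   // need exactly one of -e and -d
    if (sawGroup && opts.decrypt)
        return false;                   // -g may not be combined with -d
    return true;
}
```

In main, a check such as opts.firstPositional != argc - 1 would then catch a missing (or extra) filename.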
As usual, you must check out your assignment before beginning by using "cs70checkout hw09". This is true even though you will be writing 100% of the program yourself.
For homework #9, you must submit the following files:
- Makefile: [...] make utility. For this assignment, you must produce your own Makefile. I suggest that you start with one from a previous assignment. You will be graded on your Makefile, so be sure you modify it and test it. The makefile you provide must produce an executable named assign_09.
- assign_09.cc
- README
If you wish, you can create other files to help you develop this assignment, but it is not necessary.
When you have a working solution, you must submit your files with cs70submit. If you create any new files, you need to tell the submission system about them by mentioning them once on a cs70submit command line.
For convenience, we have provided dummy versions of README, Makefile, and assign_09.cc so that they will be sure to get submitted.
We have already discussed most of the high-level interface to your program. It should accept both the password and the message to be encrypted in mixed case. For both strings, it should convert lowercase characters to uppercase and convert non-alphabetic characters to blanks. The <cctype> header file defines a few functions that will be useful in this regard:
- isalpha(ch) returns true if the character ch is alphabetic ("A" to "Z" or "a" to "z") and false otherwise.
- isspace(ch) returns true if the character ch is whitespace (blank, TAB, newline, or one of a few other special characters).
- toupper(ch) returns the uppercase equivalent of ch if ch is alphabetic. The result is unreliable if ch is not alphabetic.
This assignment would be much easier if characters were encoded in a friendly fashion. For example, if the letter 'A' were represented by the number 0, 'B' by 1, and so forth up to 'Z' = 25, with 26 representing a blank, it would be relatively easy to write the encryption code. Unfortunately, 'A' is decimal 65. However, there is an easy way to solve this problem: do arithmetic on characters. As an example, you can convert a character ch to the 'A' = 0 scheme with the following code:

    char convertedCharacter = ch - 'A';

This trick will work only if ch is one of the uppercase letters 'A' through 'Z'. It will not work if ch is lowercase, a blank, or some other special character.
In the same way, you can convert a number between 0 and 25 back to an uppercase letter with:

    ch = convertedCharacter + 'A';

Again, this will only work if convertedCharacter is 0 through 25. If it is some other value, it will not generate a valid letter.
Some of you will notice that the above code will work only on computers that use ASCII or a similar encoding. No problem; it's OK if your program only works on those computers.
For this assignment, you will be required to use a number of data structures, data elements, and techniques that are not a direct consequence of the external interface requirements. The purpose of these extra restrictions is to force you to get practice with a number of important C++ data structures and techniques.
There are two obvious ways in which a simple encryption program might work. Both ways assume that you already have the password stored internally. The first approach is to read a single character at a time from the input file, encrypt it, and write it to the output.
The second approach is to read the entire input file into a giant internal string. After reading the input, you can iterate through the string, encrypt each character, and store the encrypted version back into the string (modifying the character in-place). Finally, you can write the encrypted string to the output. You must use this second approach in this assignment. This is partly because it will give you more practice in using iterators, and partly because it will make your code cleaner.
An important design detail is that you should not insert grouping characters as you encrypt. Instead, you should implement the -g switch as part of your output routine.
Your program must store both the password and the string to be encrypted using the chunky string class described below.
For this assignment, you are required to use your templated list class from assignment 8 as an underlying data structure. If you did that assignment well, you should need to make no further changes in your list class.
You may not add functions to the list class that are specialized to supporting the encryption assignment but are not useful for other purposes. All functions that your list class provides should be generic, in the sense that they would make sense in a wide variety of programs. For example, your list class should not have a function that converts its data member to lower case, because that is not generic, but it is OK to add a peekTail function because that might be handy in other applications.
The peekTail example was not chosen accidentally. You will almost certainly need to have peekTail for use by your chunky strings. For symmetry, peekHead is also a good idea, although you probably won't need it for this assignment. Whereas the list-pop functions would return a data type by value (e.g., DATA popHead();), the peek function(s) should return by reference (DATA& peekTail();) so that the caller can directly manipulate the information stored in the list.
The same modification rules apply to your list iterator class. You may wish to add a general-purpose reset function to it, but there should be nothing specialized specifically to the encryption assignment that is not useful elsewhere.
Your solution to this assignment must make use of a rather interesting string class that stores a string as a linked list of fixed-size "chunks," each of which is four characters long. Such a class is a compromise between storing the string as an array (which makes inserting characters expensive) and storing it as a linked list of single characters (which wastes memory).
Since a string is represented as a linked list of chunks, a small string would fit in a single list element. If the string is too large to fit into a single piece, you create a second piece and then tie it together with the first, using the linked list as the underlying structure.
Your string class should be built on top of the list class; like the main program, it should have no direct knowledge of the structure of the list. In other words, you may not make the string a friend of the list class.
In case the above description is not clear, here is an example of the private data from my version of the structure that is used to store the individual pieces of the string:

    class Chunk {
        // ...
    private:
        unsigned int length;
        char value[CHUNKSIZE];
    };

where CHUNKSIZE is a constant giving the number of characters in each piece (i.e., 4). I can then declare the entire string as a list using something like List<Chunk> string in the private data section of my ChunkyString class.
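To make the chunk idea concrete, here is a minimal sketch of appending one character. It is not the required implementation: std::list stands in for the course's List template (with back() playing the role that peekTail would play), Chunk is a plain struct rather than a class with private data, and chunkCount exists only so the example can be inspected.

```cpp
#include <cstddef>
#include <list>

const unsigned int CHUNKSIZE = 4;

struct Chunk {
    unsigned int length;        // number of characters in use in this chunk
    char value[CHUNKSIZE];
    Chunk() : length(0), value() { }
};

class ChunkyString {
public:
    ChunkyString() : totalLength(0) { }

    // Append one character in O(1): use the last chunk if it has room,
    // otherwise start a new chunk at the tail of the list.
    ChunkyString& operator+=(char ch)
    {
        if (chunks.empty() || chunks.back().length == CHUNKSIZE)
            chunks.push_back(Chunk());
        Chunk& tail = chunks.back();    // stand-in for peekTail()
        tail.value[tail.length++] = ch;
        ++totalLength;
        return *this;
    }

    // O(1) because the total is maintained as characters are added.
    std::size_t length() const { return totalLength; }

    // For demonstration only; a real interface would not expose chunks.
    std::size_t chunkCount() const { return chunks.size(); }

private:
    std::list<Chunk> chunks;    // assumption: stand-in for List<Chunk>
    std::size_t totalLength;
};
```

Appending the ten characters of "HELLOWORLD" one at a time produces three chunks holding 4, 4, and 2 characters.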
To the outside world, your string class should just look like something that stores strings. The user should not be able to tell, based on the interface, that the strings are stored in pieces instead of as a single array of characters. This has two implications:
The exact set of operations you choose to support is up to you, and to some extent it depends on what your program needs. I found the following operations to be minimally necessary in my own implementation:
- [...] ("+" or "+=" operator) of a single character to the end of the string
With the exception of the get-length function, I chose to implement all of the above as overloaded operators. In addition to the above list, I implemented a number of functions, including general concatenation and all of the Boolean comparison operators, just in case I needed them. Your mileage may vary, of course.
All of the above functions should be implemented with the minimum complexity possible. In the above list, the default constructor, "+=" operator (for single characters), and get-length operation should all be O(1). The copy constructor, assignment operator, "+" operator, and stream output are inherently O(N) and should be implemented that way.
Your string-as-list class must allow the outside user to treat it just like any other string, hiding the internal representation. Your main program should have no knowledge of the fact that the string is internally represented in chunks. To make that possible, you will need a string iterator that allows the main program to walk through the individual characters of the string.
A string iterator is required for this assignment. Your string iterator should be built on top of the list iterator; like the main program, it should have no direct knowledge of the structure of the list. In other words, you may not make the string iterator a friend of the list iterator.
I found it convenient to have a list iterator as a private data member in my string-iterator class; I called it subIterator because it is a subsidiary iterator that is hidden from the outside world.
When you are done building the string iterator, you should be able to do something like this (assuming your string class is called ChunkyString and the iterator is ChunkyStringIterator):
    ChunkyString stuff;
    // .. put characters into the string
    for (ChunkyStringIterator i(stuff); i; i++) {
        if (*i == 'a')
            cout << "I found an A in the string\n";
    }
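One way such an iterator could be structured is sketched below, with a subsidiary iterator over the chunks plus an offset within the current chunk. This is an assumption-laden illustration: std::list and its iterator stand in for the course's List and ListIterator classes, the string class is cut down to a single "+=" operator, and the string grants friendship to its own iterator (which is different from making anything a friend of the list).

```cpp
#include <list>

const unsigned int CHUNKSIZE = 4;

struct Chunk {
    unsigned int length;
    char value[CHUNKSIZE];
    Chunk() : length(0), value() { }
};

class ChunkyStringIterator;

class ChunkyString {
public:
    ChunkyString& operator+=(char ch)
    {
        if (chunks.empty() || chunks.back().length == CHUNKSIZE)
            chunks.push_back(Chunk());
        Chunk& tail = chunks.back();
        tail.value[tail.length++] = ch;
        return *this;
    }
private:
    friend class ChunkyStringIterator;  // the string's own iterator only
    std::list<Chunk> chunks;            // assumption: stand-in for List<Chunk>
};

class ChunkyStringIterator {
public:
    explicit ChunkyStringIterator(ChunkyString& target)
        : subIterator(target.chunks.begin()), end(target.chunks.end()),
          offset(0)
    { }

    // True while the iterator refers to a valid character.
    operator bool() const { return subIterator != end; }

    // Reference to the current character, so callers can modify it in place.
    char& operator*() const { return subIterator->value[offset]; }

    // Prefix increment: step within the chunk, then move to the next chunk.
    ChunkyStringIterator& operator++()
    {
        if (++offset >= subIterator->length) {
            ++subIterator;
            offset = 0;
        }
        return *this;
    }

    // Postfix increment, so that "i++" in the loop above works.
    ChunkyStringIterator operator++(int)
    {
        ChunkyStringIterator old(*this);
        ++*this;
        return old;
    }

private:
    std::list<Chunk>::iterator subIterator; // which chunk we are in
    std::list<Chunk>::iterator end;         // one past the last chunk
    unsigned int offset;                    // position within that chunk
};
```

The sketch relies on the string never holding an empty chunk, which is true here because a chunk is only created at the moment a character is stored in it.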
Note that the user of the ChunkyString has no knowledge of the fact that there is a list hiding underneath. If your main program even mentions the List class or the ListIterator class, you have taken the wrong approach and you will lose many points.
Just as output is written to an ostream such as cout or cerr, input is read from an istream such as cin. Doing so for this assignment will require that you use several C++ I/O features. Most of these features are enabled when you #include <iostream>. More complete details on the functions discussed below can be found in the notes on C++ I/O.
To read the password, you will need to read one character at a time from cin. This can be done with code like the following:

    char nextCharacter;
    while (cin.get(nextCharacter)) {
        // ...do stuff with nextCharacter
    }

In the above code, the loop will exit when there are no more characters available on cin (i.e., EOF was hit). Note that EOF is not the same as the end of a line. Since the password is only one line long, you must detect the end of the line yourself.
The string to be encrypted must be read from a file whose name is given to you on the command line. Before you can read a named file, you must #include <fstream>. Then you must open the file, read it, and close it.
In C++, this is easy: you open a file by creating an ifstream (for reading, or input) or ofstream (for writing, or output). The file is automatically closed when the associated ifstream or ofstream is destroyed.
Once you have created an ifstream, you can read from it just as if it were cin.
To make this explanation more concrete, suppose you want to read characters from a file named "myfile.txt". You could write something like this:

    ifstream inputStream("myfile.txt");
    if (!inputStream) {
        // ...Oops, myfile.txt doesn't seem to be available!
    }
    char nextCharacter;
    while (inputStream.get(nextCharacter)) {
        // ...do stuff with nextCharacter
    }
Of course, in most cases you won't want to hardwire the file name into your code. (Even if you did, it counts as a "magic number", so you should define it as a const string rather than sticking it into the middle of your program.) Here's a very similar example, only this time the file name is stored in a const string& variable named whichFile, which is passed as a function argument. This function returns true if the file was successfully read. (It also doesn't return the string read, which makes it somewhat useless. Fixing that deficiency is left to you as an exercise.)
The c_str member function of the string class converts a C++ string into a C-style char*; this is necessary because the ifstream constructor stupidly won't accept strings.
    bool readFile(const string& whichFile)
    {
        ifstream inputStream(whichFile.c_str());
        if (!inputStream)
            return false;           // Couldn't open the file
        char nextCharacter;
        while (inputStream.get(nextCharacter)) {
            // ...do stuff with nextCharacter
        }
        return true;
    }
Here is a trivial sample input file, and the output it generates when encrypted with -g 5 and the pass phrase "cs fun". If that output file is fed back into the decryption routine, it generates a slightly modified version of the original. Note that the decrypted version has a blank at the end (visible only if you download it and use an editor or "cat -vet" to examine it). The trailing blank did not appear in the original. Why is it there now?
Encrypting a different input file with the pass phrase "My roommate never studies, why should I?" but no -g switch produces a single very long output line. The decrypted version of the file demonstrates that a certain amount of (presumably) useful information has been lost.
Finally, to make the assignment interesting, here is an encrypted file for you to decode once your program is working correctly. The file was encrypted with the pass phrase "When I get my program working I am going to get some sleep". (Note that there is no trailing period in the pass phrase; it consists solely of alphabetic characters and blanks.)
All of these sample files will be placed in your directory when you cs70checkout the assignment.
icc Files
By default, emacs doesn't know that icc files contain C++ code. There are three ways to tell emacs to use C++ mode:
- Type "ESC x c++-mode RET" each time you visit the file. Obviously, this is a pain.
- Put "// ;-*-C++-*-" as the first line of the file. Emacs will recognize the line and automatically switch to C++ mode. This is less of a pain, but you still have to do it to every file.
- Add the following line to the ".emacs" file in your home directory. (If you don't have a ".emacs" file, create one containing this line):

      (setq auto-mode-alist (append '(("\\.icc$" . c++-mode)) auto-mode-alist))

  The line must be inserted exactly as given above, including the double backslash, the parentheses, and the funny single quote.
As usual, there are parts of this assignment that contain traps. Here are a few:
- [...] const objects. The simplest solution is to just eliminate const when the compiler complains. A slightly cleaner solution is to "cast away the const" to get rid of the complaint. For example, if you had a const ChunkyString foo and wanted to create an iterator on it, you couldn't write:

      ChunkyStringIterator i(foo);

  because the compiler would complain. To shut it up, try:

      ChunkyStringIterator i((ChunkyString&) foo);

  or, in an uglier but more modern notation:

      ChunkyStringIterator i(const_cast<ChunkyString&> (foo));

  There are other solutions, but they are more complex.

  P.S. If you never run into this problem, don't worry. Your code might be missing a few const declarations that ought to be there, but it won't affect your grade on this assignment.
- [...] reset member function in your iterators. This function should reset the iterator to the beginning of the list or string. It is similar to writing i = 0 in an array loop.
- [...] ++ operator for your iterator.
- [...] -g switch at first, and add it to your output routine later. Support for the -g switch is necessary to get full credit. Do not implement -g inside your decryption function.
- [...] "-g" switch, and then re-encrypting it with "Z", should produce (almost) the original input. There will be an extra "garbage" character at the end. Why?
- [...] "-g" switch, and also without it.
- [...] "-g" switch limits line length to 70 characters (unless the argument to "-g" is over 70). Test your code with values of -g near 70 to be sure it behaves properly. Also try values that are factors of 70, or are close to factors of 70.
There is more information on using C++ on Turing available in the departmental quick-reference guide and the C++ quick reference guide.
© 2004, Geoff Kuenning
This page is maintained by Geoff Kuenning.