The program for this assignment, and everything else except the README file, is due at 10 P.M. on Wednesday, November 6th, 2002. As usual, the README file is due at 1 A.M. on the following day (Thursday). Refer to the homework policies page for general homework guidelines.
The primary purpose of this assignment is to get you used to writing C++ iterators. You will also be developing a preliminary list class. Both the list class and the iterator for it will be useful to you in later assignments.
NOTE: the list class you develop in this assignment will be central to assignments 8 and 9. MAKE SURE you develop it well and debug it thoroughly. If you blow this assignment off, you will do poorly on the following two assignments as well.
One of the more creative approaches to artificial intelligence is the genetic algorithm, invented by Prof. John Holland of the University of Michigan.
In brief, a genetic algorithm simulates the process of evolution by applying the usual rules of genetics to simulate natural selection. In real life, natural selection's primary goal is the continuation of the species, and organisms that achieve that goal tend to be propagated. In a genetic algorithm, on the other hand, the primary goal is to satisfy a "fitness function" chosen by the programmer. For example, a simple fitness function might interpret the genes of an organism as the value of x in a complicated equation. The natural-selection process could then be tuned to prefer organisms that generate an output near zero, so that the survivors would eventually produce a solution to the equation.
Genetic algorithms were the first step in the current research area called "artificial life," and they have been used to successfully solve many problems that were otherwise intractable.
In this assignment, we will create a program that uses a genetic algorithm to find approximate square roots of integers. Although it is simplified compared to a production implementation, the program demonstrates the basic outline and capabilities of a genetic algorithm.
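As a rough illustration of a square-root fitness function, the sketch below reads a digit sequence as a decimal number and scores it by how far its square lies from the target. The names digitsToDouble and squareError are hypothetical, std::vector stands in for the linked list, and "smaller is fitter" is an assumption; the provided assign_07.cc may compute fitness differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: interpret digits (most significant first) as a number.
double digitsToDouble(const std::vector<int>& digits)
{
    double value = 0.0;
    for (int digit : digits) {
        assert(digit >= 0 && digit <= 9);   // each gene is a single decimal digit
        value = value * 10.0 + digit;
    }
    return value;
}

// Hypothetical fitness measure (an assumption, not the provided code):
// the distance between value^2 and the target. Smaller means fitter.
double squareError(const std::vector<int>& digits, double target)
{
    double x = digitsToDouble(digits);
    return std::fabs(x * x - target);
}
```

With a target of 2000000, the gene string 0001414 scores 604 and 0001415 scores 2225, so under this measure selection would prefer 0001414.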
There are three basic processes in evolution: mutation, crossover, and selection. Mutation involves selecting a gene site and modifying it in some fashion, usually by replacing it with another gene. Mutation is very rare both in real life and in genetic algorithms. Crossover is the most important process in generating new organisms. It involves taking two gene strings (usually from two parent organisms), cutting them both at the same point, and re-splicing them so that the head of the result comes from one parent and the tail from the other. Real genetic algorithms usually generate two children in this process, and may splice at more than one point, but we'll simplify things in our implementation.
The final step, selection, involves evaluating the organisms according to some criterion (the "fitness function") and choosing the ones that are most successful. In real life, selection is the harsh process of "survival of the fittest." In a genetic algorithm, the same method is used: the least fit organisms are discarded (i.e., killed) without being allowed to reproduce. As in real life, there is some randomness, so that a somewhat unfit organism has a chance of surviving even when a more fit one is discarded. This randomness turns out to be important to the success of the method, since any two slightly unfit parents might (through crossover) generate an extremely fit child.
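The simplified crossover and mutation steps described above can be sketched as follows. This is only an illustration on std::vector, not the linked-list code the assignment requires, and the function signatures (including the explicit cut point and replacement gene) are assumptions; the provided program chooses those values at random.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-point crossover producing one child: genes before `cut` come from
// the first parent, genes from `cut` onward come from the second parent.
std::vector<int> crossover(const std::vector<int>& headParent,
                           const std::vector<int>& tailParent,
                           std::size_t cut)
{
    assert(headParent.size() == tailParent.size());
    assert(cut <= headParent.size());
    std::vector<int> child(headParent.begin(), headParent.begin() + cut);
    child.insert(child.end(), tailParent.begin() + cut, tailParent.end());
    return child;
}

// Mutation: replace the gene at one site with another gene.
void mutate(std::vector<int>& genes, std::size_t site, int newGene)
{
    assert(site < genes.size());
    genes[site] = newGene;
}
```

For example, crossing 1 2 3 4 with 5 6 7 8 at cut point 1 yields 1 6 7 8.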
Because we will not have time to implement an entire genetic algorithm from scratch, much of the code has been provided for you, although you will have to clean it up. You must supply the underlying data structure (a linked list).
Makefile and that almost all of it resided in a single file. Worse, the code defines very few C++ classes, choosing instead to implement most of its functionality via top-level functions, even though there are some obvious ways in which the code could be broken into classes. In addition, although Ginny obviously intended to create an integer-list class to use in the program, no trace of it exists in her home directory.
Your manager has handed you Ginny's code to clean up. You will have to:
- Write a Makefile for the program,
- Write an IntList class (which can store a singly-linked list of integers) to complete the program, and
- Clean up the code by breaking it into separate files and classes.
This section gives you some hints and requirements regarding the data structures that your final program will use.
Organism
Currently, the program represents an organism as a list of integers (see below). You should maintain that representation, but should wrap it in an Organism class that also provides the crossover, mutate, fitness, and intListToDouble functions. The class should also provide a comparison operator that can be used in compareFitness (the latter function must still exist so that qsort will work, although you may wish to convert it into a class static function if you know how to do so).
The program currently represents the entire colony as a simple array. You should wrap that array in a Colony class that provides the naturalSelection and findBest functions.
IntList
An organism is represented entirely by its gene sequence, which in turn is represented using a singly linked list. Each element in the list will contain only a single integer from 0 to 9 (represented by the C++ type int), plus a link to the next element. The list must have a separate header that is not a plain element, which means that you must implement two classes (the header and the element). The cleanest approach is to make the element a nested private class of the header, so that only the header (IntList) is visible from outside.
You are not allowed to use a doubly linked list in this assignment.
Your linked list must be named IntList (so that it can be used by the main driver program) and must support the following operations. Note that, since the main driver program is supplied, the function names cannot be changed.
- A constructor that accepts a const int * array and an integer length for that array, and creates an IntList that has been initialized with the contents of that array, with one integer per list element. The declaration of this constructor should look like:
IntList(const int* source, int length);
- A pushTail function that inserts a single integer at the tail of the list. The declaration of this function should be similar to the following:
void pushTail(int value);
This function must operate in O(1) time, which implies that you must maintain a separate tail pointer for the list. You have already done a similar implementation in CS60.
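One possible shape for such a list, given only as a sketch: a header class with a nested private element type and a tail pointer that makes pushTail constant-time. The accessors size, headValue, and tailValue are hypothetical additions for checking the sketch, and copying is simply disabled here, whereas the real class needs a working copy constructor and assignment operator.

```cpp
#include <cassert>

class IntList {
public:
    IntList() : head_(nullptr), tail_(nullptr), size_(0) {}

    // Build a list holding source[0..length-1], one integer per element.
    IntList(const int* source, int length)
        : head_(nullptr), tail_(nullptr), size_(0)
    {
        for (int i = 0; i < length; ++i)
            pushTail(source[i]);
    }

    ~IntList()
    {
        while (head_ != nullptr) {
            Element* next = head_->next;
            delete head_;
            head_ = next;
        }
    }

    // Disabled only to keep this sketch short (an assumption of the sketch).
    IntList(const IntList&) = delete;
    IntList& operator=(const IntList&) = delete;

    // O(1): the tail pointer means the list never has to be walked.
    void pushTail(int value)
    {
        Element* element = new Element(value);
        if (tail_ == nullptr)
            head_ = element;          // first element of an empty list
        else
            tail_->next = element;    // link after the old tail
        tail_ = element;
        ++size_;
    }

    // Hypothetical accessors, not required by the assignment.
    int size() const { return size_; }
    int headValue() const { assert(head_ != nullptr); return head_->value; }
    int tailValue() const { assert(tail_ != nullptr); return tail_->value; }

private:
    // The element type is private, so only the header is visible outside.
    struct Element {
        explicit Element(int v) : value(v), next(nullptr) {}
        int value;
        Element* next;
    };

    Element* head_;
    Element* tail_;
    int size_;
};
```

Note that pushTail performs a fixed number of pointer updates regardless of list length, which is what the O(1) requirement asks for.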
In addition, you must implement an output operator (operator<<) for IntList. I suggest that you use the technique described in Weiss: provide a public print function, and have operator<< call print.
The output operator should write all the integers in the list concatenated together, with no blanks or newlines. (This design is a very poor approach in general, and will be changed next term. The right way to do it would be to separate the integers with blanks or commas.)
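The print-then-operator<< pattern might look like the sketch below. DigitList is a hypothetical stand-in backed by std::vector so that the example stays short; in the assignment, print would walk the linked list instead. The toText helper is also hypothetical and exists only to make the output easy to check.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for IntList.
class DigitList {
public:
    explicit DigitList(const std::vector<int>& digits) : digits_(digits)
    {
        for (int digit : digits_)
            assert(digit >= 0 && digit <= 9);
    }

    // Public print function: writes the integers with no separators.
    void print(std::ostream& out) const
    {
        for (int digit : digits_)
            out << digit;
    }

private:
    std::vector<int> digits_;
};

// The output operator only forwards to print, so it needs no friend access.
std::ostream& operator<<(std::ostream& out, const DigitList& list)
{
    list.print(out);
    return out;
}

// Hypothetical helper: capture the operator<< output as a string.
std::string toText(const DigitList& list)
{
    std::ostringstream buffer;
    buffer << list;
    return buffer.str();
}
```

Because operator<< returns the stream, it can be chained in the usual way, as in std::cout << list << '\n'.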
Finally, you may find it helpful to implement a few other standard list functions: pushHead, popHead, isEmpty, and possibly popTail. Several of these functions will be useful in future assignments, and you will find it much easier to do those assignments if you implement the functions now, while your list class is simple, rather than waiting until later when you have converted it into a templated class. However, only the list above is absolutely required.
IntListIterator
You must also implement an iterator for IntList, which must be named IntListIterator. The iterator must support the following functions at a minimum:
- A constructor that accepts the IntList to be iterated over.
- An operator bool that returns true if the iterator is valid (i.e., the access function will work), or false if the iterator is expired.
- A preincrement operator (operator++).
- An operator* that returns an int& (so that the integer in the current position can be modified if necessary).
In addition, you may wish to support a copy constructor, assignment operator, and postincrement operator. It would not be appropriate to implement operator->, since int is not a class.
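The required iterator operations could be arranged roughly as in the sketch below. The IntList shown is a cut-down stand-in with just enough structure to iterate over, and granting access through a friend declaration is one possible design choice, not something the assignment mandates.

```cpp
#include <cassert>

class IntListIterator;   // declared early so IntList can befriend it

// Cut-down stand-in list: only what the iterator sketch needs.
class IntList {
public:
    IntList() : head_(nullptr), tail_(nullptr) {}
    ~IntList()
    {
        while (head_ != nullptr) {
            Element* next = head_->next;
            delete head_;
            head_ = next;
        }
    }
    IntList(const IntList&) = delete;              // omitted in this sketch
    IntList& operator=(const IntList&) = delete;   // omitted in this sketch

    void pushTail(int value)
    {
        Element* element = new Element(value);
        if (tail_ == nullptr) head_ = element; else tail_->next = element;
        tail_ = element;
    }

private:
    friend class IntListIterator;   // lets the iterator see Element and head_
    struct Element {
        explicit Element(int v) : value(v), next(nullptr) {}
        int value;
        Element* next;
    };
    Element* head_;
    Element* tail_;
};

class IntListIterator {
public:
    // Start at the first element of the given list.
    explicit IntListIterator(IntList& list) : current_(list.head_) {}

    // True while the iterator refers to an element; false once expired.
    operator bool() const { return current_ != nullptr; }

    // Preincrement: advance to the next element.
    IntListIterator& operator++()
    {
        assert(current_ != nullptr);
        current_ = current_->next;
        return *this;
    }

    // Access by reference, so the caller can modify the stored integer.
    int& operator*() const
    {
        assert(current_ != nullptr);
        return current_->value;
    }

private:
    IntList::Element* current_;
};
```

A typical traversal is then for (IntListIterator it(list); it; ++it) { ... *it ... }, and an assignment such as *it = 9 changes the element in place because operator* returns int&.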
You are provided with a single file, assign_07.cc, which is the main driver program.
As mentioned, the program is not particularly object-oriented. Examine the code to discover the logical relationships between the functions and then break the code up into separate files and classes. You should create at least two new classes reflecting the logical structure of the program. (A solution involving four new classes is quite possible.)
In your final version, you should find that the overall code looks simpler and is easier to follow. If you have done things properly, many of the functions will have fewer arguments.
Note that your code must perform exactly the same as the existing code. This requirement means that you must take care when making changes involving the random-number generator so that you can be sure the same numbers are generated. (The random-number functions used by the code are described in the Unix manual page drand48(3).)
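To see why the order and number of random draws matters, consider the sketch below, which uses the POSIX functions srand48 and drand48 from the drand48(3) family. The helper drawSequence is hypothetical, and whether the provided program seeds the generator with srand48 is an assumption; check assign_07.cc for what it actually does.

```cpp
#include <cassert>
#include <stdlib.h>   // srand48, drand48 (POSIX, not standard C++)
#include <vector>

// Hypothetical helper: seed the generator, then return the first `count`
// values it produces.
std::vector<double> drawSequence(long seed, int count)
{
    assert(count >= 0);
    srand48(seed);
    std::vector<double> values;
    for (int i = 0; i < count; ++i)
        values.push_back(drand48());   // each call advances the shared state
    return values;
}
```

Two calls with the same seed return identical sequences. However, if restructured code makes even one extra draw (for example, while constructing a temporary organism), every later draw is shifted by one position and the output will no longer match the original program.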
You must create or modify the following files:
assign_07.cc
Makefile: A Makefile will not be provided. You must write your own, and it must be correct. If you do not provide a Makefile, your program will not compile and you will receive a zero for functionality. Be sure your dependencies are correct; you may wish to use g++ -M to help.
intlist.hh: The IntList and IntListIterator classes. Note that both classes must be defined by this file, either by placing both definitions in the file, or by having it #include whatever file(s) contain the remaining definitions.
*.hh
*.cc
Since assign_07.cc is provided to you, you must maintain stylistic consistency in that file. However, you are not required to use any specific coding style in the other files that you create. Since you are creating them from scratch, any good style is acceptable. In particular, you do not have to match the style of assign_07.cc in those files.
Emacs users may find it helpful to invoke C-c . stroustrup RET to choose the Stroustrup indentation style for assign_07.cc.
As usual, you can download the provided file as a bundle, either as a gzipped tar file or as a ZIP archive.
The static Keyword
In the provided code, there is a file-global variable (squaredValue) and a function (compareFitness) that really should be part of either the Colony or Organism class. To make that possible, you need to use the static C++ keyword.
Putting static in front of a class variable says "there is only one copy of this variable, and it is shared between all instances of the class." In other words, the variable becomes class-global. That's exactly what you need for squaredValue. You can declare it as a static double inside one of your classes, and it then becomes available only inside that class -- in particular, it becomes available to the fitness function.
There is one minor glitch, which is that C++ requires you to add an extra declaration in one of your .cc files (wherever you implement Organism or Colony):
double Colony::squaredValue = 0.0;
or
double Organism::squaredValue = 0.0;
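Putting the declaration and the out-of-class definition together gives something like the sketch below. This Organism is a simplified stand-in that stores a plain double rather than an IntList, and the setTarget function and the particular fitness formula are hypothetical; only the static member pattern is the point.

```cpp
#include <cassert>
#include <cmath>

// Simplified stand-in for the assignment's Organism class.
class Organism {
public:
    explicit Organism(double value) : value_(value) {}

    // Hypothetical setter for the shared target value.
    static void setTarget(double target)
    {
        assert(target >= 0.0);
        squaredValue = target;
    }

    // Every organism reads the same class-wide squaredValue.
    double fitness() const
    {
        return std::fabs(value_ * value_ - squaredValue);
    }

private:
    static double squaredValue;   // declaration: one copy shared by all objects
    double value_;
};

// The extra line C++ requires, placed in exactly one .cc file.
double Organism::squaredValue = 0.0;
```

Changing the target through Organism::setTarget affects the result of fitness for every existing Organism, because there is only one squaredValue.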
You can also put static in front of a function declaration. In this case, the keyword means "this function will not be called on a particular object." In other words, instead of writing:
Foo x; x.bar(3);
you would write just:
Foo::bar(3);
Usually, you need the Foo:: prefix to specify which function you are calling. A static function has no this pseudo-variable, and for that reason you can't reference any class member variables (unless they, too, are static).
The static function feature is perfect for compareFitness. You can declare it inside Colony or Organism as:
static int compareFitness(const void* first, const void* second);
and then pass it to qsort as before (you may need to add a scoping operator before the name).
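A static comparison function used with qsort might be sketched as follows. The Organism here is a stand-in holding only a precomputed error value, and the error accessor, the "smaller error sorts first" ordering, and the sortByFitness helper are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in organism: just an error value (smaller means fitter here).
class Organism {
public:
    Organism() : error_(0.0) {}
    explicit Organism(double error) : error_(error) {}

    double error() const { return error_; }

    // Comparison operator that compareFitness builds on.
    bool operator<(const Organism& other) const
    {
        return error_ < other.error_;
    }

    // Static, so it has no `this` and matches the signature qsort expects.
    static int compareFitness(const void* first, const void* second)
    {
        const Organism& a = *static_cast<const Organism*>(first);
        const Organism& b = *static_cast<const Organism*>(second);
        if (a < b) return -1;
        if (b < a) return 1;
        return 0;
    }

private:
    double error_;
};

// Hypothetical helper: sort an array of organisms, fittest first.
void sortByFitness(Organism* colony, std::size_t count)
{
    assert(colony != nullptr || count == 0);
    // The scoping operator names the static member function.
    std::qsort(colony, count, sizeof(Organism), Organism::compareFitness);
}
```

Note that qsort moves elements as raw bytes, which is fine for this simple stand-in; how safe that is for a class that owns a linked list depends on how the provided program uses it.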
For assignment 7, you must submit the following files:
Testing is your responsibility. We will not provide exact test cases for you. You should test your program a number of times, under different conditions.
In its default condition, the program is nondeterministic (i.e., two successive runs may produce different results). To make testing easier, the program accepts a switch that makes it deterministic. If you use "-S n", where n is an integer, the random seed will be set to that value. Specifying the random seed will allow you to control the program's behavior so that you can reproduce bugs.
You will also find it instructive to run the program with the -d switch, and to run it for many different values of the -g, -m, -p, -r, and -s switches. Judicious reading of the comments, together with experimentation, will reveal the purpose of these switches and how they interact.
We will not limit ourselves to running only simple test cases. You can expect that we will run stress tests in an attempt to break your program. I strongly suggest that you attempt to break it yourself, so that we won't be able to do so. In particular, make sure you ask it to find the roots of a lot of numbers, all on one command line.
To make it clearer how the program is used, here are some sample runs. First, we can approximate the square root of 2000000 (which is just 1000 times the square root of 2). The "%" represents the command prompt.
% ./assign_07 -S 12345 2000000
0001415 * 0001415 = 2002225
If we start with a different random seed, we get a different result:
% ./assign_07 -S 54321 2000000
0001413 * 0001413 = 1996569
A third attempt gives a really bad answer:
% ./assign_07 -S 95 2000000
0000589 * 0000589 = 346921
Finally, we can change the number of generations (-g), the mutation rate (-m), the population size (-p), the selection pool size (-s, which should be smaller than the population size), and the number of randomly-chosen survivors (-r, which should usually be pretty small), and run with debugging (-d):
% ./assign_07 -S 1 -g 100 -m 0.1 -p 100 -s 50 -r 3 -d 2000000
Generation 0: 0003616
Generation 1: 0001993
Generation 5: 0001912
Generation 6: 0001608
Generation 7: 0001508
Generation 8: 0001501
Generation 11: 0001412
Generation 22: 0001414
0001414 * 0001414 = 1999396
Here are several more sample runs to help you check that your program still runs correctly after you've finished upgrading it to object-oriented style. Your output should match exactly, including the intermediate results:
% ./assign_07 -S 1 -d 1000000
Generation 0: 0003616
Generation 3: 0002035
Generation 4: 0002032
Generation 7: 0000759
Generation 9: 0000956
Generation 12: 0000971
Generation 13: 0001006
Generation 15: 0001001
Generation 20: 0000999
Generation 21: 0001000
0001000 * 0001000 = 1000000
% ./assign_07 -S 2 -d 2000000
Generation 0: 0001959
Generation 4: 0001524
Generation 7: 0001464
Generation 8: 0001452
Generation 9: 0001412
Generation 13: 0001414
0001414 * 0001414 = 1999396
% ./assign_07 -S 3 -d 3000000
Generation 0: 0000901
Generation 5: 0001637
Generation 8: 0001781
Generation 11: 0001694
Generation 13: 0001731
Generation 34: 0001732
0001732 * 0001732 = 2999824
% ./assign_07 -S 4 -d 4000000
Generation 0: 0022736
Generation 1: 0006776
Generation 2: 0001693
Generation 5: 0001913
Generation 7: 0001999
0001999 * 0001999 = 3996001
% ./assign_07 -S 5 -d 5000000
Generation 0: 0040409
Generation 2: 0006204
Generation 5: 0001799
Generation 10: 0002515
Generation 11: 0001972
Generation 12: 0002305
Generation 13: 0002172
Generation 15: 0002176
Generation 17: 0002182
Generation 19: 0002196
Generation 20: 0002199
0002199 * 0002199 = 4835601
% ./assign_07 -S 6 -d 6000000
Generation 0: 0006410
Generation 1: 0001708
Generation 5: 0002368
Generation 9: 0002427
Generation 12: 0002471
Generation 13: 0002458
Generation 14: 0002455
Generation 16: 0002447
Generation 17: 0002450
0002450 * 0002450 = 6002500
% ./assign_07 -S 7 -d 7000000
Generation 0: 0023819
Generation 1: 0019859
Generation 2: 0019771
Generation 3: 0010017
Generation 6: 0007115
Generation 7: 0002118
Generation 12: 0002299
Generation 13: 0002761
Generation 16: 0002699
Generation 17: 0002667
Generation 18: 0002644
Generation 19: 0002645
Generation 28: 0002646
0002646 * 0002646 = 7001316
% ./assign_07 -S 8 -d 8000000
Generation 0: 0021235
Generation 1: 0021223
Generation 2: 0000630
Generation 4: 0000653
Generation 5: 0002059
Generation 6: 0003167
Generation 8: 0003132
Generation 9: 0002983
Generation 11: 0002767
Generation 13: 0002883
Generation 15: 0002783
Generation 17: 0002797
Generation 19: 0002799
0002799 * 0002799 = 7834401
% ./assign_07 -S 9 -d 9000000
Generation 0: 0006835
Generation 2: 0006814
Generation 3: 0006319
Generation 4: 0003811
Generation 5: 0001902
Generation 6: 0003052
Generation 8: 0003023
Generation 9: 0003018
Generation 11: 0003000
0003000 * 0003000 = 9000000
% ./assign_07 -S 10 -d 10000000
Generation 0: 0010722
Generation 1: 0001641
Generation 2: 0001648
Generation 5: 0003097
Generation 16: 0003099
0003099 * 0003099 = 9603801
Note 1: you can think of the running time of the program as O(population size * number of generations). Don't use huge numbers or you'll wait all day! (You may want to try to analyze the complexity of the program yourself to determine whether the correct bound is different.)
Note 2: If you don't specify the -S switch, you will get different results every time you run the program. That's a feature, not a bug.
Note 3: The defaults are:
As usual, there are some tricky parts to this assignment. Some of them are:
- Read assign_07.cc before you start, so that you understand the requirements placed on the IntList and IntListIterator classes.
- Make sure your IntList destructor, copy constructor, and assignment operator are working before you try to run the main program. Getting these functions right can be quite difficult, and if you don't debug them in isolation, you will experience strange bugs that will be hard to find.
- The iterator's access function (operator*) must return an integer by reference (int&). Otherwise the mutation operator won't work.
- pushTail must run in O(1) time. Be sure to do a careful complexity analysis of the function to be sure that it's not O(N). You will be penalized if it is not O(1).
- The program's output depends on the sequence of calls to the randomInt function. That, in turn, may turn out to be sensitive to the number of organisms you construct. If you have trouble matching the sample output, check to see whether you're creating scratch organisms that cause extra calls to randomInt.
© 2002, Geoff Kuenning
This page is maintained by Geoff Kuenning.