ML Seminar Outline: Spring 2005
Instructor: Belinda Thom (bthom@cs.hmc.edu, 1241 Olin Hall, 7-9662, schedule)
TA: Aaron Arvey (aarvey@cs.hmc.edu)
Course Mailing List: cs-182-1-l@hmc.edu
Class Time and Location: Mon and Wed, 4-5:30 PM, PA-1285
Lab Time and Location: Wed, 6:30-8:00 PM (when no Nelson is scheduled), B-105
Overview
This course is not being run as an ordinary class but rather as a
seminar. You should view this course as a semester-long research
opportunity.
In this class you will learn as much as you can about data analysis and modeling
by empirically investigating a problem of your own choosing, either in
a group with one other student or on your own. (I strongly encourage but do not
require that students work in pairs.) Your grade in this course will be
determined by the quality of the attention you give to your chosen
research problem. This effort will culminate in a final report and
presentation. Although positive research results are encouraged, they are by no
means required for doing well in this class. Being able to clearly understand
and document why aspects of your approach were not successful is equally
valuable from a grading (and, I would argue, a scientific!) perspective.
My role as instructor and seminar leader
- To provide you with a broad and accessible view of machine learning, so
that you can knowledgeably choose a research problem.
- To introduce you to specific technical details, most notably probabilistic
reasoning, in order to provide some context in which to base your investigation of
existing machine learning literature, data sets, and algorithms.
- To provide you with guidance and mentoring as you explore your proposed
research task. I view this as my most important function, for this will:
  - Help keep you focused and directed.
  - Ensure that you don't "bite off more than you can chew."
  - Provide you with background on an as-needed basis.
  - Foster enthusiasm and incentive.
I will realize the first two roles in class. I will achieve the third via
regularly-scheduled weekly research meetings and feedback via the
course Wiki.
Course map
Part I
In the first part of the course, I will lecture, assign readings, and lead
directed discussions. Your primary responsibility, in return, is carefully
reading the assigned readings and actively participating in class discussion.
An approximate schedule of what I plan to cover in lecture and what readings
will be assigned is available in the tentative syllabus (on the Wiki).
During most lectures, I will hand out one or two informal
think-about-outside-of-class problems. Occasionally, mini Matlab programming
tasks might also be requested. These activities will be designed to help you
internalize the material---by yourself or with your classmates. Solutions to
these problems will be discussed at the beginning of the next
lecture. As long as everyone spends some quality time outside of class on these
activities, they will not be required (i.e. you won't have to turn anything in).
Part II
In the second part of the course, you will run the lectures. Each week, a
different student (or pair of students) will present aspects of their research,
followed by an in-depth class discussion. In this venue, my role is merely to
facilitate.
As the course progresses, a timeline for student presentations will be agreed upon.
Required Work
Abstracts
Each student will write a brief abstract for each reading that
is assigned. This will be due at the beginning of the class in which the
paper will be discussed. You should post your abstract onto the course Wiki
before class. Your abstract should include:
- A succinct one to two paragraph write-up summarizing what you have
interpreted the article to have been about.
- One or two big issues raised by the paper that you find worthy of
discussion.
- One or two technical details about the paper you didn't entirely understand.
To guide you in this process, I will provide an exemplary abstract or two.
Research Project
Most of your effort in this class will be spent exploring and modeling
data sets of your own choosing. You may either work in a group
with one other student or on your own. This activity will be referred to as
your research project. In the remainder of this document, when
discussing any aspect of your research project, the word "you" refers to
either a single student (if you are working alone) or a pair of students.
For example, a pair of students will submit a joint report, attend research
meetings together, give a joint presentation, etc.
Weekly Research Meeting and Wiki Entries
You will schedule a weekly half-hour time block. During this time, we'll meet in my
office (1241 Olin, x7-6992) to discuss your activities that week and your plans
for remaining work. In order to document your work efforts, you will also
maintain one (or more) topics on the course Wiki. I will check the Wiki and
provide feedback on it regularly.
Class Presentations
During the second half of the semester, you will present the work that you have
been doing to your classmates. Students taking clinic will do their presentations
first to avoid conflicts with that class.
Presentations should be about one half-hour in length, leaving plenty of time
for intense class discussion. I will guide you in drafting this
presentation. No more than 5 to 10 slides will be allowed---the focus should be
on big ideas and compelling, pointed discussion about your approach.
Final Project
At the end of the semester, you should submit a 10 to 25 page report summarizing
the result of your work (students working in groups can submit a joint
report). This report should emphasize high-level clarity first: What is
your project about? Why does it interest you? What specific questions are you
attempting to answer with empirical simulation? What approaches and techniques
are you using to answer these questions? What assumptions are you making (and
why)?
In addition, about a third of your report should delve into precisely
explaining the details of a particular algorithm, evaluation mechanism, or
experimental setup. In this section, you should rely on the formalisms you
learned about in class whenever possible. The purpose of this part of the
assignment is to give you experience writing up a technical exposition in
detail.
Final reports must be turned in no later than May 4th. Clinic students might be
given an earlier deadline.
Homework Sets and Exams
Aside from the work outlined above, I don't expect to assign additional required
work for this course. In particular, as long as the class remains adequately
engaged in the class material, I will not give any exams, quizzes, or official
homework sets. Rather, I expect you to spend most of your out-of-class time
working on your research project.
Text
There will be no official text for this course. Lectures will for the most
part be self-contained. Relevant papers and snippets from relevant texts will be
provided as needed.
For those of you who are interested, I recommend the following reference texts:
- Mitchell: Machine Learning
- Duda, Hart and Stork: Pattern Classification
- Chris Bishop: Neural Networks for Pattern Recognition
- MacKay: Information Theory, Inference, and Learning Algorithms
- Ross: Probability Models
- Spiegel: Probability and Statistics (Schaum's Outline)
- Hsu: Probability, Random Variables, and Random Processes (Schaum's Outline)
- Manning and Schütze: Foundations of Statistical Natural Language Processing
- Russell and Norvig: Artificial Intelligence, A Modern Approach
Which book(s) would be best for you depends on who you are and what your
needs are. I'd be happy to help you make a more informed decision about which
books to buy; just stop by. Any of these classics would most certainly
make a great addition to your technical collection.
Grades
Your grade in this class will be based on the following criteria:
- Active class participation and attention to assigned readings (1/4).
- Your research project, which will be evaluated via:
  - Your weekly participation, e.g. research meetings, Wiki (1/4).
  - Your class presentation (1/4).
  - Your final report (1/4).
These criteria will be roughly weighted as indicated in parentheses.