CS 147 Homework Assignment 3
This homework assignment is due at midnight at the end of Wednesday,
February 29, 2012 (i.e., the Wednesday/Thursday boundary). Please
e-mail your solutions to me or slide them under my office door.
I expect that it will take you about 3 hours to complete the
assignment. Please record your actual time so I can get feedback on
my estimates.
If you use a Microsoft product to do your graphing, be sure to turn
off the stupid gray background.
You are encouraged either to use a standard software tool or to write
code to help solve these problems, so that you will have techniques
you can use again in the future. Note that Microsoft Excel has an
F-test built in. If you use it, first work one of the small examples
in the book and check that you get the same result, to verify that
you know how to use the function properly.
-
A Harvey Mudd professor with too much time on his hands
decided to investigate the relationship between dorm
assignments, time spent studying, time spent sleeping, and
GPA. He collected data on 495 students.
Each line of the (tab-separated) file has four fields:
- Dorm group: either "quad" or "outer".
- Number of hours per week spent studying, as reported by
the student.
- Average number of (self-reported) hours of sleep the
student got each night.
- The student's GPA for the semester, as reported by the
registrar.
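If you choose to write code, the four fields above might be loaded along the following lines. This is only a minimal sketch: the function name `parse_students` is hypothetical, the sample rows are made up, and coding the dorm group as a 0/1 indicator (quad = 0, outer = 1) is an assumption rather than something the assignment prescribes.

```python
import csv
import io

def parse_students(text):
    """Parse tab-separated rows of: dorm group, study hours, sleep hours, GPA."""
    dorm, study, sleep, gpa = [], [], [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        group = row[0].strip().lower()
        if group not in ("quad", "outer"):
            continue  # skip a header line, if the file has one
        # Assumed coding for the categorical field: quad = 0, outer = 1.
        dorm.append(1.0 if group == "outer" else 0.0)
        study.append(float(row[1]))
        sleep.append(float(row[2]))
        gpa.append(float(row[3]))
    return dorm, study, sleep, gpa

# Made-up rows, only to show the expected shape of the input.
sample = "quad\t20\t7.5\t3.2\nouter\t35\t6\t3.6\n"
dorm, study, sleep, gpa = parse_students(sample)
print(dorm, study, sleep, gpa)
```

To use it on the real data, read the file's contents into a string and pass that string to `parse_students`.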
You are to perform a multiple regression analysis on the data,
answering the following questions:
- What is the formula for GPA in terms of dorm, study hours,
and sleep hours?
- What is the R-squared value for this regression?
- What are the 95% confidence intervals for each of the four
regression parameters?
- What are the SSR and the SSE? What is the result of the
F-test at the 95% level?
- Is there any correlation between the dorm group and the study
hours?
- Is there any correlation between the study hours and the
sleep hours?
- Based on your answers to the two previous questions, should
you modify your regression analysis? If so, what is the new
regression equation, including 95% confidence intervals?
- Based on a scatter plot of the regression errors, do you
see any trends in the data?
- Do you think your regression analysis is valid for this data?
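For those writing their own code, the quantities asked for above can be computed roughly as follows. This is a sketch under stated assumptions, not a reference solution: the helper names (`solve`, `fit`, `correlation`) are hypothetical, the eight data points are made up, the dorm group is assumed to be coded 0/1, and the t quantile is typed in from a table rather than computed.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (a is not modified)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit(predictors, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    n, k = len(y), len(predictors)
    size = k + 1  # number of parameters, including the intercept
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(size)] for a in range(size)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(size)]
    coef = solve(xtx, xty)
    residuals = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    mean_y = sum(y) / n
    sse = sum(e * e for e in residuals)          # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)    # total variation
    ssr = sst - sse                              # variation explained by the regression
    mse = sse / (n - size)
    # Standard error of b_j: sqrt(MSE times the j-th diagonal entry of inverse(X'X)).
    se = [math.sqrt(mse * solve(xtx, [float(i == j) for i in range(size)])[j])
          for j in range(size)]
    return {"coef": coef, "se": se, "r2": ssr / sst, "ssr": ssr, "sse": sse,
            "f": (ssr / k) / mse, "residuals": residuals}

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for eight students; NOT the assignment's data set.
dorm = [0, 0, 0, 0, 1, 1, 1, 1]            # assumed coding: quad = 0, outer = 1
study = [10, 25, 18, 30, 12, 28, 20, 35]   # hours per week
sleep = [8, 6, 7, 6.5, 8.5, 6, 7.5, 5.5]   # hours per night
gpa = [2.9, 3.5, 3.1, 3.7, 2.8, 3.4, 3.2, 3.8]

result = fit([dorm, study, sleep], gpa)
t = 2.776  # t[0.975; 4] from a t table: 8 observations minus 4 parameters
for name, b, s in zip(("intercept", "dorm", "study", "sleep"), result["coef"], result["se"]):
    print(f"{name}: {b:.4f}, 95% CI ({b - t * s:.4f}, {b + t * s:.4f})")
print("R^2 =", result["r2"], "SSR =", result["ssr"], "SSE =", result["sse"], "F =", result["f"])
print("corr(dorm, study) =", correlation(dorm, study))
print("corr(study, sleep) =", correlation(study, sleep))
```

With 495 students and four parameters, the real data has 491 degrees of freedom, so the t quantile differs from the one used here. The computed F value is meant to be compared against the table quantile for 3 and n - 4 degrees of freedom, and `result["residuals"]` holds the errors for the scatter plot.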
-
A researcher collected a number of observations relating the
memory size of a program (independent variable) to its run time.
The data is given in prob3-2.txt, where the first column is the
memory size in KB (1K = 1000, not 1024), and the second is the run
time in seconds. Using regression techniques, fit an equation to this
data. Answer the following questions:
- What is the regression equation you chose?
- Is your regression valid?
- Are the regression parameters significant?
- Are your errors normally distributed?
- What can you say about the 95% confidence intervals?
- What is the predicted run time for a 10-MB program?
- What is the predicted run time for a 100-MB program?
- How valid are the predictions in the previous two
questions?
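The prediction questions above can be explored with a short script such as the one below. It is only a sketch: it assumes a straight-line model (you may well decide a transformed model fits better), the names `fit_line` and `predict` are hypothetical, the observations are made up, and treating 1 MB as 1000 KB is an assumption that follows the 1K = 1000 convention stated in the problem.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, x_new, x_observed):
    """Return the predicted y and whether x_new lies outside the observed x range."""
    intercept, slope = model
    extrapolated = not (min(x_observed) <= x_new <= max(x_observed))
    return intercept + slope * x_new, extrapolated

# Made-up observations (memory in KB, run time in seconds); NOT the data in prob3-2.txt.
memory_kb = [100, 200, 400, 800, 1600]
run_time_s = [1.2, 2.1, 4.3, 8.0, 16.4]

model = fit_line(memory_kb, run_time_s)
for megabytes in (10, 100):
    kb = megabytes * 1000  # assumption: 1 MB = 1000 KB, matching 1K = 1000
    seconds, extrapolated = predict(model, kb, memory_kb)
    note = " (outside the observed range)" if extrapolated else ""
    print(f"{megabytes} MB: predicted run time {seconds:.1f} s{note}")
```

The `extrapolated` flag is there because a prediction far outside the range of the measured memory sizes rests on the model holding where no data was collected, which bears directly on the last question.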
© 2012, Geoff Kuenning
This page is maintained by Geoff Kuenning.