CS 147 Homework Assignment 2
This homework assignment is due at 12 AM on Thursday, February 20,
2003 (i.e., the Wednesday/Thursday boundary). Please give your
solutions to me or place them in the box outside my door.
I expect that it will take you about 3 hours to complete the
assignment.
If you use a Microsoft product to do your graphing, be sure to turn
off all colors (so you can print in black and white) and the stupid
gray background.
You are encouraged either to use a standard software tool or to write
code to help solve these problems, so that you will have techniques
you can use again in the future.
- The sizes (in bytes) of the HTML files for the CS70 homework
  problems from the fall of 2002 are given in prob1.txt.
  - What are the 1st and 3rd quartiles for this data?
  - Are quartiles a good choice to describe dispersion in this
    case? If not, what would you use instead?
  - What are the mean file size and the standard deviation?
  - What are the 90% and 95% confidence intervals for the mean?
  - Is your calculation valid? Why or why not?
  - What is the 90% confidence interval for the proportion of
    files that are less than 20,000 bytes in size? Use the
    formula given in Jain.
  - Is the confidence interval for the proportion valid?
  - At 90% confidence, is the mean file size greater than
    16K (16384) bytes?
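If you choose to write code for this kind of problem, the summary statistics and intervals might be sketched roughly as follows. This is only an illustration in Python using the standard library. The function name `summarize` and the data values are invented placeholders, not the contents of prob1.txt. The sketch uses a normal quantile, which assumes a reasonably large sample (substitute a t quantile for a small one), and the usual normal-approximation formula for a proportion, which you should check against the one in Jain. Quartile conventions also differ between tools, so your numbers may not match another package exactly.

```python
import math
import statistics
from statistics import NormalDist


def summarize(sizes, threshold, confidence=0.90):
    """Quartiles, mean, sample std dev, and normal-approximation CIs."""
    n = len(sizes)
    # statistics.quantiles uses its default "exclusive" method here;
    # other tools may use a different quartile convention.
    q1, _median, q3 = statistics.quantiles(sizes, n=4)
    mean = statistics.mean(sizes)
    stdev = statistics.stdev(sizes)  # sample std dev (n - 1 divisor)

    # Two-sided normal quantile, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * stdev / math.sqrt(n)

    # Proportion of observations below the threshold, with its CI.
    p = sum(1 for s in sizes if s < threshold) / n
    p_half = z * math.sqrt(p * (1 - p) / n)

    return {
        "q1": q1,
        "q3": q3,
        "mean": mean,
        "stdev": stdev,
        "mean_ci": (mean - half, mean + half),
        "proportion": p,
        "proportion_ci": (p - p_half, p + p_half),
    }


if __name__ == "__main__":
    # Hypothetical file sizes in bytes, for illustration only.
    example_sizes = [9000, 12000, 15000, 18000, 21000, 24000, 30000, 45000]
    for name, value in summarize(example_sizes, threshold=20000).items():
        print(name, value)
```

A one-sided question such as "is the mean greater than some value?" can then be approached by comparing that value with an appropriately constructed interval. Whether either interval is valid for a given data set is a separate question that the code cannot answer for you.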
- The raw midterm scores for two sections of CS70 students are
  given in prob2-1.txt and prob2-2.txt.
  - Is either section better than the other at 90%
    confidence? Which?
  - Is either section better at 80% confidence?
  - Based on the data from the combined sections, how many
    students would have to take the midterm if we wanted the
    mean score to have a 99% confidence interval that was
    +/- 5% of the mean?
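For comparing two unpaired samples and for sizing a sample, a rough Python sketch might look like the following. The function names `diff_of_means_ci` and `required_sample_size` and all score values are hypothetical, chosen only for illustration. Both functions use a normal quantile as an approximation; for small sections a t quantile with the appropriate degrees of freedom would be the more careful choice. The sample-size calculation assumes the common formula n = (100 z s / (r x̄))^2, where r is the desired half-width as a percentage of the mean.

```python
import math
import statistics
from statistics import NormalDist


def diff_of_means_ci(a, b, confidence=0.90):
    """Approximate CI for mean(a) - mean(b), treating a and b as unpaired."""
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of the difference of two independent sample means.
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z * se, diff + z * se


def required_sample_size(scores, percent=5.0, confidence=0.99):
    """Observations needed so the CI half-width is `percent` % of the mean."""
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((100 * z * s / (percent * mean)) ** 2)


if __name__ == "__main__":
    # Hypothetical scores, not the contents of prob2-1.txt or prob2-2.txt.
    section_1 = [62, 71, 75, 80, 84, 88, 91]
    section_2 = [55, 64, 70, 73, 79, 85, 90]
    low, high = diff_of_means_ci(section_1, section_2, confidence=0.90)
    # If the interval includes zero, the data do not show a difference
    # at this confidence level.
    print("difference CI:", low, high)
    print("students needed:", required_sample_size(section_1 + section_2))
```

Note that lowering the confidence level narrows the interval, so two samples that are indistinguishable at 90% can differ at 80%.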
- Correct timekeeping is very important in navigation.
  Traditionally, seafarers never try to reset their clocks;
  instead they calculate a daily drift rate and apply a
  correction factor. The file prob3.txt contains a series of
  observations of the number of days since a particular
  wristwatch was first started (first column) and the number of
  seconds of error exhibited by the watch relative to an atomic
  clock (second column). The columns are tab-separated, so they
  should be easy to import into a spreadsheet.
  - Fit a linear regression to this data.
  - Which of the fitted parameters are significant at 95%
    confidence?
  - How much of the variation is explained by the regression?
  - Based on the R-squared value, is the regression valid?
  - Using visual tests, verify or refute the validity of your
    regression model according to the four criteria listed
    on pages 235-237 of the textbook.
  - What would the clock error be at day 730.000?
  - At the equator, a 4-second clock error will produce a
    navigation error of exactly one nautical mile. On day 730,
    a navigator crossing the equator uses your regression model
    to correct the clock reading, then calculates her position.
    At 90% confidence, what is the plus/minus error introduced
    by the watch after the correction has been made, expressed
    in nautical miles?
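A simple least-squares fit with a prediction margin can be sketched in plain Python as shown below. Treat it as an illustration under stated assumptions rather than a reference solution: the function names `fit_line` and `predict_with_margin` are made up, the observations are invented (they are not the contents of prob3.txt), and the t quantile is passed in by the caller from a t table with n - 2 degrees of freedom. The margin computed here is for a single future observation.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x, with basic diagnostics."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    s_e = math.sqrt(sse / (n - 2))  # std dev of errors, n - 2 deg. of freedom
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx,
        "b0": b0, "b1": b1,
        "r_squared": 1 - sse / sst,
        "s_e": s_e,
        # Standard errors of the parameters; a parameter is significant
        # if b +/- t * s_b does not include zero.
        "s_b0": s_e * math.sqrt(1 / n + x_bar ** 2 / sxx),
        "s_b1": s_e / math.sqrt(sxx),
    }


def predict_with_margin(fit, x_new, t_quantile):
    """Predicted y at x_new and the +/- margin for one future observation."""
    y_hat = fit["b0"] + fit["b1"] * x_new
    s_pred = fit["s_e"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat, t_quantile * s_pred


if __name__ == "__main__":
    # Hypothetical (day, seconds of error) observations.
    days = [0.0, 30.0, 61.0, 92.0, 120.0]
    errors = [0.4, 6.1, 12.5, 18.2, 24.3]
    fit = fit_line(days, errors)
    # 2.353 is the table value t[0.95; 3] for a 90% two-sided interval
    # with n - 2 = 3 degrees of freedom.
    y_hat, margin = predict_with_margin(fit, 730.0, t_quantile=2.353)
    print("predicted error (s):", y_hat)
    print("margin (s):", margin)
    # Convert seconds to nautical miles at 4 seconds per mile.
    print("margin (nautical miles):", margin / 4.0)
```

The margin grows as the prediction point moves away from the mean of the observed days, which matters when extrapolating far beyond the data. A high R-squared alone does not establish that the model's assumptions hold; the visual tests are still needed.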
© 2003, Geoff Kuenning
This page is maintained by Geoff
Kuenning.