CS 147 Homework Assignment 2
This homework assignment is due at 12 AM on Thursday, February 16,
2012 (i.e., the Wednesday/Thursday boundary). Please give your
solutions to me, slide them under my door, or e-mail them.
I expect that it will take you about 3 hours to complete the
assignment.
If you use a Microsoft product to do your graphing, be sure to turn
off the stupid gray background, and ensure that color isn't essential
for interpreting the graphs (since I might decide to print things on a
B&W printer).
You are encouraged either to use a standard software tool or to write
code to help solve these problems, so that you will have techniques
you can use again in the future.
- The sizes (in bytes) of a set of HTML files are given in prob1.txt.
  - What are the 1st and 3rd quartiles for this data?
  - Are quartiles a good choice to describe dispersion in this case? If
    not, what would you use instead?
  - What are the mean file size and the standard deviation?
  - What are the 90% and 95% confidence intervals for the mean?
  - Is your calculation valid? Why or why not?
  - What is the 90% confidence interval for the proportion of files
    that are less than 20,000 bytes in size? Use the formula given in
    Jain.
  - Is the confidence interval for the proportion valid?
  - At 90% confidence, is the mean file size greater than 16K (16384)
    bytes?
- The raw midterm scores for two sections of a class are given in
  prob2-1.txt and prob2-2.txt.
  - Is either section better than the other at 90% confidence? Which?
  - Is either section better at 80% confidence?
  - Based on the data from the combined sections, how many students
    would have to take the midterm if we wanted the mean score to have
    a 99% confidence interval that was +/- 5% of the mean?
- Correct timekeeping is very important in navigation. Traditionally,
  seafarers try to never reset their clocks; instead they calculate a
  daily drift rate and apply a correction factor. The file prob3.txt
  contains a series of observations of the number of days since a
  particular wristwatch was first started (first column) and the number
  of seconds of error exhibited by the watch relative to an atomic clock
  (second column). The columns are tab-separated, so they should be easy
  to import into a spreadsheet.
  - Fit a linear regression to this data.
  - Which of the fitted parameters are significant at 95% confidence?
  - How much of the variation is explained by the regression?
  - Based on the R-squared value, is the regression valid? (Show your
    calculations.)
  - Using visual tests, verify or refute the validity of your
    regression model according to the four criteria listed on pages
    235-237 of the textbook.
  - What would the clock error be at day 730.000?
  - At the equator, a 4-second clock error will produce a navigation
    error of exactly one nautical mile. On day 730, a navigator crossing
    the equator uses your regression model to correct the clock reading,
    then calculates her position. At 90% confidence, what is the
    plus/minus error introduced by the watch after the correction has
    been made, expressed in nautical miles?
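For the quartile question on prob1.txt, here is a minimal Python sketch. It assumes the file holds whitespace-separated numbers (the layout is not stated above), and the names `read_numbers` and `quartiles` are illustrative, not part of any course-provided code. Software packages disagree slightly on how to interpolate quartiles, so a spreadsheet or the textbook's rule may give somewhat different values.

```python
import statistics

# Illustrative helper names; not from any course-provided code.


def read_numbers(path):
    """Read whitespace-separated numbers (assumed layout of prob1.txt)."""
    with open(path) as f:
        return [float(tok) for tok in f.read().split()]


def quartiles(values):
    """Return (Q1, median, Q3).

    statistics.quantiles with method="inclusive" interpolates linearly
    between order statistics; other tools may use other conventions.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3
```

Usage: `q1, _, q3 = quartiles(read_numbers("prob1.txt"))`.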
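The mean, standard deviation, and confidence-interval questions on prob1.txt can be sketched as follows. Python's standard library has no Student's t quantile function, so `t_quantile` is a hypothetical helper that integrates the t density with Simpson's rule and inverts it by bisection; with SciPy installed, `scipy.stats.t.ppf(p, df)` computes the same quantity. For the 16K question, the sketch assumes a one-sided reading: if the 90% lower confidence bound on the mean exceeds 16384, the mean is greater at that confidence. `mean_ci` and `mean_lower_bound` are illustrative names.

```python
import math
import statistics


def t_quantile(p, df, steps=2000):
    """Hypothetical helper: p-quantile (p > 0.5) of Student's t distribution."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):  # x > 0; Simpson's rule on [0, x] plus the symmetric lower half
        h = x / steps
        total = pdf(0) + pdf(x)
        total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + total * h / 3

    hi = 1.0
    while cdf(hi) < p:  # double until the quantile is bracketed
        hi *= 2
    lo = 0.0
    for _ in range(50):  # bisection on [lo, hi]
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(values, confidence):
    """Two-sided CI for the mean: xbar -/+ t[1-alpha/2; n-1] * s / sqrt(n)."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    half = t_quantile(1 - (1 - confidence) / 2, n - 1) * s / math.sqrt(n)
    xbar = statistics.mean(values)
    return xbar - half, xbar + half


def mean_lower_bound(values, confidence):
    """One-sided lower bound: xbar - t[1-alpha; n-1] * s / sqrt(n)."""
    n = len(values)
    t = t_quantile(confidence, n - 1)
    return statistics.mean(values) - t * statistics.stdev(values) / math.sqrt(n)
```

Usage: `mean_ci(sizes, 0.90)`, `mean_ci(sizes, 0.95)`, and `mean_lower_bound(sizes, 0.90) > 16384`, where `sizes` is the list read from prob1.txt.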
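For the proportion question, the sketch below uses the normal-approximation interval p ± z·sqrt(p(1 − p)/n), assumed here to match the formula in Jain (check it against the textbook). `proportion_ci` is an illustrative name; `successes` would be the number of files smaller than 20,000 bytes.

```python
import math
from statistics import NormalDist

# Illustrative helper name; not from any course-provided code.


def proportion_ci(successes, n, confidence):
    """Two-sided CI for a proportion: p -/+ z * sqrt(p * (1 - p) / n).

    Normal approximation; z is the (1 - alpha/2) standard normal quantile.
    """
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half
```

Usage: `small = sum(1 for s in sizes if s < 20000)`, then `proportion_ci(small, len(sizes), 0.90)`.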
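To compare the two sections, one common approach for unpaired observations is a confidence interval for the difference of the means: if the interval excludes zero, the sections differ at that confidence, and the sign shows which is better. The sketch assumes unequal variances and uses the Welch-Satterthwaite approximation for the effective degrees of freedom. Textbooks differ slightly in this approximation, so compare it with the formula in Jain. `t_quantile` is a hypothetical stdlib-only stand-in for `scipy.stats.t.ppf`, and `diff_of_means_ci` is an illustrative name.

```python
import math
import statistics


def t_quantile(p, df, steps=2000):
    """Hypothetical helper: p-quantile (p > 0.5) of Student's t distribution."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):  # x > 0; Simpson's rule on [0, x] plus the symmetric lower half
        h = x / steps
        total = pdf(0) + pdf(x)
        total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + total * h / 3

    hi = 1.0
    while cdf(hi) < p:  # double until the quantile is bracketed
        hi *= 2
    lo = 0.0
    for _ in range(50):  # bisection on [lo, hi]
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def diff_of_means_ci(a, b, confidence):
    """Two-sided CI for mean(a) - mean(b) from two unpaired samples."""
    va = statistics.variance(a) / len(a)  # s_a^2 / n_a
    vb = statistics.variance(b) / len(b)  # s_b^2 / n_b
    # Welch-Satterthwaite effective degrees of freedom (may be non-integer)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    t = t_quantile(1 - (1 - confidence) / 2, df)
    diff = statistics.mean(a) - statistics.mean(b)
    half = t * math.sqrt(va + vb)
    return diff - half, diff + half
```

Usage: call `diff_of_means_ci(section1, section2, 0.90)` and again with `0.80`, then check whether each interval contains zero.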
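For the sample-size question, the standard formula for estimating a mean to within ±r percent is n = (100·z·s / (r·x̄))², where x̄ and s come from a preliminary sample (here, the combined sections) and z is the normal quantile for the desired confidence. A minimal sketch follows; `required_sample_size` is an illustrative name, and merging the two score files into one list is left out.

```python
import math
from statistics import NormalDist, mean, stdev

# Illustrative helper name; not from any course-provided code.


def required_sample_size(values, confidence, percent):
    """Observations needed for a CI of +/- percent% of the mean.

    n = (100 * z * s / (r * xbar))**2, rounded up to a whole observation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (100 * z * stdev(values) / (percent * mean(values))) ** 2
    return math.ceil(n)
```

Usage: `required_sample_size(section1 + section2, 0.99, 5)`.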
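The regression questions on prob3.txt can be sketched with the usual least-squares formulas: b1 = Sxy/Sxx, b0 = ybar - b1·xbar, R² = 1 - SSE/SST, and standard deviations for both parameters. A parameter is significant if its confidence interval excludes zero. For the day-730 question, the sketch treats the navigator's reading as a single future observation, whose standard deviation is s_e·sqrt(1 + 1/n + (x_p - xbar)²/Sxx), and converts seconds to nautical miles with the 4-seconds-per-mile figure from the problem. `t_quantile` is a hypothetical stand-in for `scipy.stats.t.ppf`, the parser assumes two numeric columns with no header row, and all other names are illustrative.

```python
import math
from dataclasses import dataclass


def t_quantile(p, df, steps=2000):
    """Hypothetical helper: p-quantile (p > 0.5) of Student's t distribution."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):  # x > 0; Simpson's rule on [0, x] plus the symmetric lower half
        h = x / steps
        total = pdf(0) + pdf(x)
        total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + total * h / 3

    hi = 1.0
    while cdf(hi) < p:  # double until the quantile is bracketed
        hi *= 2
    lo = 0.0
    for _ in range(50):  # bisection on [lo, hi]
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


@dataclass
class LineFit:
    b0: float    # intercept
    b1: float    # slope
    r2: float    # fraction of variation explained, 1 - SSE/SST
    se: float    # standard deviation of errors, sqrt(SSE / (n - 2))
    s_b0: float  # standard deviation of b0
    s_b1: float  # standard deviation of b1
    n: int
    xbar: float
    sxx: float   # sum of (x - xbar)^2


def read_columns(path):
    """Parse day and error columns (assumed: tab/space separated, no header)."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:
                xs.append(float(fields[0]))
                ys.append(float(fields[1]))
    return xs, ys


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    se = math.sqrt(sse / (n - 2))
    return LineFit(b0, b1, 1 - sse / sst, se,
                   se * math.sqrt(1 / n + xbar ** 2 / sxx),
                   se / math.sqrt(sxx), n, xbar, sxx)


def param_cis(fit, confidence):
    """Two-sided CIs for (b0, b1) using t with n - 2 degrees of freedom."""
    t = t_quantile(1 - (1 - confidence) / 2, fit.n - 2)
    return ((fit.b0 - t * fit.s_b0, fit.b0 + t * fit.s_b0),
            (fit.b1 - t * fit.s_b1, fit.b1 + t * fit.s_b1))


def predict_single(fit, x_p, confidence):
    """Predicted y at x_p and +/- half-width for one future observation."""
    s_pred = fit.se * math.sqrt(1 + 1 / fit.n + (x_p - fit.xbar) ** 2 / fit.sxx)
    t = t_quantile(1 - (1 - confidence) / 2, fit.n - 2)
    return fit.b0 + fit.b1 * x_p, t * s_pred


SECONDS_PER_NAUTICAL_MILE = 4.0  # at the equator, per the problem statement
```

Usage: `fit = fit_line(*read_columns("prob3.txt"))`; `param_cis(fit, 0.95)` addresses significance, `fit.r2` the explained variation, and `y_hat, half = predict_single(fit, 730.0, 0.90)` gives the day-730 error and its half-width, which is `half / SECONDS_PER_NAUTICAL_MILE` nautical miles. The visual tests still need plots of the data and residuals.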
© 2012, Geoff Kuenning
This page is maintained by Geoff Kuenning.