Sample ConcepTest Questions for Statistics
Distributions
Q1: Which statement about bar charts and histograms is false?
1) Histograms have no spaces between the bars.
2) Histograms show how a quantitative variable is distributed while bar charts show counts (or percents) for different categories of a categorical variable.
3) Histograms are just a special kind of bar chart.
4) Histograms and bar charts are the same thing.
5) Histograms always have a numerical scale on the x-axis.
Answer (4)
Q2: True or false: The following histograms could all be from the same dataset.
1) True
2) False
Answer (1): All the histograms are from the same sample of size 100 from a Normal population with mean 50 and standard deviation 15. The point is that the histograms can look quite different for different interval sizes.
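The effect of interval size can be tried out with a minimal sketch like the one below. It assumes a pseudo-random sample of 100 values from a Normal population with mean 50 and standard deviation 15 (not the original data); the seed, the helper name `bin_counts`, and the interval widths are illustrative choices only.

```python
import random
from collections import Counter

def bin_counts(data, width):
    """Count observations in the intervals [k*width, (k+1)*width)."""
    counts = Counter(int(x // width) for x in data)
    # Key each bin by its left endpoint, in increasing order.
    return {k * width: counts[k] for k in sorted(counts)}

random.seed(1)  # arbitrary seed, used only so the sketch is repeatable
sample = [random.gauss(50, 15) for _ in range(100)]

# Same sample, four different interval widths (hypothetical choices).
histograms = {width: bin_counts(sample, width) for width in (2, 5, 10, 20)}

for width, hist in histograms.items():
    print(f"interval width {width}")
    for left, count in hist.items():
        print(f"{left:6.0f} | " + "*" * count)
```

Every histogram accounts for the same 100 observations; only the grouping changes.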
Q3: True or false: The following histograms could all be from the same dataset.
1) True
2) False
Answer (1): All the histograms are from the same sample consisting of 50 observations from a normal population with mean 37 and standard deviation 10 and 50 observations from a normal population with mean 63 and standard deviation 10. The point is that the shape (modality) of the distribution can change with interval size.
Q4: Could all of the samples below be from the same population?
1) Yes
2) No
Answer (1): The sample sizes are 20, 50, 100, 500, 1000, 10000, all drawn from a standard normal population. The point is that very large samples are necessary for the histogram to look like the population distribution.
Q5: Match the histograms to the boxplots.
1) A1, B2, C3
2) A1, B3, C2
3) A2, B1, C3
4) A2, B3, C1
5) A3, B1, C2
6) A3, B2, C1
Answer (4)
Q6: Match the boxplots below with the following conditions:
a) median < mean
b) median = mean
c) median > mean
1) 1a, 2b, 3c
2) 1c, 2b, 3a
3) 1b, 2a, 3c
4) 1b, 2c, 3b
Answer (3)
Normal Distributions
Q1: Changing the standard deviation of a normal distribution changes the
1) center.
2) spread.
3) area.
Answer (2): Shown below are three normal distributions with mean zero and standard deviations of 1/2, 1, and 2.
Q2: Match the histograms to the normal quantile plots.
1) A1, B2, C3
2) A1, B3, C2
3) A2, B1, C3
4) A2, B3, C1
5) A3, B1, C2
6) A3, B2, C1
Answer (5)
Q3: True or false: All four samples shown below could have been drawn from the same normally distributed population.
1) True
2) False
Answer (1): All four samples are of size 100 and drawn randomly from a normally distributed population with mean 50 and standard deviation 15.
Scatterplots and Correlation
Q1: For which of the following studies would making a scatterplot be appropriate?
a) Height and handspan
b) Height and gender
c) Height and age
d) Height and country of birth
1) a only
2) a and c
3) b only
4) b and d
Q2: Match the four scatterplots with the four correlations:
a) r=0.95, b) r=0.41, c) r=0.58, d) r=0.98
1) Aa, Bb, Cc, Dd
2) Aa, Bc, Cb, Dd
3) Ad, Bb, Cc, Da
4) Ad, Bc, Cb, Da
Answer (2)
Least-Squares Regression
Q1: The ability to estimate distances can be tested by estimating distances (in meters) to various landmarks and then measuring the actual distances with a laser rangefinder. If we want to use a linear regression equation to predict the true distance to a landmark using our estimated distance then
1) estimated distance is the explanatory variable
2) measured distance is the explanatory variable
Answer (1): We want to predict the actual distance, so actual (measured) distance is the response variable.
Q2: One individual's regression equation was
Measured distance=17.15+0.518 Estimated distance
For larger distances he tended to
1) underestimate the measured distance
2) overestimate the measured distance
Answer (2): Estimated distances are being multiplied by a slope less than one, so the estimates are too big. See the plot below. What about for smaller distances? Is there an influential observation?
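The arithmetic behind this answer can be sketched as follows, taking the fitted equation at face value. The function name and the trial distances are illustrative. The crossover point is where the fitted line predicts that measured and estimated distance agree, so it also bears on the question about smaller distances.

```python
def predicted_measured(estimated):
    """Fitted line from Q2: measured = 17.15 + 0.518 * estimated (meters)."""
    return 17.15 + 0.518 * estimated

# Solve estimated = 17.15 + 0.518 * estimated for the crossover distance.
crossover = 17.15 / (1 - 0.518)
print(f"crossover at about {crossover:.1f} m")

for est in (10, 20, 50, 100, 200):  # hypothetical estimated distances
    pred = predicted_measured(est)
    verdict = "overestimates" if est > pred else "underestimates"
    print(f"estimate {est:>3} m -> predicted measured {pred:6.2f} m ({verdict})")
```

Below the crossover the fitted line sits above the estimate, so the same individual tended to underestimate short distances. This reading relies on the fitted line only and ignores any influential observation.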
Q3: The residual plot for the distance estimation experiment is shown below. From the plot
1) the residuals look normally distributed about 0.
2) the plot shows a curvilinear relationship between the variables.
3) the variation in the residuals is increasing with distance.
Answer (3): Predicted distances are likely to be less accurate as estimated distance increases. The residuals are not normally distributed because of the outliers. See the normal quantile plot below.
Q4: From the scatterplot below do you think it is reasonable to model the data with a straight line?
1) Yes
2) No
3) Hard to tell
Q5: Now look at the fit and the residual plot. Do you think it is reasonable to model the data with a straight line?
1) Yes
2) No
3) Hard to tell
Answer (2): The residual plot shows nonlinearity.
Q6: For a statistics class the linear regression analysis on handspan versus height reported $r^2$=0.21. This means
1) 21% of a person's handspan measurement is explained by their height.
2) 21% of the variation in the handspan measurements can be explained by the linear regression on height.
3) height can be used to predict handspan with 21% accuracy.
4) 21% of the students have handspans that can be predicted by their height.
Cautions about Correlation and Regression
Q1: True or false: An influential point need not necessarily be an outlier, and an outlier need not be an influential point.
1) True
2) False
Answer (1): Scatterplot A shows an outlier that is not (too) influential (r with the point is -0.94 and r without is -0.99). Scatterplot B shows an influential point that is not an outlier (r with is -0.91 and r without is -0.82). From a practical viewpoint all outliers will have some influence.
Design of Experiments
Q1: True or false: A longitudinal study is a type of observational study.
1) True
2) False
Answer (2): A longitudinal study could be either an observational study or an experiment. Can you give an example of each?
Q2: A golf ball manufacturer wants to test the effect of two different golf ball designs (A and B) on driving distance. Which of the following two experimental designs is best?
1) For 100 golfers randomly assign 50 to hit drives with ball A and have the other 50 hit the other ball. Compare the difference in average distance for each group.
2) Have each golfer hit both balls. Use random assignment to determine which type of ball they hit first. Calculate the difference in distance (ball A minus ball B) for each golfer and average these differences.
Answer (2): This is a matched-pair design. Why is it better?
Sampling Design
Q1: True or false: To reliably estimate a population parameter larger populations require larger samples. (Assume that the population is much larger than the sample size.)
1) True
2) False
Randomness
Q1: Your favorite weather forecaster gives a 70% chance of rain for tomorrow.
True or false: The prediction for tomorrow is rain.
1) True
2) False
Answer (1): This answer is not really satisfactory, because it is overly simplistic. This question could be used to start a discussion. Is any prediction greater than 50% a prediction for rain? What does a 70% chance of rain really mean? (That in the long run it will rain on 70% of the days when such a forecast is made.) How would you measure the accuracy of forecasts? (See "The Weather Forecaster Problem", Dan Teague & Dot Doyle, COMAP Consortium, 90, Spring/Summer 2006, p. 14)
Q2: True or false: The risk of death is the same for a 30-year-old and a 70-year-old.
1) True
2) False
Answer (1): The probability is one for both. The time frame should be specified, for example "risk of death in the next year."
Probability Rules
Q1: There are 26 people in our class. Do you think the probability that two of us share the same birthday is about
1) 1/10000
2) 1/1000
3) 1/100
4) 1/10
5) 1/2
Answer (5): Surprisingly, there need only be 23 people in the class for the probability to be greater than 0.5. For 26 people, one minus the probability that everyone has different birthdays is $1-\frac{365\cdot 364\cdots 340}{365^{26}}$ = 0.598.
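The birthday calculation can be checked with a short sketch. It assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

print(f"26 people: {p_shared_birthday(26):.3f}")

# Smallest class size for which the probability exceeds one half.
smallest = next(n for n in range(1, 366) if p_shared_birthday(n) > 0.5)
print(f"smallest class size with probability above 0.5: {smallest}")
```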
Q2: Suppose we roll a pair of fair dice. Let C= event that the sum of the dice is 7. P(C) equals
1) 5/36
2) 1/6
3) 7/36
4) 1/3
Answer (2): There are 6 ways to roll a 7 out of 36 possible outcomes.
| {1,1} | {1,2} | {1,3} | {1,4} | {1,5} | {1,6} |
| {2,1} | {2,2} | {2,3} | {2,4} | {2,5} | {2,6} |
| {3,1} | {3,2} | {3,3} | {3,4} | {3,5} | {3,6} |
| {4,1} | {4,2} | {4,3} | {4,4} | {4,5} | {4,6} |
| {5,1} | {5,2} | {5,3} | {5,4} | {5,5} | {5,6} |
| {6,1} | {6,2} | {6,3} | {6,4} | {6,5} | {6,6} |
Q3: Event A is the number on each die is even. Event B is the number on the first die is the same as the number on the second die.
P(A and B)=
1) 3/36
2) 6/36
3) 9/36
4) I'm not awake yet and therefore have no idea.
Answer (1): Count the possible ways.
Q4: P($A^c$ or $B^c$)=
1) 27/36
2) 30/36
3) 24/36
4) 33/36
Answer (4): Make a Venn diagram.
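The counts in Q2 through Q4 can also be checked by listing all 36 outcomes. This is a sketch only: the helper names are illustrative, and Q4 is read as the probability that at least one of A and B fails to occur, which is the reading consistent with answer (4).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a true/false function of an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def sum_is_7(o):      # event C
    return o[0] + o[1] == 7

def both_even(o):     # event A
    return o[0] % 2 == 0 and o[1] % 2 == 0

def same_number(o):   # event B
    return o[0] == o[1]

p_c = prob(sum_is_7)
p_a_and_b = prob(lambda o: both_even(o) and same_number(o))
p_not_a_or_not_b = prob(lambda o: not both_even(o) or not same_number(o))

print(p_c, p_a_and_b, p_not_a_or_not_b)  # fractions are shown in lowest terms
```

The last probability is one minus P(A and B), which is what the Venn diagram shows.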
Q5: P(B)=
1) P(B and A)+P(B and $A^c$)
2) P(A)P(B|A)+P($A^c$)P(B|$A^c$)
3) Both 1 and 2
4) None of the above
Q6: You are a contestant on a game show. There are three doors. Behind one of the doors is a new car and behind each of the other two doors is a goat. You pick a door, and you will win the prize behind that door. However, you get a second chance to guess. Say you pick door #1. The host of the show now tells you that behind door #3 is a goat. You are given the choice of remaining with door #1 or switching your choice to door #2. Your strategy should be
1) remain with door #1.
2) switch to door #2.
3) either choice since the probabilities are the same.
Answer (2): See http://www.maa.org/devlin/devlin_07_03.html
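The answer can be checked by exact enumeration under the usual assumptions, which the question leaves implicit: the car is equally likely to be behind each door, the host always opens an unchosen door hiding a goat, and the host chooses at random when two such doors are available. The function name is illustrative.

```python
from fractions import Fraction

def win_probabilities(first_pick=1, doors=(1, 2, 3)):
    """Return (P(win by staying), P(win by switching)) under the stated assumptions."""
    stay = Fraction(0)
    switch = Fraction(0)
    for car in doors:
        p_car = Fraction(1, len(doors))  # car equally likely behind each door
        # The host may open any door that is neither the pick nor the car.
        allowed = [d for d in doors if d != first_pick and d != car]
        for opened in allowed:
            p = p_car / len(allowed)  # host chooses uniformly among allowed doors
            remaining = next(d for d in doors if d not in (first_pick, opened))
            stay += p * (car == first_pick)
            switch += p * (car == remaining)
    return stay, switch

stay, switch = win_probabilities()
print(f"stay: {stay}, switch: {switch}")
```

If the host's behavior differs from these assumptions (for example, a host who opens a door at random and happens to reveal a goat), the probabilities change.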
Q7: In the tree diagram below, what probability does ( )( ) represent?
1) P(A)P(B)
2) P(A or B)
3) P(A and B)
4) None of the above
Q8: True or false: In the tree diagram below x equals
.
1) True
2) False
Linear Combinations of Random Variables
Q1: The data {1,3,5,7,9} has $\bar{x}$=5. If each number in the data set is multiplied by 2 and then increased by 3 the $\bar{x}$ for this new data set will be
1) 5
2) 8
3) 10
4) 13
Sampling Distributions for Counts (Binomial) and Proportions
Q1: Let X = number of successes have a binomial distribution with n=15 and p=0.9. If Y = number of failures, then Y has a binomial distribution with n=15 and p=0.1.
P(X≤13)=
1) P(X=0)+P(X=1)+...+P(X=13)
2) 1-P(X>13)
3) 1-P(Y<2)
4) 1-P(Y=0)-P(Y=1)
5) All of the above
Answer (5)
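That the four expressions agree can be confirmed numerically. The sketch below computes each one from the binomial formula; the helper name `binom_pmf` is illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 15
p_x = [binom_pmf(k, n, 0.9) for k in range(n + 1)]  # P(X=k), successes
p_y = [binom_pmf(k, n, 0.1) for k in range(n + 1)]  # P(Y=k), failures

choices = {
    1: sum(p_x[:14]),        # P(X=0)+P(X=1)+...+P(X=13)
    2: 1 - sum(p_x[14:]),    # 1-P(X>13)
    3: 1 - sum(p_y[:2]),     # 1-P(Y<2)
    4: 1 - p_y[0] - p_y[1],  # 1-P(Y=0)-P(Y=1)
}

for number, value in choices.items():
    print(f"choice {number}: {value:.6f}")
```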
Q2: Let X have a binomial distribution with n=30 and p=0.4. This distribution along with a normal distribution with μ=n p and σ=$\sqrt{n p(1-p)}$ are shown below. If we want to approximate the probability that X≤10 using the normal distribution we should calculate the z-score using
1) X=9
2) X=9.5
3) X=10
4) X=10.5
5) X=11
Answer (4): (Continuity correction) Using X=10.5 will most closely approximate the area of the yellow bars. The exact solution is 0.291 while the approximations corresponding to the five choices are {0.132, 0.176, 0.228, 0.288, 0.355}.
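The numbers in this answer can be reproduced with a short sketch. It uses the exact binomial sum and a normal distribution with μ=n p and σ equal to the square root of n p(1-p); the helper name `normal_cdf` is illustrative.

```python
from math import comb, erf, sqrt

n, p = 30, 0.4
mu = n * p
sigma = sqrt(n * p * (1 - p))

def normal_cdf(x, mean, sd):
    """P(N(mean, sd) <= x), computed from the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Exact binomial probability P(X <= 10).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(11))

# Normal approximation evaluated at each of the five answer choices.
approximations = {cutoff: normal_cdf(cutoff, mu, sigma)
                  for cutoff in (9, 9.5, 10, 10.5, 11)}

print(f"exact: {exact:.3f}")
for cutoff, value in approximations.items():
    print(f"X={cutoff}: {value:.3f}")
```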
Q3: If n p<10 then P(x=k) can be calculated using
1) $\binom{n}{k}p^k(1-p)^{n-k}$
2) a table of binomial probabilities
3) 1 or 2
4) the normal approximation N(n p, $\sqrt{n p(1-p)}$)
5) 1, 2, or 4
Q4: True or false: There is a relationship between proportions and means.
1) True
2) False
Answer (1): Code a "success" as 1 and a "failure" as 0. For a sample of size n, $\hat{p}=X/n$. What is $\bar{x}$?
$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i=\frac{X}{n}=\hat{p}$
Sampling Distribution of a Sample Mean (C.L.T.)
Q1: Is it possible for a normally distributed variable and the sampling distribution of its mean to have the graphs shown below?
1) Yes
2) No
Q2: Is it possible for a normally distributed variable and the sampling distribution of its mean to have the graphs shown below?
1) Yes
2) No
Q3: Is it possible for a normally distributed variable and the sampling distribution of its mean to have the graphs shown below?
1) Yes
2) No
Q4: The distribution of math SAT scores for male students in 2005 was approximately normal with a mean of 538 and a standard deviation of 116. To calculate the probability that the average score of a simple random sample of 20 male students is 550 or higher we calculate the z-score as
1) ![]()
2) ![]()
3) Neither formula is correct
Confidence Intervals for the Mean
Q1: For the standard normal distribution P(-1.96<Z<1.96)=0.95 so approximately 95% of the observed values of Z are in the interval (-1.96,1.96) for a large sample. For a sample mean, $\bar{x}$, there is a 95% probability that
$-1.96<\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}<1.96$
Multiplying this expression by $\sigma/\sqrt{n}$ gives
$-1.96\,\sigma/\sqrt{n}<\bar{x}-\mu<1.96\,\sigma/\sqrt{n}$
True or False: $-1.96\,\sigma/\sqrt{n}<\mu-\bar{x}<1.96\,\sigma/\sqrt{n}$
1) True
2) False
Answer (1): This result is obtained either by multiplying by -1 or recognizing that $|\bar{x}-\mu|$ is the same as $|\mu-\bar{x}|$. Continue by adding $\bar{x}$ to the expression in the question to show that the interval $(\bar{x}-1.96\,\sigma/\sqrt{n},\ \bar{x}+1.96\,\sigma/\sqrt{n})$ has a 95% chance of capturing μ.
Q2: True or False: The following statements about 95% confidence intervals are equivalent.
a) There is a 95% chance that μ is in the confidence interval.
b) There is a 95% chance that the confidence interval contains μ.
1) True
2) False
Answer (2): Two different samples will likely give two different confidence intervals. How can there be a 95% chance that μ is in each?
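The idea that the interval, not μ, is what varies from sample to sample can be illustrated with a small simulation. This is a sketch under assumptions that are not from the questions above: a normal population with μ=50 and known σ=15, samples of size 25, and 2000 repetitions with a fixed seed.

```python
import random
from math import sqrt

def coverage(mu=50.0, sigma=15.0, n=25, reps=2000, seed=0):
    """Fraction of 95% intervals (x-bar ± 1.96*sigma/sqrt(n)) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    margin = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # mu is fixed; the interval moves with each new sample mean.
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

print(f"fraction of intervals containing mu: {coverage():.3f}")
```

The fraction should come out close to 0.95, although any single interval either contains μ or does not.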
Hypotheses Tests for the Mean
Q1: True or false: The p-value is the probability that the null hypothesis is true.
1) True
2) False
Q2: Sample size _________ affects whether the results of a hypothesis test are significant or not.
1) never
2) sometimes
3) often
4) always
Answer (4): Significance will always be achieved if sample sizes are large enough. This is why it is important to calculate confidence intervals if the variable is quantitative.
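The effect of sample size can be seen by holding an observed difference fixed and letting n grow. The sketch below assumes a one-sample z test with known σ; the difference of 0.5, σ=10, and the sample sizes are hypothetical values chosen only for illustration.

```python
from math import erf, sqrt

def two_sided_p(diff, sigma, n):
    """Two-sided p-value for a z test when the sample mean differs from the null value by diff."""
    z = diff / (sigma / sqrt(n))
    upper_tail = 0.5 * (1 - erf(abs(z) / sqrt(2)))
    return 2 * upper_tail

sample_sizes = (25, 100, 400, 1600, 10000)  # hypothetical sample sizes
p_values = [two_sided_p(0.5, 10.0, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n={n:>5}: p-value = {p:.4f}")
```

The same small difference moves from far from significant to highly significant as n increases, which is why a confidence interval for the size of the difference is more informative than the test alone.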
Q3: True or false: If a result is not statistically significant then the null hypothesis must be true.
1) True
2) False
Answer (2): The particular sample did not provide strong evidence against the null hypothesis. It could still be false.
Comparing Two Means
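Q1: Is the mean height of Beloit College female students less than that of male students? Let $\mu_F$ = the mean height of female Beloit College students and $\mu_M$ = the mean height for male students. What are the appropriate null and alternative hypotheses?
1) $H_0$: $\mu_F-\mu_M=0$ and $H_a$: $\mu_F-\mu_M\neq 0$
2) $H_0$: $\mu_F-\mu_M\neq 0$ and $H_a$: $\mu_F-\mu_M=0$
3) $H_0$: $\mu_F-\mu_M=0$ and $H_a$: $\mu_F-\mu_M<0$
4) $H_0$: $\mu_F-\mu_M=0$ and $H_a$: $\mu_F-\mu_M>0$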
Answer (3)
Q2: The p-value for the problem in the previous question was <0.004. Pick the best concluding sentence:
1) The data provide very strong evidence that the difference in heights between female and male Beloit College students is less than zero.
2) The data provide very strong evidence that female students are shorter than male students at Beloit College.
3) The data provide very strong evidence that the average height of female students is less than the height of male students.
4) The data provide very strong evidence that the average Beloit College woman is shorter than the average Beloit College man.
5) The data provide very strong evidence that the average height of female Beloit College students is less than the average height of male Beloit College students.
Answer (5): The answer needs to specifically contain the word average or mean and must indicate clearly the population. The logical next question would be "How much shorter?" Thus a test like this one should be accompanied by a confidence interval.
Created by Mathematica (August 14, 2006)