1What is the primary purpose of a scatter plot in statistics?
scatter plots
Easy
A.To show the relationship between two quantitative variables
B.To compare parts of a whole
C.To display the frequency distribution of a single variable
D.To show data over a period of time
Correct Answer: To show the relationship between two quantitative variables
Explanation:
A scatter plot uses dots to represent values for two different numeric variables, making it ideal for visualizing the relationship or association between them.
2If the points on a scatter plot generally form a pattern from the lower-left to the upper-right, what type of relationship does this suggest?
scatter plots
Easy
A.A negative correlation
B.A positive correlation
C.No correlation
D.A non-linear relationship
Correct Answer: A positive correlation
Explanation:
A pattern moving from lower-left to upper-right indicates that as one variable increases, the other variable also tends to increase. This is known as a positive correlation.
3What does a scatter plot with points scattered randomly, showing no clear direction or pattern, suggest about the variables?
scatter plots
Easy
A.Little to no correlation
B.A strong negative correlation
C.A strong positive correlation
D.A perfect correlation
Correct Answer: Little to no correlation
Explanation:
When points are randomly scattered on a plot without any discernible pattern, it implies that there is no clear linear relationship or association between the two variables.
4The value of a correlation coefficient, denoted by $r$, always lies between which two values?
correlation coefficient and its properties
Easy
A.-1 and +1
B.-1 and 0
C.0 and 1
D.0 and 100
Correct Answer: -1 and +1
Explanation:
The correlation coefficient has a range from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation), inclusive.
5A correlation coefficient of $r = +1.0$ indicates which of the following?
correlation coefficient and its properties
Easy
A.No linear relationship
B.A perfect positive linear relationship
C.A weak positive linear relationship
D.A perfect negative linear relationship
Correct Answer: A perfect positive linear relationship
Explanation:
A value of +1.0 signifies a perfect positive linear relationship. This means as one variable increases, the other increases in a perfectly predictable straight line.
6If the correlation coefficient between two variables is 0, what does this imply?
correlation coefficient and its properties
Easy
A.There is a strong negative relationship.
B.The variables are perfectly correlated.
C.There is a causal relationship.
D.There is no linear relationship between the variables.
Correct Answer: There is no linear relationship between the variables.
Explanation:
A correlation coefficient of 0 indicates the absence of a linear relationship. However, it does not rule out the possibility of a non-linear relationship between the variables.
7How is the correlation coefficient between two variables, $X$ and $Y$, affected if the units of measurement for both variables are changed (e.g., from meters to centimeters)?
correlation coefficient and its properties
Easy
A.It increases
B.It becomes zero
C.It decreases
D.It remains unchanged
Correct Answer: It remains unchanged
Explanation:
The correlation coefficient is a unit-free measure. It is independent of the change of origin and scale of the variables, so changing units does not alter its value.
8If two variables have a strong correlation, does this mean one variable causes the other to change?
correlation coefficient and its properties
Easy
A.Yes, but only if the correlation is positive.
B.Yes, a strong correlation always implies causation.
C.Not necessarily, as correlation does not imply causation.
D.Yes, but only if the correlation is negative.
Correct Answer: Not necessarily, as correlation does not imply causation.
Explanation:
Correlation measures the strength and direction of a relationship, but it does not prove that one variable causes the change in the other. A third, unobserved variable could be influencing both.
9Karl Pearson's correlation coefficient is best suited for measuring the relationship between which type of variables?
Karl Pearson’s correlation coefficient
Easy
A.One quantitative and one categorical variable
B.Two categorical variables
C.Two quantitative variables
D.Two ranked variables
Correct Answer: Two quantitative variables
Explanation:
Karl Pearson's coefficient is designed to measure the strength and direction of the linear relationship between two continuous or quantitative variables.
10What is another common name for Karl Pearson's correlation coefficient?
Karl Pearson's correlation coefficient is frequently referred to as the Pearson product-moment correlation coefficient (PPMCC).
11The sign (positive or negative) of Karl Pearson's correlation coefficient indicates the...
Karl Pearson’s correlation coefficient
Easy
A.Strength of the relationship
B.Cause of the relationship
C.Direction of the relationship
D.Significance of the relationship
Correct Answer: Direction of the relationship
Explanation:
A positive sign indicates a positive linear relationship (as one variable increases, so does the other), while a negative sign indicates a negative linear relationship (as one variable increases, the other decreases).
12A key assumption for interpreting Karl Pearson's correlation coefficient is that the relationship between the variables is...
Karl Pearson’s correlation coefficient
Easy
A.Exponential
B.Linear
C.Logarithmic
D.Curvilinear
Correct Answer: Linear
Explanation:
Pearson's correlation coefficient specifically measures the strength and direction of a linear relationship. If the relationship is non-linear, this coefficient may not be an accurate measure of the association.
13Spearman's rank correlation coefficient is used to measure the strength of association between...
Spearman’s rank correlation coefficient
Easy
A.Two independent samples
B.Two nominal variables
C.Two means
D.Two ranked variables
Correct Answer: Two ranked variables
Explanation:
Spearman's rank correlation is a non-parametric measure of the degree of association between two variables measured on an ordinal (rank) scale.
14What is the first step in the process of calculating Spearman's rank correlation coefficient for a set of data?
Spearman’s rank correlation coefficient
Easy
A.Rank the values for each variable separately
B.Create a scatter plot of the data
C.Calculate the mean of each variable
D.Find the difference between the values
Correct Answer: Rank the values for each variable separately
Explanation:
To calculate Spearman's rho, the raw data for each of the two variables must first be converted into ranks, typically from lowest to highest.
15Unlike Pearson's correlation which assesses linear relationships, Spearman's rank correlation assesses what type of relationship?
Spearman’s rank correlation coefficient
Easy
A.Monotonic relationship
B.Causal relationship
C.Random relationship
D.Exponential relationship
Correct Answer: Monotonic relationship
Explanation:
A monotonic relationship is one in which, as one variable increases, the other consistently increases or consistently decreases, but not necessarily at a constant rate. Spearman's correlation measures the strength of this monotonic association.
16In which of the following situations would Spearman's rank correlation be more appropriate to use than Pearson's correlation?
Spearman’s rank correlation coefficient
Easy
A.When the data is ordinal (ranked)
B.When the data is perfectly linear
C.When the sample size is very small
D.When the data is categorical with no order
Correct Answer: When the data is ordinal (ranked)
Explanation:
Spearman's correlation is specifically designed for ordinal (ranked) data. It is also a good choice for quantitative data that has a non-linear but monotonic relationship.
17What is the primary purpose of simple linear regression?
Linear regression and its properties
Easy
A.To find the average of a dataset
B.To classify data into different groups
C.To model the relationship between a dependent variable and an independent variable
D.To determine if two variables are correlated
Correct Answer: To model the relationship between a dependent variable and an independent variable
Explanation:
Linear regression aims to find the best-fitting straight line that can be used to predict the value of a dependent variable based on the value of an independent variable.
18In the simple linear regression equation, $y = a + bx$, what does the coefficient $b$ represent?
Linear regression and its properties
Easy
A.The correlation coefficient
B.The slope of the regression line
C.The y-intercept of the regression line
D.The predicted value of y
Correct Answer: The slope of the regression line
Explanation:
The coefficient that multiplies the independent variable is the slope. It represents the estimated change in the dependent variable for every one-unit increase in the independent variable.
19The "line of best fit" in a linear regression model is the line that...
Linear regression and its properties
Easy
A.Has the steepest possible slope
B.Minimizes the sum of the squared vertical distances of the points from the line
C.Passes through the maximum number of data points
D.Connects the first and last data points
Correct Answer: Minimizes the sum of the squared vertical distances of the points from the line
Explanation:
The method of least squares is used to find the line of best fit. This method works by minimizing the sum of the squared differences (residuals) between the observed values and the values predicted by the line.
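As a minimal sketch of the least-squares idea, the snippet below fits a line with the closed-form slope and intercept and then checks that nudging either coefficient makes the sum of squared residuals larger. The data points and the helper names `least_squares_line` and `sse` are illustrative assumptions, not part of the quiz.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

a, b = least_squares_line(xs, ys)
best = sse(xs, ys, a, b)

# Any other line should have a larger sum of squared residuals.
worse_slope = sse(xs, ys, a, b + 0.1)
worse_intercept = sse(xs, ys, a + 0.1, b)
print(a, b, best, worse_slope, worse_intercept)
```

Two perturbed lines do not prove minimality, but they illustrate what "best fit" means in the least-squares sense.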
20In linear regression, what is a "residual"?
Linear regression and its properties
Easy
A.The value of the independent variable
B.The difference between the observed value and the predicted value
C.The slope of the regression line
D.The difference between two predicted values
Correct Answer: The difference between the observed value and the predicted value
Explanation:
A residual is the error in a prediction for a single data point. It is calculated as the actual observed value ($y$) minus the value predicted by the regression line ($\hat{y}$).
21If the covariance between two variables X and Y is 15, the variance of X is 25, and the variance of Y is 9, what is the Karl Pearson’s correlation coefficient?
Karl Pearson’s correlation coefficient
Medium
A.0.5
B.1.0
C.0.75
D.1.25
Correct Answer: 1.0
Explanation:
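The formula for Karl Pearson’s correlation coefficient (r) is $r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$. Here, $\sigma_X^2 = 25$, so $\sigma_X = 5$. Also, $\sigma_Y^2 = 9$, so $\sigma_Y = 3$. Plugging in the values: $r = \frac{15}{5 \times 3} = 1.0$.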
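The same arithmetic can be checked with a short sketch. The function name `pearson_from_moments` is a hypothetical helper introduced here; it simply divides the covariance by the product of the two standard deviations.

```python
import math


def pearson_from_moments(cov_xy, var_x, var_y):
    """Return r = Cov(X, Y) / (sd_X * sd_Y) from a covariance and two variances."""
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))


# Values from the question: Cov = 15, Var(X) = 25, Var(Y) = 9.
r = pearson_from_moments(15, 25, 9)
print(r)
```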
22In which of the following scenarios would Spearman's rank correlation be more appropriate than Pearson's correlation coefficient?
Spearman’s rank correlation coefficient
Medium
A.When the data is perfectly normally distributed.
B.When the relationship between variables is monotonic but not linear.
C.When the sample size is very large.
D.When we want to measure the strength of a linear relationship only.
Correct Answer: When the relationship between variables is monotonic but not linear.
Explanation:
Pearson's coefficient measures the strength of a linear relationship. Spearman's coefficient measures the strength of a monotonic relationship (one that is consistently increasing or decreasing). It is suitable for non-linear but monotonic relationships and for ordinal data.
23If the regression line of Y on X is given by , and the mean of X is , what is the mean of Y, ?
Linear regression and its properties
Medium
A.5
B.8
C.11
D.Cannot be determined
Correct Answer: 11
Explanation:
A fundamental property of the linear regression line is that it always passes through the point of means, $(\bar{X}, \bar{Y})$. Therefore, the means must satisfy the regression equation. Substituting the given $\bar{X}$ into the equation gives $\bar{Y} = 11$.
24A scatter plot of variable Y versus variable X shows points that form a clear U-shape. What can you conclude about the Pearson correlation coefficient (r) for this data?
scatter plots
Medium
A.r will be close to -1
B.r will be close to 0
C.r will be undefined
D.r will be close to +1
Correct Answer: r will be close to 0
Explanation:
The Pearson correlation coefficient (r) measures the strength and direction of a linear relationship. A U-shaped pattern indicates a strong non-linear relationship, but since the initial decrease is cancelled out by the subsequent increase, there is no overall linear trend. Thus, r will be close to 0.
25If the correlation coefficient between two variables, height (in meters) and weight (in kg), is 0.8, what will be the correlation coefficient if the height is measured in centimeters and weight is measured in grams?
correlation coefficient and its properties
Medium
A.Cannot be determined without the data
B.0.008
C.80
D.0.8
Correct Answer: 0.8
Explanation:
The correlation coefficient is a dimensionless quantity, meaning it is independent of the units of measurement. Changing the scale of the variables (e.g., from meters to centimeters) by multiplying by a positive constant does not change the value of the correlation coefficient.
26A regression analysis of student scores (Y) versus hours studied (X) yielded the equation . What is the correct interpretation of the slope?
Linear regression and its properties
Medium
A.A student who studies for 0 hours is predicted to get a score of 5.
B.The average score for all students is 55.
C.For every 5 additional hours studied, the student's score is predicted to increase by 50 points.
D.For each additional hour studied, the student's score is predicted to increase by 5 points.
Correct Answer: For each additional hour studied, the student's score is predicted to increase by 5 points.
Explanation:
The slope of the regression line (in this case, 5) represents the predicted change in the dependent variable (Y) for a one-unit increase in the independent variable (X). Therefore, for each extra hour of study, the score is expected to increase by 5 points.
27For a set of paired data (X, Y), the ranks are as follows: $R_X = (1, 2, 3)$ and $R_Y = (3, 2, 1)$. What is the Spearman's rank correlation coefficient?
Spearman’s rank correlation coefficient
Medium
A.1.0
B.-1.0
C.-0.5
D.0.0
Correct Answer: -1.0
Explanation:
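The ranks for Y are in the exact reverse order of the ranks for X. This represents a perfect negative monotonic relationship, so the Spearman's rank correlation coefficient must be -1.0 without calculation. Alternatively, the rank differences $d_i = R_X - R_Y$ are (-2, 0, 2), so $\sum d_i^2 = 8$. Using the formula $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, we get $r_s = 1 - \frac{6 \times 8}{3(3^2 - 1)} = 1 - 2 = -1$.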
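The rank-difference formula can be sketched in a few lines. The helpers `ranks` and `spearman_no_ties` are hypothetical names, and the sketch assumes there are no tied values, which is the condition under which the simplified formula is exact.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied data."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Ranks in exactly reversed order, as in the explanation above.
r_s = spearman_no_ties([1, 2, 3], [3, 2, 1])
print(r_s)
```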
28If the two regression coefficients are and , what is the correlation coefficient between X and Y?
Karl Pearson’s correlation coefficient
Medium
A.0.6
B.0.36
C.-0.36
D.-0.6
Correct Answer: -0.6
Explanation:
The correlation coefficient is the geometric mean of the two regression coefficients, $b_{yx}$ and $b_{xy}$. The formula is $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$. The sign of $r$ must be the same as the sign of the regression coefficients. Since both are negative, $r$ must be negative: $r = -\sqrt{0.36} = -0.6$.
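A minimal sketch of the geometric-mean rule follows. The question's original coefficient values are not available in this text, so the pair -0.4 and -0.9 below is a hypothetical choice whose product is 0.36, consistent with the stated answer; `r_from_regression_coefficients` is likewise an assumed helper name.

```python
import math


def r_from_regression_coefficients(b_yx, b_xy):
    """Return r as the signed geometric mean of the two regression coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must share the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    magnitude = math.sqrt(product)
    # r carries the common sign of the two coefficients.
    return -magnitude if b_yx < 0 else magnitude


# Hypothetical coefficients (assumption): product is 0.36, both negative.
r = r_from_regression_coefficients(-0.4, -0.9)
print(round(r, 6))
```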
29If the correlation coefficient between X and Y is 0.7, what percentage of the variation in Y is explained by the linear relationship with X?
correlation coefficient and its properties
Medium
A.30%
B.49%
C.7%
D.70%
Correct Answer: 49%
Explanation:
The percentage of variation in Y explained by the linear relationship with X is given by the coefficient of determination, which is $r^2$. If $r = 0.7$, then $r^2 = 0.49$. Expressed as a percentage, this is 49%.
30Given the regression line of Y on X as $\hat{Y} = 4 + 1.5X$, the mean of X values is $\bar{X} = 10$ and the mean of Y values is $\bar{Y} = 19$. If we were to calculate the regression line of X on Y, which of the following points is guaranteed to be on that line?
Linear regression and its properties
Medium
A.(4, 1.5)
B.(10, 19)
C.(1.5, 4)
D.(19, 10)
Correct Answer: (19, 10)
Explanation:
Both regression lines, Y on X and X on Y, must pass through the point of means $(\bar{X}, \bar{Y})$. For the regression of Y on X, the point is $(\bar{X}, \bar{Y}) = (10, 19)$. For the regression of X on Y, the variables are swapped, so the line must pass through the point $(\bar{Y}, \bar{X})$, which is (19, 10).
31For the data points (1, 2), (2, 4), (3, 6), (4, 8), what is the value of Karl Pearson's correlation coefficient?
Karl Pearson’s correlation coefficient
Medium
A.0
B.-1
C.1
D.0.5
Correct Answer: 1
Explanation:
The data points all lie on a perfect straight line with a positive slope, specifically $y = 2x$. When there is a perfect positive linear relationship between two variables, the Karl Pearson's correlation coefficient is +1.
32If two variables X and Y have a correlation coefficient of $r = -0.9$, which statement is most accurate?
correlation coefficient and its properties
Medium
A.There is a weak negative linear relationship between X and Y.
B.X causes Y to decrease.
C.90% of the data points lie on the regression line.
D.There is a strong negative linear relationship between X and Y.
Correct Answer: There is a strong negative linear relationship between X and Y.
Explanation:
A correlation coefficient close to -1 indicates a strong negative linear relationship. It does not imply causation. The coefficient of determination $r^2 = 0.81$ means 81% of the variance is explained, not that 90% of points are on the line.
33Calculate Spearman's rank correlation for the following data on two judges' scores: Judge A: (10, 12, 11), Judge B: (15, 18, 16).
Spearman’s rank correlation coefficient
Medium
A.0.5
B.0.0
C.1.0
D.-1.0
Correct Answer: 1.0
Explanation:
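Rank each judge's scores from lowest to highest. Judge A's scores (10, 12, 11) receive ranks (1, 3, 2), and Judge B's scores (15, 18, 16) also receive ranks (1, 3, 2). Every rank difference $d_i$ is 0, so $\sum d_i^2 = 0$ and $r_s = 1 - \frac{6 \times 0}{3(3^2 - 1)} = 1.0$.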
34The regression equation of Y on X is . The slope is calculated as . If , , and , what is the value of the slope ?
Linear regression and its properties
Medium
A.2.0
B.1.0
C.0.5
D.4.0
Correct Answer: 1.0
Explanation:
The formula for the slope of the regression line of Y on X is . Plugging in the given values: .
35A scatter plot shows a cloud of points that is wide at low values of X and narrow at high values of X, with a general downward trend. This pattern is known as:
scatter plots
Medium
A.Homoscedasticity
B.Autocorrelation
C.Multicollinearity
D.Heteroscedasticity
Correct Answer: Heteroscedasticity
Explanation:
Heteroscedasticity refers to the situation where the variability (scatter) of a variable is unequal across the range of values of a second variable that predicts it. In this case, the spread of Y changes as X changes, which violates an assumption of standard linear regression.
36If two variables, X and Y, are statistically independent, what is the expected value of their Pearson correlation coefficient?
correlation coefficient and its properties
Medium
A.-1
B.1
C.It depends on the distribution
D.0
Correct Answer: 0
Explanation:
If two variables are independent, their covariance is 0. Since the correlation coefficient is calculated by dividing the covariance by the product of the standard deviations, a covariance of 0 will result in a correlation coefficient of 0. However, the converse is not always true; a correlation of 0 does not necessarily imply independence.
37Given a regression equation , what is the predicted value of y when x = 10 and the actual observed value was y = 75?
Linear regression and its properties
Medium
A.80
B.120
C.20
D.75
Correct Answer: 80
Explanation:
The question asks for the predicted value, which is found by substituting $x = 10$ into the regression equation, giving $\hat{y} = 80$. The actual value of 75 would be used to calculate the residual (error), which would be $75 - 80 = -5$.
38For the following paired data (X, Y): (5, 8), (10, 15), (15, 12), (20, 18), what is the sum of squared differences in ranks, $\sum d^2$, used to calculate Spearman's correlation?
Spearman’s rank correlation coefficient
Medium
A.4
B.2
C.0
D.1
Correct Answer: 2
Explanation:
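Rank each variable from lowest to highest. The X values (5, 10, 15, 20) have ranks (1, 2, 3, 4), and the Y values (8, 15, 12, 18) have ranks (1, 3, 2, 4). The rank differences are $d_i = (0, -1, 1, 0)$, so $\sum d_i^2 = 0 + 1 + 1 + 0 = 2$.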
39If we add 5 to every X value and subtract 10 from every Y value in a dataset, how will the Karl Pearson's correlation coefficient (r) change?
Karl Pearson’s correlation coefficient
Medium
A.It will become 0.
B.It will decrease.
C.It will not change.
D.It will increase.
Correct Answer: It will not change.
Explanation:
The Pearson correlation coefficient is invariant to changes of origin (adding or subtracting a constant) and scale (multiplying or dividing by a positive constant). Shifting the data by adding or subtracting a constant from all values does not alter the strength or direction of the linear relationship.
40If the correlation coefficient 'r' is 0, which of the following statements is true?
correlation coefficient and its properties
Medium
A.There is no relationship of any kind between the variables.
B.The slope of the regression line is undefined.
C.There is no linear relationship between the variables.
D.The variables are independent.
Correct Answer: There is no linear relationship between the variables.
Explanation:
A correlation coefficient of 0 specifically indicates the absence of a linear relationship. It does not rule out the possibility of a strong non-linear relationship (e.g., a parabolic relationship). While independence implies r=0, r=0 does not imply independence.
41For a set of data points where , if the relationship is given by for values symmetrically distributed around 0 (e.g., ), what will be the value of the Karl Pearson’s correlation coefficient ?
Karl Pearson’s correlation coefficient
Hard
A.
B.
C.
D.
Correct Answer: $r = 0$
Explanation:
Pearson's $r$ measures the strength of a linear relationship. Here, the relationship is perfectly quadratic. For $x$ values symmetric around 0, an increase in $x$ does not consistently lead to an increase or decrease in $y$. For $x < 0$, as $x$ increases, $y$ decreases. For $x > 0$, as $x$ increases, $y$ also increases. The positive and negative linear components cancel each other out, resulting in a covariance of 0, and thus a correlation coefficient of 0.
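The cancellation can be seen numerically in a small sketch. The exact relationship and x values from the question are not available in this text, so $y = x^2$ with x in (-2, -1, 0, 1, 2) is assumed here as a typical symmetric quadratic case; `pearson` is an illustrative helper.

```python
import math


def pearson(xs, ys):
    """Pearson's r computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Assumed example: x symmetric about 0 and y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)
print(r)
```

The relationship is fully deterministic, yet the linear measure reports no association.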
42A simple linear regression model is fitted, yielding the equation . If the independent variable is rescaled to and the dependent variable is rescaled to , what will be the new regression equation for in terms of ?
Linear regression and its properties
Hard
A.
B.
C.
D.
Correct Answer:
Explanation:
Original equation: . We have and . Substituting these into the original equation: . This simplifies to . Multiplying the entire equation by 5 gives . So the new intercept is and the new slope is .
43Consider a dataset where the variables $X$ and $Y$ have a perfect monotonic, but non-linear relationship, such as a strictly increasing curve. Which of the following statements about the Karl Pearson's correlation coefficient ($r$) and Spearman's rank correlation coefficient ($r_s$) is most likely to be true?
Spearman’s rank correlation coefficient
Hard
A. while or
B. and or
C.
D.
Correct Answer: and or
Explanation:
Spearman's $r_s$ measures the strength of a monotonic relationship. Since the relationship is a perfect monotonic increasing function, the ranks of $Y$ will be in the same order as the ranks of $X$, making $r_s = 1$. Karl Pearson's $r$ measures the strength of a linear relationship. Since the relationship is curved, it is not perfectly linear, so $r$ will be less than 1. Therefore, $r < 1$ while $r_s = 1$.
44Let the correlation coefficient between two variables and be . Two new variables are defined as and . What is the correlation coefficient between and ?
correlation coefficient and its properties
Hard
A.-0.8
B.-1.2
C.0.8
D.Cannot be determined
Correct Answer: -0.8
Explanation:
The correlation coefficient is independent of changes in origin and scale. However, it is sensitive to the sign of the scaling factor. The first new variable is a linear transformation of $X$ with a negative slope (-3), and the second is a linear transformation of $Y$ with a positive slope (2). Because exactly one of the slopes is negative, the sign of the correlation coefficient is reversed. Therefore, the new correlation coefficient is $-0.8$.
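The sign rule can be illustrated with a short sketch. The data are hypothetical (chosen so that the original correlation is 0.8), and the additive constants 5 and 1 in the transformations are assumptions, since only the slopes -3 and 2 are stated in the explanation.

```python
import math


def pearson(xs, ys):
    """Pearson's r computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with r = 0.8.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

us = [5 - 3 * x for x in xs]    # negative slope: flips the sign of r
vs = [1 + 2 * y for y in ys]    # positive slope: leaves r unchanged
ws = [7 + 100 * x for x in xs]  # positive rescaling, e.g. a change of units

r_xy = pearson(xs, ys)
r_uv = pearson(us, vs)
r_wy = pearson(ws, ys)
print(r_xy, r_uv, r_wy)
```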
45In a simple linear regression model $\hat{y}_i = b_0 + b_1 x_i$, let $e_i = y_i - \hat{y}_i$ be the residuals. Which of the following statements is mathematically guaranteed to be false for any OLS regression with an intercept?
Linear regression and its properties
Hard
A.The correlation between the residuals and the observed values is zero.
B.The sum of the squared residuals, $\sum e_i^2$, is minimized.
C.The correlation between the residuals and the predicted values is zero.
D.The correlation between the residuals and the independent variable is zero.
Correct Answer: The correlation between the residuals and the observed values is zero.
Explanation:
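In OLS, the residuals are uncorrelated with the predictors and the fitted values. However, since $y_i = \hat{y}_i + e_i$, the covariance is $\operatorname{Cov}(y, e) = \operatorname{Cov}(\hat{y} + e, e) = \operatorname{Cov}(\hat{y}, e) + \operatorname{Var}(e) = \operatorname{Var}(e)$. As long as there is some error, $\operatorname{Var}(e)$ is positive, so the correlation between the residuals and the observed values is non-zero.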
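These covariance identities can be verified on a small example. The sketch below fits OLS by hand on hypothetical data; `ols_fit` and `cov` are illustrative helpers, and `cov` uses the population (divide by n) convention.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the OLS line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cov(us, vs):
    """Population covariance of two equal-length sequences."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

b0, b1 = ols_fit(xs, ys)
fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

cov_resid_x = cov(resid, xs)           # zero up to rounding
cov_resid_fitted = cov(resid, fitted)  # zero up to rounding
cov_resid_y = cov(resid, ys)           # equals Var(e), which is positive here
var_resid = cov(resid, resid)
print(cov_resid_x, cov_resid_fitted, cov_resid_y, var_resid)
```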
46The equations of two regression lines are and . What is the Karl Pearson correlation coefficient between and ?
Karl Pearson’s correlation coefficient
Hard
A.-3/4
B.4/3
C.-4/3
D.3/4
Correct Answer: -3/4
Explanation:
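Rearrange each line into regression form. Taking line 1 as the regression of x on y and line 2 as the regression of y on x gives two negative regression coefficients, $b_{xy}$ and $b_{yx}$. The coefficient of determination is $r^2 = b_{xy} \cdot b_{yx} = \frac{9}{16}$. Since both regression coefficients are negative, $r$ must also be negative. Thus, $r = -\frac{3}{4}$. The other identification of lines would lead to $r^2 = \frac{16}{9} > 1$, which is impossible.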
47A scatter plot for 100 data points shows a weak positive correlation. A new data point is added at $(x_{new}, y_{new})$, where $x_{new}$ is significantly larger than all other $x$ values (an x-outlier), and $y_{new}$ falls exactly on the regression line calculated from the original 100 points. How will this new point, known as a 'high leverage' point, most likely affect the correlation coefficient $r$?
scatter plots
Hard
A.It will have very little effect on $r$.
B.It will push $r$ closer to 0.
C.It will significantly increase $r$ towards 1.
D.It will significantly decrease $r$ towards -1.
Correct Answer: It will significantly increase $r$ towards 1.
Explanation:
A high leverage point is an observation with an extreme value for the independent variable. When this point also falls close to the existing regression line, it doesn't change the slope much, but it greatly reduces the relative sum of squared errors. By adding a distant point that confirms the trend, it pulls the overall correlation coefficient significantly closer to 1 (or -1 if the trend is negative), strengthening the evidence for a linear relationship.
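The effect can be reproduced with a minimal sketch. It uses ten hypothetical points instead of the question's 100, with y values chosen to give a weak positive correlation, and then adds one point far to the right that lies exactly on the fitted line. The helper names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson's r computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical weakly correlated data.
xs = list(range(1, 11))
ys = [5, 3, 6, 2, 7, 4, 8, 3, 6, 5]

r_before = pearson(xs, ys)
intercept, slope = least_squares_line(xs, ys)

# High-leverage point: extreme x, y exactly on the original regression line.
x_new = 100
y_new = intercept + slope * x_new
r_after = pearson(xs + [x_new], ys + [y_new])
print(round(r_before, 3), round(r_after, 3))
```

Because the new point has zero residual, the fitted line and the residual sum of squares are unchanged, while the spread in x grows sharply; that is why the correlation rises.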
48In a dataset of 10 pairs, the ranks for variable X are and the ranks for variable Y are . The standard formula for Spearman's correlation is $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$. What is the primary issue with using this specific formula here?
Spearman’s rank correlation coefficient
Hard
A.The sample size is too small for this formula.
B.The formula is only valid for positive correlation.
C.The presence of tied ranks requires a correction factor, making this formula inaccurate.
D.The formula requires data to be normally distributed.
Correct Answer: The presence of tied ranks requires a correction factor, making this formula inaccurate.
Explanation:
The standard, simplified formula for Spearman's correlation assumes there are no ties in the ranks. When ties are present, as seen in the ranks for variable X (3.5 and 7.5), this formula is no longer exact. A more complex formula involving correction factors for ties should be used, or alternatively, one can calculate the Pearson correlation on the rank values, which naturally handles ties correctly.
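The alternative mentioned above, Pearson's correlation computed on the rank values, might be sketched as follows. Tied values receive the average of the positions they occupy (which is how ranks such as 3.5 arise). The function names and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson's r computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_with_ties(xs, ys):
    """Spearman's r_s as Pearson's r applied to average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data containing a tie in each variable.
example_ranks = average_ranks([10, 20, 20, 30])
r_s = spearman_with_ties([10, 20, 20, 30], [1, 2, 2, 3])
print(example_ranks, r_s)
```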
49Two different simple linear regression models are fitted. Model A has a correlation coefficient . Model B has a correlation coefficient . The total sum of squares () is the same for both datasets. Which statement is correct about the sum of squared residuals () for the two models?
Linear regression and its properties
Hard
A.The relationship cannot be determined.
B.
C.
D.
Correct Answer: $SSE_A > SSE_B$
Explanation:
The coefficient of determination is $R^2 = r^2$, so each model's $R^2$ is the square of its correlation coefficient. $R^2$ represents the proportion of variance explained, and is also given by $R^2 = 1 - \frac{SSE}{SST}$. Since Model B has a higher $R^2$, it explains more variance, meaning it has a smaller proportion of unexplained variance ($SSE/SST$). As $SST$ is the same for both, $SSE$ must be smaller for Model B. Therefore, $SSE_A > SSE_B$.
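The relationship between $r$, $SST$, and $SSE$ can be sketched numerically. The question's actual correlation values are not available in this text, so $r_A = 0.6$, $r_B = -0.8$, and $SST = 100$ below are hypothetical values chosen so that Model B has the larger $r^2$; `sse_from_r` is an assumed helper name.

```python
def sse_from_r(r, sst):
    """For simple linear regression, SSE = (1 - r^2) * SST."""
    return (1 - r ** 2) * sst


# Hypothetical inputs (assumptions, not the question's values).
sst = 100.0
sse_a = sse_from_r(0.6, sst)   # r^2 = 0.36
sse_b = sse_from_r(-0.8, sst)  # r^2 = 0.64; the sign of r does not matter
print(round(sse_a, 6), round(sse_b, 6))
```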
50For three variables $X$, $Y$, and $Z$, it is known that the correlation between $X$ and $Y$ is $r_{XY} = 0.9$, and the correlation between $Y$ and $Z$ is $r_{YZ} = 0.9$. What can be concluded about the minimum possible value for the correlation between $X$ and $Z$, $r_{XZ}$?
correlation coefficient and its properties
Hard
A.$r_{XZ}$ must be at least 0.
B.$r_{XZ}$ must be at least 0.81.
C.$r_{XZ}$ can be as low as -1.
D.$r_{XZ}$ must be at least 0.62.
Correct Answer: $r_{XZ}$ must be at least 0.62.
Explanation:
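The correlation matrix of the three variables must be positive semi-definite. This implies the inequality $1 + 2 r_{XY} r_{YZ} r_{XZ} - r_{XY}^2 - r_{YZ}^2 - r_{XZ}^2 \ge 0$. Plugging in the values gives $1 + 1.62\, r_{XZ} - 0.81 - 0.81 - r_{XZ}^2 \ge 0$, which simplifies to $r_{XZ}^2 - 1.62\, r_{XZ} + 0.62 \le 0$. This quadratic inequality holds for $r_{XZ}$ between the roots of the corresponding equation, which are 0.62 and 1. Thus, the minimum possible value for $r_{XZ}$ is 0.62.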
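Solving the determinant inequality for the third correlation gives the interval $r_{XY} r_{YZ} \pm \sqrt{(1 - r_{XY}^2)(1 - r_{YZ}^2)}$. The sketch below evaluates it, assuming both given correlations are 0.9, a value consistent with the roots 0.62 and 1 quoted above. `third_correlation_bounds` is a hypothetical helper name.

```python
import math


def third_correlation_bounds(r_xy, r_yz):
    """Range of r_xz that keeps the 3x3 correlation matrix positive semi-definite."""
    centre = r_xy * r_yz
    spread = math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))
    return centre - spread, centre + spread


# Assumed inputs: both known correlations equal 0.9.
lower, upper = third_correlation_bounds(0.9, 0.9)
print(round(lower, 4), round(upper, 4))
```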
51You are given four datasets that have nearly identical summary statistics: mean of X, mean of Y, variance of X, variance of Y, and Karl Pearson's correlation coefficient ($r$). However, their scatter plots are drastically different. One plot is a clear linear relationship, another is a perfect non-linear relationship, a third has a major outlier, and a fourth has a high leverage point. What is the most important conclusion from this scenario?
Karl Pearson’s correlation coefficient
Hard
A.Visualizing data using scatter plots is a critical step before interpreting correlation or fitting a regression model.
B.Karl Pearson's correlation coefficient is robust to outliers and non-linearity.
C.A high correlation coefficient ($r$) always guarantees a useful linear model.
D.Summary statistics, including correlation, are sufficient to understand the relationship between two variables.
Correct Answer: Visualizing data using scatter plots is a critical step before interpreting correlation or fitting a regression model.
Explanation:
This scenario describes Anscombe's Quartet. The key takeaway is that summary statistics alone can be extremely misleading. Drastically different data distributions can produce identical statistical properties. It highlights the absolute necessity of visualizing data to understand its underlying structure, check for outliers, and assess the appropriateness of a linear model, which a single correlation value cannot confirm.
52In a regression analysis, an observation is identified that has a low leverage value but a very large residual. How would you classify this point and describe its likely effect on the regression line?
Linear regression and its properties
Hard
A.It is a typical data point and will have minimal effect on the regression.
B.It is an outlier, but likely not an influential point; it will increase the standard error but may not significantly change the slope.
C.It is an influential point that will drastically change the slope of the regression line.
D.It is a high-leverage point that will strongly pull the regression line towards it.
Correct Answer: It is an outlier, but likely not an influential point; it will increase the standard error but may not significantly change the slope.
Explanation:
Leverage measures how far an observation's x-value is from the mean of the x-values. A low leverage means its x-value is typical. A large residual means the point is a vertical outlier. A point is influential if its removal causes a large change in the model. Points with low leverage, even if they are outliers, are typically not influential because they don't have enough 'lever arm' to drastically change the slope. Their primary effect is to inflate measures of error.
53For a dataset of size , the ranks of variable X are and the ranks of variable Y are . Calculate the Spearman's rank correlation coefficient .
Spearman’s rank correlation coefficient
Hard
A.0.929
B.1
C.0.952
D.0.976
Correct Answer: 0.929
Explanation:
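Spearman's rank correlation coefficient is computed as $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the paired ranks of X and Y and $n$ is the number of pairs (this form assumes no tied ranks). Substituting the given ranks yields the correct option.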
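For untied ranks, Spearman's coefficient is commonly computed as $1 - 6\sum d_i^2 / (n(n^2 - 1))$. The sketch below applies that formula. The ranks are made up for illustration and are not the ranks from the question, and the function name `spearman_from_ranks` is a hypothetical helper.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient from two lists of untied ranks."""
    n = len(rank_x)
    # Sum of squared rank differences.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks for n = 5 pairs (two adjacent swaps).
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

r_s = spearman_from_ranks(rank_x, rank_y)
print(r_s)
```

Here the squared differences sum to 4, so the coefficient is 1 - 24/120 = 0.8.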
54A researcher studies the relationship between hours studied and exam scores for two different subjects, A and B. For subject A, the correlation is . For subject B, the correlation is . When the two datasets are combined, what can be said about the correlation of the aggregate data?
Karl Pearson’s correlation coefficient
Hard
A.The combined correlation can be negative, positive, or zero, and is not constrained to be between 0.7 and 0.8.
B.The combined correlation must be positive.
C.The combined correlation must be greater than 0.7.
D.The combined correlation must be between 0.7 and 0.8.
Correct Answer: The combined correlation can be negative, positive, or zero, and is not constrained to be between 0.7 and 0.8.
Explanation:
This scenario is an example related to Simpson's Paradox. When combining groups, the overall correlation can be drastically different from the correlation within each group. Differences in the means of the variables between the groups can create an overall trend that is different from, or even opposite to, the trend within the individual groups. Therefore, the combined correlation is not constrained.
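A small made-up example can show how this happens. The two groups below are hypothetical, chosen so that each has a within-group correlation of 0.8 while their means are arranged to oppose the within-group trend. The helper `pearson` is an illustrative name.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical group A: low x-values, high y-values.
x_a = [1, 2, 3, 4, 5]
y_a = [11, 13, 12, 15, 14]
# Hypothetical group B: high x-values, low y-values.
x_b = [11, 12, 13, 14, 15]
y_b = [1, 3, 2, 5, 4]

r_a = pearson(x_a, y_a)
r_b = pearson(x_b, y_b)
r_combined = pearson(x_a + x_b, y_a + y_b)
print(round(r_a, 3), round(r_b, 3), round(r_combined, 3))
```

Each group on its own shows a positive correlation of 0.8, yet pooling them gives a clearly negative coefficient because the difference in group means dominates the within-group trend.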
55A scatter plot of residuals versus predicted values for a linear regression model shows a fan or cone shape, where the vertical spread of the residuals increases as the predicted values increase. What is the primary implication of this pattern?
scatter plots
Hard
A.The assumption of constant variance (homoscedasticity) is violated.
B.The relationship between the variables is non-linear.
C.There are significant outliers in the dataset.
D.The independence of errors assumption is violated.
Correct Answer: The assumption of constant variance (homoscedasticity) is violated.
Explanation:
This fan or cone shape in a residual plot is the classic sign of heteroscedasticity, which means the variance of the error terms is not constant across all levels of the independent variable. The linear regression model assumes homoscedasticity (constant variance). This violation suggests that the model's predictions are less reliable for certain ranges of the predictor variable.
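One rough way to see the fan shape numerically is to compare the typical residual size for small and large fitted values. The data below is synthetic and deterministic (the error grows in proportion to x), and `ols_fit` and `mean_abs` are illustrative helpers, not a formal test for heteroscedasticity.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_abs(values):
    """Mean absolute value, used here as a simple measure of spread."""
    return sum(abs(v) for v in values) / len(values)

xs = list(range(1, 21))
# Synthetic errors alternate in sign and grow with x (0.2 * x in magnitude).
ys = [2 * x + (0.2 * x if x % 2 == 0 else -0.2 * x) for x in xs]

slope, intercept = ols_fit(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# The slope is positive, so ordering by x also orders by fitted value.
spread_low = mean_abs(residuals[:10])    # smaller fitted values
spread_high = mean_abs(residuals[10:])   # larger fitted values
print(spread_low, spread_high)
```

The residual spread in the upper half of the fitted values is noticeably larger than in the lower half, which is the numerical counterpart of the cone seen in the residual plot.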
56An observational study finds a strong positive correlation between the number of firefighters at a fire and the amount of damage caused by the fire. The conclusion drawn is that sending more firefighters causes more damage. Which statistical concept best explains the flaw in this conclusion?
correlation coefficient and its properties
Hard
A.Non-linearity in the relationship.
B.The ecological fallacy.
C.The effect of outliers.
D.Spurious correlation due to a confounding variable.
Correct Answer: Spurious correlation due to a confounding variable.
Explanation:
The flawed conclusion ignores a confounding variable: the size of the fire. Larger fires require more firefighters and also inherently cause more damage. The size of the fire is the common cause for both variables. The observed correlation is therefore spurious because it is induced by this third variable, not because of a causal link between firefighters and damage.
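The common-cause structure can be sketched with a deterministic toy dataset. Everything here is an assumption for illustration: fire size is in arbitrary units, and firefighters and damage are each set to the fire size plus a balanced offset, so the two are unrelated once size is held fixed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Offsets added to fire size; across the four pairs they are uncorrelated.
offsets = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
sizes = range(2, 12)  # hypothetical fire sizes, arbitrary units

# Each row is (fire size, firefighters, damage).
rows = [(s, s + a, s + b) for s in sizes for a, b in offsets]

firefighters = [f for _, f, _ in rows]
damage = [d for _, _, d in rows]
r_overall = pearson(firefighters, damage)

# Correlation between firefighters and damage with fire size held fixed.
r_within = [
    pearson([f for s, f, _ in rows if s == size],
            [d for s, _, d in rows if s == size])
    for size in sizes
]
print(round(r_overall, 3), r_within)
```

The overall correlation is strong, but within every fire size it is zero, so the association comes entirely from the shared dependence on fire size.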
57A student calculates summary statistics from a sample of 10 data pairs: , , and . They then calculate the Pearson correlation coefficient as $r = 1.1$. What is the most likely reason for their result?
Karl Pearson’s correlation coefficient
Hard
A.The data contains significant outliers.
B.The relationship is strongly non-linear.
C.The sample size is too small.
D.There must be a calculation error in one of the summary statistics.
Correct Answer: There must be a calculation error in one of the summary statistics.
Explanation:
Calculating the correlation coefficient from the given statistics gives $r = 1.1$. As a consequence of the Cauchy-Schwarz inequality, the Pearson correlation coefficient must lie in the interval $[-1, 1]$, so a value of $1.1$ is impossible. While outliers or non-linearity can affect the value of $r$, they can never push it outside this valid range. The only possible explanation is a mistake in the calculation of the summary statistics.
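The Cauchy-Schwarz bound gives a quick consistency check on summary statistics. In the sketch below, `sxy`, `sxx`, and `syy` stand for the sum of cross-products of deviations and the two sums of squared deviations. The numbers are hypothetical (they are not the question's statistics) and the function names are illustrative.

```python
import math

def correlation_from_sums(sxy, sxx, syy):
    """Pearson r from the sum of cross-products and the sums of squares."""
    return sxy / math.sqrt(sxx * syy)

def sums_are_consistent(sxy, sxx, syy):
    """Cauchy-Schwarz requires sxy**2 <= sxx * syy for any real data."""
    return sxx >= 0 and syy >= 0 and sxy ** 2 <= sxx * syy

# Hypothetical statistics that cannot come from real data.
r_bad = correlation_from_sums(110.0, 100.0, 100.0)
ok_bad = sums_are_consistent(110.0, 100.0, 100.0)

# Hypothetical statistics that are mutually consistent.
ok_good = sums_are_consistent(80.0, 100.0, 100.0)
print(r_bad, ok_bad, ok_good)
```

A failed check like `ok_bad` points to an arithmetic or transcription error in the inputs rather than to any property of the data.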
58Consider a simple linear regression model where the intercept is forced to be zero ($b_0 = 0$), often called regression through the origin. Which property of the residuals from a standard OLS regression (with an intercept) is NOT guaranteed to hold for this model?
Linear regression and its properties
Hard
A.The sum of the residuals, $\sum e_i$, is zero.
B.The regression line passes through the point $(0, 0)$.
C.The sum of squared residuals is minimized under the constraint that the line passes through the origin.
D.The slope is calculated as $b_1 = \frac{\sum x_i y_i}{\sum x_i^2}$.
Correct Answer: The sum of the residuals, $\sum e_i$, is zero.
Explanation:
In a standard OLS regression with an intercept, one of the normal equations ensures that $\sum e_i = 0$. However, when the model is forced through the origin, this constraint is removed. The model only minimizes $\sum (y_i - b_1 x_i)^2$ with respect to the slope $b_1$, which does not require the sum of residuals to be zero. In fact, $\sum e_i$ will generally not be zero unless the regression line happens to pass through the point of means $(\bar{x}, \bar{y})$.
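A small numeric sketch of this property, using made-up data that lies exactly on the line y = 2x + 1. The through-origin slope uses the standard least-squares result, the sum of x times y divided by the sum of x squared. All variable names are illustrative.

```python
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]   # exactly y = 2x + 1, so the true intercept is not zero

# Regression through the origin: slope = sum(x*y) / sum(x^2).
slope_origin = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
resid_origin = [y - slope_origin * x for x, y in zip(xs, ys)]

# Standard OLS with an intercept, for comparison.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
resid_ols = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

sum_origin = sum(resid_origin)
sum_ols = sum(resid_ols)
# The one normal equation that still holds through the origin: sum(x * e) = 0.
weighted_origin = sum(x * e for x, e in zip(xs, resid_origin))
print(sum_origin, sum_ols, weighted_origin)
```

With the intercept forced to zero the residuals sum to about 0.667 rather than zero, while the x-weighted residual sum still vanishes, since that is the only normal equation left.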
59A dataset for variables $x$ and $y$ follows a perfect parabolic relationship (rising until $x = 5$, then falling symmetrically) for $x$ values from 0 to 10. What would you expect the Spearman's rank correlation coefficient ($r_s$) to be, approximately?
Spearman’s rank correlation coefficient
Hard
A.Close to -1
B.Close to 0
C.Approximately 0.5
D.Close to 1
Correct Answer: Close to 0
Explanation:
Spearman's $r_s$ measures the strength of a monotonic relationship. The described relationship is a symmetric parabola. As $x$ increases from 0 to 5, $y$ increases (a positive monotonic relationship). As $x$ increases from 5 to 10, $y$ decreases (a negative monotonic relationship). Because the relationship is perfectly symmetric and non-monotonic over the full range, the positive and negative rank associations cancel each other out, leading to a Spearman's correlation coefficient close to 0.
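The cancellation can be checked directly by ranking the data (averaging tied ranks) and computing the Pearson correlation of the ranks. The specific parabola $y = -(x - 5)^2$ on integer $x$ from 0 to 10 is an assumed stand-in for the relationship in the question, and the helper names are illustrative.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(11))                 # x = 0..10
ys = [-(x - 5) ** 2 for x in xs]     # assumed symmetric parabola, peak at x = 5

r_s = pearson(average_ranks(xs), average_ranks(ys))
print(r_s)
```

For this symmetric case the rank products on either side of the peak cancel, so the coefficient is zero even though $y$ is completely determined by $x$.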
60What is the value of the correlation coefficient between a variable $X$ (with non-zero variance) and a constant $c$?
correlation coefficient and its properties
Hard
A.1
B.Undefined
C.-1
D.0
Correct Answer: Undefined
Explanation:
The formula for the correlation coefficient is $r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$. If $Y$ is a constant $c$, its variance is 0, and therefore its standard deviation is also 0. Since the denominator of the correlation formula includes $\sigma_Y$, calculating it would involve division by zero. Therefore, the correlation coefficient is undefined. Correlation measures how two variables co-vary; a variable cannot co-vary with a constant.
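In code, this case shows up as a zero denominator. The sketch below is one way to handle it, returning `None` instead of dividing. The function name `pearson_or_none` and the sample values are assumptions for illustration.

```python
import math

def pearson_or_none(xs, ys):
    """Pearson r, or None when either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denominator = math.sqrt(sxx * syy)
    if denominator == 0:
        # A constant variable has zero standard deviation: r is undefined.
        return None
    return sxy / denominator

x = [1, 2, 3, 4, 5]
constant = [7, 7, 7, 7, 7]

r_constant = pearson_or_none(x, constant)
r_regular = pearson_or_none(x, [2, 4, 6, 8, 10])
print(r_constant, r_regular)
```

Returning `None` rather than 0 keeps the undefined case distinct from a genuine zero correlation.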