Unit5 - Subjective Questions
MTH302 • Practice Questions with Detailed Answers
Define an unbiased estimator. Provide a mathematical expression to represent the condition for an estimator $\hat{\theta}$ to be unbiased for a parameter $\theta$.
An unbiased estimator is an estimator whose expected value is equal to the true value of the parameter being estimated. In other words, if we were to take many samples and calculate the estimator for each sample, the average of these estimates would converge to the true parameter value.
Mathematically, an estimator $\hat{\theta}$ for a parameter $\theta$ is unbiased if:
$$E[\hat{\theta}] = \theta$$
where $E[\hat{\theta}]$ denotes the expected value of the estimator $\hat{\theta}$.
Explain the significance of unbiasedness as a desirable property for a point estimator. Why is it important in statistical inference?
Unbiasedness is a significant property for a point estimator for several reasons:
- Accuracy on Average: An unbiased estimator ensures that, on average, the estimator does not systematically over- or under-estimate the true parameter. This means that over many repeated samples, the estimates will "cluster" around the true parameter value.
- Foundation for Further Analysis: Many statistical procedures and theoretical results are built upon the assumption of unbiasedness. For instance, the Cramer-Rao Lower Bound, which is used to assess estimator efficiency, applies directly to unbiased estimators.
- Interpretability: Unbiased estimators are often more intuitive to interpret, as their expected value directly corresponds to the parameter of interest.
- Avoidance of Systematic Error: Bias represents a systematic error in the estimation process. An unbiased estimator minimizes this systematic error, leading to more reliable inferences.
Given a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$, prove that the sample mean $\bar{X}$ is an unbiased estimator for the population mean $\mu$.
To prove that the sample mean $\bar{X}$ is an unbiased estimator for the population mean $\mu$, we need to show that $E[\bar{X}] = \mu$.
Given:
- $X_1, X_2, \ldots, X_n$ is a random sample.
- $E[X_i] = \mu$ for all $i = 1, 2, \ldots, n$.
The sample mean is defined as:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
Now, let's find the expected value of $\bar{X}$:
$$E[\bar{X}] = E\left[\frac{1}{n} \sum_{i=1}^{n} X_i\right]$$
Using the linearity property of expectation, $E[aX] = aE[X]$ and $E[X + Y] = E[X] + E[Y]$:
$$E[\bar{X}] = \frac{1}{n} \sum_{i=1}^{n} E[X_i]$$
Since $E[X_i] = \mu$ for each $i$:
$$E[\bar{X}] = \frac{1}{n} \cdot n\mu = \mu$$
Thus, the sample mean $\bar{X}$ is an unbiased estimator for the population mean $\mu$.
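The unbiasedness of the sample mean can be illustrated by a quick Monte Carlo check. This is a minimal sketch with illustrative choices (a normal population with $\mu = 5$, $\sigma = 2$, sample size 30): averaging the sample mean over many repeated samples should land very close to $\mu$.

```python
import random

# Monte Carlo check that the sample mean is unbiased.
# Population, sample size, and repetition count are illustrative choices.
random.seed(42)
mu, sigma, n, reps = 5.0, 2.0, 30, 20_000

estimates = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    estimates.append(sum(sample) / n)      # sample mean for this sample

avg_estimate = sum(estimates) / reps       # average over many repeated samples
print(avg_estimate)                        # should be very close to mu = 5
```

The simulation mirrors the definition: the estimates scatter around $\mu$, but their average exhibits no systematic drift.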
Define a consistent estimator. Explain why consistency is considered a large-sample property.
A consistent estimator is an estimator that converges in probability to the true value of the parameter as the sample size approaches infinity. In other words, as the sample size grows, the probability that the estimator is arbitrarily close to the true parameter value approaches 1.
Mathematically, an estimator $\hat{\theta}_n$ (where the subscript $n$ indicates dependence on sample size) is consistent for a parameter $\theta$ if for any $\epsilon > 0$:
$$\lim_{n \to \infty} P\left(|\hat{\theta}_n - \theta| < \epsilon\right) = 1$$
or equivalently, $\hat{\theta}_n \xrightarrow{p} \theta$ (converges in probability).
Consistency is considered a large-sample property because its definition explicitly relies on the behavior of the estimator as the sample size tends to infinity. It doesn't guarantee good performance for small sample sizes. While an unbiased estimator might perform well on average for any sample size, a consistent estimator's desirable property (getting closer to the true value) is only guaranteed in the limit as the sample size becomes very large. Therefore, it's an asymptotic property.
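Consistency of the sample mean can be made concrete by estimating $P(|\bar{X}_n - \mu| < \epsilon)$ for growing $n$. This is a minimal sketch with illustrative values (standard normal population, $\epsilon = 0.1$); the estimated probability should climb toward 1 as $n$ increases.

```python
import random

# Estimate P(|sample mean - mu| < eps) by simulation for several n.
# The population, eps, and sample sizes are illustrative choices.
random.seed(0)
mu, sigma, eps, reps = 0.0, 1.0, 0.1, 2000

def coverage(n):
    hits = 0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        hits += abs(xbar - mu) < eps
    return hits / reps

probs = [coverage(n) for n in (10, 100, 1000)]
print(probs)  # probabilities increasing toward 1 as n grows
```

For small $n$ the probability is modest, which is exactly the point: consistency says nothing about small samples, only about the limit.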
Distinguish between weak consistency and strong consistency of an estimator.
Both weak and strong consistency describe the behavior of an estimator as the sample size grows. The key difference lies in the mode of convergence:
- Weak Consistency (Convergence in Probability): An estimator $\hat{\theta}_n$ is weakly consistent for a parameter $\theta$ if it converges in probability to $\theta$. This means that for any arbitrarily small positive number $\epsilon$, the probability that the absolute difference between the estimator and the true parameter is less than $\epsilon$ approaches 1 as the sample size approaches infinity:
$$\lim_{n \to \infty} P\left(|\hat{\theta}_n - \theta| < \epsilon\right) = 1$$
Weak consistency is typically easier to prove and is often what is implied when "consistency" is mentioned without qualification.
- Strong Consistency (Convergence Almost Surely): An estimator $\hat{\theta}_n$ is strongly consistent for a parameter $\theta$ if it converges almost surely to $\theta$:
$$P\left(\lim_{n \to \infty} \hat{\theta}_n = \theta\right) = 1$$
This is a stronger form of convergence, implying that the sequence of estimators converges to $\theta$ for almost all possible sequences of random samples (i.e., with probability 1).
Convergence almost surely implies convergence in probability. Therefore, if an estimator is strongly consistent, it is also weakly consistent, but the converse is not necessarily true.
Define an efficient estimator. What theoretical concept is used to establish the lower bound for the variance of an unbiased estimator?
An efficient estimator is an unbiased estimator that achieves the lowest possible variance among all unbiased estimators. In simpler terms, it is the "best" unbiased estimator in terms of precision, as its estimates are clustered most tightly around the true parameter value.
The theoretical concept used to establish the lower bound for the variance of an unbiased estimator is the Cramer-Rao Lower Bound (CRLB). The CRLB provides a theoretical minimum variance that any unbiased estimator of a parameter $\theta$ can achieve. If an unbiased estimator's variance equals the CRLB, it is said to be a minimum variance unbiased estimator (MVUE), or simply an efficient estimator (in the strict sense, if it reaches the bound for all values of the parameter).
Explain the concept of the Cramer-Rao Lower Bound (CRLB). How is it used to evaluate the efficiency of an unbiased estimator?
The Cramer-Rao Lower Bound (CRLB) is a fundamental theorem in estimation theory that provides a lower bound on the variance of any unbiased estimator of a parameter $\theta$. It essentially states that no unbiased estimator can have a variance smaller than this bound.
Mathematically, for an unbiased estimator $\hat{\theta}$ of a parameter $\theta$, under certain regularity conditions, the variance is bounded by:
$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{n I(\theta)}$$
where $n$ is the sample size and $I(\theta)$ is the Fisher Information for a single observation, defined as:
$$I(\theta) = E\left[\left(\frac{\partial}{\partial \theta} \ln f(X; \theta)\right)^2\right]$$
How it's used to evaluate efficiency:
- Benchmark: The CRLB serves as a benchmark for the best possible performance (lowest variance) an unbiased estimator can achieve.
- Efficiency Check: To evaluate the efficiency of a particular unbiased estimator $\hat{\theta}$:
  - Calculate its variance, $\mathrm{Var}(\hat{\theta})$.
  - Calculate the CRLB for the parameter $\theta$.
  - If $\mathrm{Var}(\hat{\theta})$ equals the CRLB, then $\hat{\theta}$ is an efficient estimator (specifically, an MVUE).
- Relative Efficiency: If an estimator does not achieve the CRLB, its efficiency can be quantified by its relative efficiency, often defined as the ratio of the CRLB to its actual variance: $e(\hat{\theta}) = \dfrac{\mathrm{CRLB}}{\mathrm{Var}(\hat{\theta})}$. An efficiency of 1 indicates an MVUE.
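The efficiency check can be walked through for a standard example (not taken from the text above, but a textbook case): for Bernoulli($p$), the Fisher information per observation is $I(p) = \frac{1}{p(1-p)}$, so the CRLB is $\frac{p(1-p)}{n}$, which the sample mean attains exactly. A minimal sketch with illustrative values:

```python
import random

# Efficiency check for the sample mean of Bernoulli(p) data.
# I(p) = 1/(p(1-p)) per observation, so CRLB = p(1-p)/n; the sample
# mean's variance equals it, giving relative efficiency ~1.
# p, n, and the repetition count are illustrative choices.
random.seed(1)
p, n, reps = 0.3, 50, 20_000

crlb = p * (1 - p) / n                      # CRLB = 1 / (n * I(p))

est = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]
mean_est = sum(est) / reps
sim_var = sum((e - mean_est) ** 2 for e in est) / reps   # simulated Var

efficiency = crlb / sim_var                 # relative efficiency, ~1 here
print(crlb, sim_var, efficiency)
```

The simulated variance matches the CRLB up to Monte Carlo noise, which is exactly what "efficient" means for this estimator.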
What is a Uniformly Minimum Variance Unbiased Estimator (UMVUE)? How does it relate to efficiency?
A Uniformly Minimum Variance Unbiased Estimator (UMVUE) is an unbiased estimator that has the smallest variance among all possible unbiased estimators for all possible values of the parameter being estimated. The term "uniformly" signifies that its minimum variance property holds true across the entire parameter space.
Relation to Efficiency:
- Efficiency as a goal: The pursuit of efficiency in estimation theory aims to find estimators with the lowest possible variance. The UMVUE represents the ultimate achievement in this regard for unbiased estimators.
- Cramer-Rao Lower Bound (CRLB): If an unbiased estimator's variance attains the CRLB for all values of the parameter, it is a UMVUE. Thus, the CRLB provides a criterion for identifying UMVUEs.
- Existence: A UMVUE does not always exist. Even if one exists, it might not be easy to find. Sufficient statistics play a crucial role in constructing UMVUEs (e.g., using the Lehmann-Scheffé theorem).
- Practical Importance: When a UMVUE exists, it is considered the "best" unbiased estimator because it provides the most precise estimates consistently across all scenarios of the true parameter value.
Describe the fundamental principle of Maximum Likelihood Estimation (MLE). What is the main idea behind this method?
The fundamental principle of Maximum Likelihood Estimation (MLE) is to choose, as the estimate for a parameter $\theta$, the value that maximizes the probability (or probability density) of observing the given sample data. The main idea is that the observed data are more likely to have come from a population with certain parameter values than from a population with other parameter values.
Main Idea:
Given a random sample $X_1, X_2, \ldots, X_n$ from a probability distribution with a parameter $\theta$ (or vector of parameters $\boldsymbol{\theta}$), the likelihood function $L(\theta)$ is defined as the joint probability density (or mass) function of the observed sample, viewed as a function of the parameter $\theta$:
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$
The MLE, denoted as $\hat{\theta}$, is the value of $\theta$ that maximizes this likelihood function. In essence, it's the parameter value that makes the observed data appear "most likely" or "most plausible".
Often, it's computationally easier to maximize the natural logarithm of the likelihood function, called the log-likelihood function $\ell(\theta) = \ln L(\theta)$, because the logarithm is a monotonically increasing function, and thus maximizing $\ln L(\theta)$ is equivalent to maximizing $L(\theta)$.
Outline the general steps involved in finding the Maximum Likelihood Estimator (MLE) for a parameter.
Finding the Maximum Likelihood Estimator (MLE) generally involves the following steps:
1. Write Down the Probability Density Function (PDF) or Probability Mass Function (PMF): Identify the PDF or PMF, $f(x; \theta)$, of the underlying distribution from which the sample is drawn. Ensure it's parameterized by $\theta$.
2. Formulate the Likelihood Function: For a random sample $x_1, x_2, \ldots, x_n$, the likelihood function is the joint PDF/PMF of the sample. Assuming independence, this is the product of the individual PDFs/PMFs:
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$
3. Formulate the Log-Likelihood Function: It's often easier to work with the natural logarithm of the likelihood function, called the log-likelihood function, $\ell(\theta)$:
$$\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)$$
Maximizing $\ell(\theta)$ is equivalent to maximizing $L(\theta)$.
4. Differentiate the Log-Likelihood Function: Take the first derivative of the log-likelihood function with respect to the parameter(s) and set it to zero:
$$\frac{\partial \ell(\theta)}{\partial \theta} = 0$$
The derivative itself is called the score function; setting it to zero gives the likelihood equation.
5. Solve for the Parameter(s): Solve the equation(s) from step 4 for $\theta$. The solution(s) will be the candidate MLE(s), denoted as $\hat{\theta}$.
6. Verify Maximization (Optional but Recommended): To ensure that the critical point found is indeed a maximum (and not a minimum or saddle point), check the second derivative. If $\frac{\partial^2 \ell(\theta)}{\partial \theta^2} < 0$ at $\theta = \hat{\theta}$, then it's a maximum.
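When the likelihood equation has no convenient closed form, the same steps can be carried out numerically. As an illustration (the Poisson model and the grid search are choices of this sketch, not from the text), we maximize a Poisson log-likelihood over a grid of candidate rates; since the Poisson MLE is known to be the sample mean, the search should recover it.

```python
import math
import random

# Numerical MLE sketch: maximize a Poisson log-likelihood over a grid.
# The Poisson model, sample size, and grid are illustrative choices.
random.seed(7)
true_rate = 4.0

def poisson(lam):
    # Crude Poisson sampler (Knuth's multiplication method), fine for small rates.
    L, k, prod = math.exp(-lam), 0, random.random()
    while prod > L:
        k += 1
        prod *= random.random()
    return k

data = [poisson(true_rate) for _ in range(500)]

def log_likelihood(lam):
    # sum of ln f(x_i; lam) with f(x; lam) = e^{-lam} lam^x / x!
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in data)

grid = [i / 100 for i in range(100, 801)]   # candidate rates 1.00 .. 8.00
mle = max(grid, key=log_likelihood)
print(mle, sum(data) / len(data))           # grid MLE vs closed-form sample mean
```

The grid maximizer agrees with the closed-form answer up to the grid resolution, which is the practical content of steps 2-5 above.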
Derive the Maximum Likelihood Estimator (MLE) for the parameter $p$ of a Bernoulli distribution, given a random sample $X_1, X_2, \ldots, X_n$.
Given a random sample $X_1, X_2, \ldots, X_n$ from a Bernoulli distribution with parameter $p$, where each $x_i \in \{0, 1\}$.
1. PDF/PMF of Bernoulli distribution:
$$f(x; p) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$$
2. Likelihood Function:
$$L(p) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i} = p^{\sum x_i} (1-p)^{n - \sum x_i}$$
3. Log-Likelihood Function:
$$\ell(p) = \left(\sum_{i=1}^{n} x_i\right) \ln p + \left(n - \sum_{i=1}^{n} x_i\right) \ln(1-p)$$
4. Differentiate and set to zero:
$$\frac{d\ell(p)}{dp} = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} = 0$$
5. Solve for $p$:
$$(1-p) \sum x_i = p\left(n - \sum x_i\right) \implies \sum x_i = np \implies \hat{p} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Let $\bar{x}$ be the sample mean. So, $\hat{p} = \bar{x}$.
6. Verify Maximization (Second derivative check):
$$\frac{d^2\ell(p)}{dp^2} = -\frac{\sum x_i}{p^2} - \frac{n - \sum x_i}{(1-p)^2}$$
Since $0 < p < 1$, $\sum x_i \geq 0$, and $n - \sum x_i \geq 0$, both terms are non-positive (and not both zero), so the second derivative is negative, confirming that $\hat{p}$ is a maximum.
Therefore, the MLE for the parameter $p$ of a Bernoulli distribution is the sample mean, $\hat{p} = \bar{x}$.
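The closed-form result can be sanity-checked numerically: a grid search over the Bernoulli log-likelihood should peak at the sample mean. A minimal sketch (the sample and grid resolution are illustrative choices):

```python
import math
import random

# Check the Bernoulli MLE: the log-likelihood grid maximum should match
# the sample mean. Data-generating p and grid step are illustrative.
random.seed(3)
data = [1 if random.random() < 0.7 else 0 for _ in range(200)]
s, n = sum(data), len(data)

def loglik(p):
    # ell(p) = (sum x_i) ln p + (n - sum x_i) ln(1 - p)
    return s * math.log(p) + (n - s) * math.log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]   # candidate p in (0, 1)
p_grid = max(grid, key=loglik)
p_mle = s / n                               # closed-form MLE (sample mean)
print(p_mle, p_grid)
```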
Discuss the important asymptotic properties of Maximum Likelihood Estimators (MLEs).
Maximum Likelihood Estimators (MLEs) possess several desirable properties, especially as the sample size becomes large (asymptotic properties). These properties make MLEs very popular in statistical inference:
- Asymptotic Unbiasedness: Under certain regularity conditions, MLEs are asymptotically unbiased. This means that while they might be biased for small sample sizes, the bias tends to zero as the sample size approaches infinity: $E[\hat{\theta}_n] \to \theta$ as $n \to \infty$.
- Consistency: MLEs are consistent. This means that as the sample size increases, the MLE converges in probability to the true value of the parameter: $\hat{\theta}_n \xrightarrow{p} \theta$ as $n \to \infty$.
- Asymptotic Efficiency: MLEs are asymptotically efficient. This implies that as the sample size grows, the variance of the MLE approaches the Cramer-Rao Lower Bound (CRLB). Among all consistent and asymptotically normal estimators, the MLE has the smallest asymptotic variance.
- Asymptotic Normality: Under regularity conditions, the MLE is asymptotically normally distributed. This means that for large sample sizes, the sampling distribution of the MLE can be approximated by a normal distribution:
$$\sqrt{n}\left(\hat{\theta}_n - \theta\right) \xrightarrow{d} N\left(0, \frac{1}{I(\theta)}\right)$$
or equivalently, $\hat{\theta}_n \approx N\left(\theta, \frac{1}{n I(\theta)}\right)$ for large $n$.
This property is extremely valuable for constructing confidence intervals and performing hypothesis tests related to the parameter.
These asymptotic properties collectively highlight why MLE is a powerful and frequently used method, as it often yields estimators that are reliable and precise for large datasets.
Explain the invariance property of Maximum Likelihood Estimators (MLEs). Provide an example.
The invariance property (or functional invariance property) of Maximum Likelihood Estimators states that if $\hat{\theta}$ is the MLE for a parameter $\theta$, and $g(\theta)$ is a one-to-one function of $\theta$, then the MLE for $g(\theta)$ is simply $g(\hat{\theta})$. This means that if you have the MLE for a parameter, you can find the MLE for any such transformation of that parameter by applying the same transformation to the original MLE.
Example:
Suppose $X_1, \ldots, X_n$ are independent and identically distributed (i.i.d.) random variables from an Exponential distribution with rate parameter $\lambda$. The PDF is $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x > 0$. The MLE for $\lambda$ is known to be $\hat{\lambda} = \frac{1}{\bar{x}}$, where $\bar{x}$ is the sample mean.
Now, suppose we are interested in estimating the mean of the Exponential distribution, which is $\mu = g(\lambda) = \frac{1}{\lambda}$.
According to the invariance property, the MLE for $\mu$ is simply:
$$\hat{\mu} = g(\hat{\lambda}) = \frac{1}{\hat{\lambda}} = \bar{x}$$
Thus, the MLE for the mean of an Exponential distribution is the sample mean $\bar{x}$. This property simplifies finding MLEs for functions of parameters whose MLEs are already known.
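The invariance property can be checked directly: maximizing the exponential likelihood once over the rate $\lambda$ and once over the mean $\mu = 1/\lambda$ should give grid maxima satisfying $\hat{\mu} \approx 1/\hat{\lambda}$. A minimal sketch with illustrative sample and grids:

```python
import math
import random

# Invariance check for the exponential model: maximize over lambda, then
# over the reparametrization mu = 1/lambda. True rate and grids are
# illustrative choices.
random.seed(11)
true_lambda = 2.0
data = [random.expovariate(true_lambda) for _ in range(400)]
n, total = len(data), sum(data)

def loglik_rate(lam):
    # ell(lambda) = n ln(lambda) - lambda * sum(x_i)
    return n * math.log(lam) - lam * total

def loglik_mean(mu):
    # same model, reparametrized by mu = 1/lambda
    return loglik_rate(1.0 / mu)

lam_grid = [i / 1000 for i in range(500, 4001)]   # lambda in 0.5 .. 4.0
mu_grid = [i / 1000 for i in range(250, 2001)]    # mu in 0.25 .. 2.0
lam_hat = max(lam_grid, key=loglik_rate)
mu_hat = max(mu_grid, key=loglik_mean)
print(lam_hat, mu_hat, 1 / lam_hat)               # mu_hat ~ 1/lam_hat
```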
Given a random sample $X_1, X_2, \ldots, X_n$ from a Normal distribution with known variance $\sigma^2$ and unknown mean $\mu$, derive the Maximum Likelihood Estimator (MLE) for $\mu$.
Given a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$ where $\sigma^2$ is known and $\mu$ is unknown.
1. PDF of Normal distribution:
$$f(x; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
2. Likelihood Function:
$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right)$$
3. Log-Likelihood Function:
$$\ell(\mu) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$
4. Differentiate and set to zero (with respect to $\mu$):
$$\frac{d\ell(\mu)}{d\mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$$
Since $\sigma^2 > 0$, we can multiply by $\sigma^2$:
$$\sum_{i=1}^{n} (x_i - \mu) = 0$$
5. Solve for $\mu$:
$$\sum_{i=1}^{n} x_i - n\mu = 0 \implies \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}$$
6. Verify Maximization (Second derivative check):
$$\frac{d^2\ell(\mu)}{d\mu^2} = -\frac{n}{\sigma^2}$$
Since $-\frac{n}{\sigma^2}$ is always negative (as $\sigma^2 > 0$), the second derivative is negative, confirming that $\hat{\mu}$ is a maximum.
Therefore, the MLE for the mean $\mu$ of a Normal distribution with known variance is the sample mean, $\hat{\mu} = \bar{x}$.
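The derivation can be verified numerically by evaluating the score function $\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum(x_i - \mu)$ at $\mu = \bar{x}$, where it should vanish. A minimal sketch with illustrative numbers (true mean 10, known $\sigma = 3$):

```python
import random

# Evaluate the Normal score function at mu = sample mean; it should be
# numerically zero there. True mean, sigma, and sample size are
# illustrative choices.
random.seed(5)
sigma = 3.0                                   # known: sigma^2 = 9
data = [random.gauss(10.0, sigma) for _ in range(250)]
xbar = sum(data) / len(data)

def score(mu):
    # d ell / d mu = (1/sigma^2) * sum(x_i - mu)
    return sum(x - mu for x in data) / sigma ** 2

print(xbar, score(xbar))                      # score at the MLE is ~0
```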
Compare and contrast the properties of an unbiased estimator and a consistent estimator.
Here's a comparison and contrast of unbiased and consistent estimators:
Unbiased Estimator:
- Definition: An estimator $\hat{\theta}$ is unbiased if its expected value equals the true parameter value: $E[\hat{\theta}] = \theta$.
- Sample Size: This property holds for any sample size $n$. It is a finite-sample property.
- Interpretation: On average, the estimator hits the true parameter value. It does not systematically over- or under-estimate.
- Existence: An unbiased estimator may or may not exist.
- Relationship to Variance: Unbiasedness alone does not say anything about how close the estimates are to each other or to the true value. An unbiased estimator can have a very large variance.
Consistent Estimator:
- Definition: An estimator $\hat{\theta}_n$ is consistent if it converges in probability to the true parameter value as the sample size approaches infinity: $\hat{\theta}_n \xrightarrow{p} \theta$.
- Sample Size: This is an asymptotic or large-sample property. It describes the estimator's behavior as $n \to \infty$.
- Interpretation: As the sample size grows, the probability of the estimator being arbitrarily close to the true parameter approaches 1.
- Existence: Consistent estimators generally exist under mild conditions.
- Relationship to Variance: For consistency, it is usually required that $\mathrm{Var}(\hat{\theta}_n) \to 0$ as $n \to \infty$ (along with asymptotic unbiasedness, i.e., the bias going to zero).
Comparison (Similarities):
- Both are desirable properties for good estimators.
- Both aim for the estimator to be "close" to the true parameter value in some sense.
- An estimator can be both unbiased and consistent (e.g., sample mean for population mean).
Contrast (Differences):
- Small Sample Behavior: An unbiased estimator might not be consistent (e.g., if its variance does not tend to zero). A consistent estimator might be biased for small samples, but its bias diminishes asymptotically.
- Focus: Unbiasedness focuses on the expected value; consistency focuses on convergence in probability.
- Existence: While unbiased estimators might not always exist or be practical, consistent estimators are more common in practice due to their asymptotic nature.
- Importance: For small samples, unbiasedness is often prioritized. For large samples, consistency (along with asymptotic efficiency) becomes more crucial.
Explain the role of the Mean Squared Error (MSE) as a criterion for evaluating point estimators. How is it related to bias and variance?
The Mean Squared Error (MSE) is a widely used criterion for evaluating the overall performance of a point estimator. It measures the average squared difference between the estimated value and the true parameter value. A lower MSE indicates a better estimator in terms of both accuracy and precision.
Mathematically, for an estimator $\hat{\theta}$ of a parameter $\theta$, the MSE is defined as:
$$\mathrm{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]$$
Relationship to Bias and Variance:
The MSE can be decomposed into two components: the variance of the estimator and the square of its bias. This decomposition reveals how both these factors contribute to the total error:
$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \left[\mathrm{Bias}(\hat{\theta})\right]^2$$
Where:
- $\mathrm{Var}(\hat{\theta}) = E\left[(\hat{\theta} - E[\hat{\theta}])^2\right]$ is the variance of the estimator, which measures the precision (how spread out the estimates are around their own mean).
- $\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$ is the bias of the estimator, which measures the accuracy (the systematic deviation of the estimator's expected value from the true parameter).
Implications of the Relationship:
- Trade-off: The MSE highlights a fundamental trade-off that often exists between bias and variance. It is sometimes possible to reduce the variance of an estimator by allowing a small amount of bias, leading to a smaller overall MSE (e.g., in shrinkage estimators).
- Optimal Estimator: An ideal estimator would have both zero bias (unbiased) and zero variance. If an estimator is unbiased, its MSE is simply its variance: $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta})$. In this case, minimizing MSE is equivalent to minimizing variance.
- Comprehensive Evaluation: MSE provides a single measure that accounts for both the systematic error (bias) and the random error (variance) in an estimator, making it a comprehensive metric for comparing different estimators.
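The decomposition $\mathrm{MSE} = \mathrm{Var} + \mathrm{Bias}^2$ can be verified empirically. This is a minimal sketch using a deliberately biased estimator of a normal mean (the sample mean shrunk toward 0 by a factor 0.9; the factor, population, and sizes are illustrative choices):

```python
import random

# Verify MSE = Var + Bias^2 by simulation for a shrunken (biased)
# estimator of a normal mean. All numeric settings are illustrative.
random.seed(2)
mu, n, reps = 4.0, 20, 30_000

ests = []
for _ in range(reps):
    sample = [random.gauss(mu, 1.0) for _ in range(n)]
    ests.append(0.9 * sum(sample) / n)     # shrunken estimator: 0.9 * xbar

mean_est = sum(ests) / reps
var = sum((e - mean_est) ** 2 for e in ests) / reps   # empirical variance
bias = mean_est - mu                                  # empirical bias (~ -0.4)
mse = sum((e - mu) ** 2 for e in ests) / reps         # empirical MSE
print(mse, var + bias ** 2)                           # the two should agree
```

For the empirical quantities the decomposition is an algebraic identity, so the two printed numbers agree to floating-point precision.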
Under what conditions can a biased estimator be preferred over an unbiased estimator? Illustrate with an example.
While unbiasedness is a desirable property, a biased estimator can sometimes be preferred over an unbiased estimator if it has a significantly smaller variance, leading to a smaller Mean Squared Error (MSE). This situation arises when there is a bias-variance trade-off.
Recall that $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$. If an unbiased estimator has a very large variance, its MSE might be high. A biased estimator, despite having a non-zero bias, might have a much smaller variance, such that its squared bias is outweighed by the reduction in variance, resulting in a lower overall MSE.
Conditions for preference:
- The reduction in variance of the biased estimator is substantial.
- The squared bias of the biased estimator is small relative to the reduction in variance.
- The primary goal is to minimize the total error, as measured by MSE.
Example: Sample Variance
Let $X_1, \ldots, X_n$ be an i.i.d. sample from a population with mean $\mu$ and variance $\sigma^2$.
- Biased Estimator of Variance: The sample variance $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is a biased estimator for $\sigma^2$, as $E[\hat{\sigma}^2] = \frac{n-1}{n} \sigma^2$. Its bias is $-\frac{\sigma^2}{n}$.
- Unbiased Estimator of Variance: The unbiased sample variance is $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$. We know $E[S^2] = \sigma^2$.
For a normal distribution, it can be shown that $\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}$ and $\mathrm{Var}(\hat{\sigma}^2) = \frac{2(n-1)\sigma^4}{n^2}$.
Calculating MSE for both:
$$\mathrm{MSE}(S^2) = \frac{2\sigma^4}{n-1}, \qquad \mathrm{MSE}(\hat{\sigma}^2) = \frac{2(n-1)\sigma^4}{n^2} + \frac{\sigma^4}{n^2} = \frac{2n-1}{n^2}\sigma^4$$
For $n > 1$, $\mathrm{MSE}(\hat{\sigma}^2)$ is always smaller than $\mathrm{MSE}(S^2)$. For example, for $n = 5$, $\mathrm{MSE}(S^2) = 0.5\sigma^4$ and $\mathrm{MSE}(\hat{\sigma}^2) = 0.36\sigma^4$; for $n = 10$, $\mathrm{MSE}(S^2) \approx 0.222\sigma^4$ and $\mathrm{MSE}(\hat{\sigma}^2) = 0.19\sigma^4$.
Thus, even though $\hat{\sigma}^2$ is biased, it can be preferred for estimating $\sigma^2$ when the goal is to minimize MSE, as it offers a better balance between bias and variance.
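The MSE comparison can be reproduced by simulation. A minimal sketch for normal data with $n = 10$ and $\sigma^2 = 1$, where theory gives $\mathrm{MSE}(S^2) = 2/9 \approx 0.222$ and $\mathrm{MSE}(\hat{\sigma}^2) = 19/100 = 0.19$ (the seed and repetition count are illustrative choices):

```python
import random

# Simulated MSE of the unbiased (1/(n-1)) vs biased (1/n) variance
# estimators for N(0,1) data with n = 10. Settings are illustrative.
random.seed(8)
n, reps, sigma2 = 10, 40_000, 1.0

mse_unbiased = mse_biased = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)      # sum of squared deviations
    mse_unbiased += (ss / (n - 1) - sigma2) ** 2
    mse_biased += (ss / n - sigma2) ** 2
mse_unbiased /= reps
mse_biased /= reps
print(mse_unbiased, mse_biased)   # the biased 1/n version has lower MSE
```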
Consider a random sample $X_1, \ldots, X_n$ from an Exponential distribution with unknown rate parameter $\lambda$. The PDF is $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x > 0$. Derive the Maximum Likelihood Estimator (MLE) for $\lambda$.
Given a random sample $X_1, \ldots, X_n$ from an Exponential distribution with rate parameter $\lambda$.
1. PDF of Exponential distribution:
$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x > 0$$
2. Likelihood Function:
$$L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$$
3. Log-Likelihood Function:
$$\ell(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$$
4. Differentiate and set to zero (with respect to $\lambda$):
$$\frac{d\ell(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0$$
5. Solve for $\lambda$:
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$$
where $\bar{x}$ is the sample mean.
6. Verify Maximization (Second derivative check):
$$\frac{d^2\ell(\lambda)}{d\lambda^2} = -\frac{n}{\lambda^2}$$
Since $-\frac{n}{\lambda^2}$ is always negative (as $\lambda > 0$), the second derivative is negative, confirming that $\hat{\lambda}$ is a maximum.
Therefore, the MLE for the rate parameter $\lambda$ of an Exponential distribution is the reciprocal of the sample mean, $\hat{\lambda} = \frac{1}{\bar{x}}$.
What are the key differences between a Method of Moments Estimator (MME) and a Maximum Likelihood Estimator (MLE)?
Both Method of Moments Estimators (MME) and Maximum Likelihood Estimators (MLE) are techniques for finding point estimators, but they differ significantly in their approach, properties, and computational complexity.
Method of Moments Estimator (MME):
- Principle: Equates population moments (e.g., mean, variance) to corresponding sample moments and solves for the unknown parameters.
- Approach: Uses algebraic methods. Requires as many moment equations as there are parameters to estimate.
- Properties:
- Consistency: MMEs are generally consistent under fairly broad conditions.
- Unbiasedness: They are not necessarily unbiased.
- Efficiency: They are generally not efficient (do not achieve the CRLB).
- Asymptotic Normality: Often asymptotically normal, but typically with higher variance than MLEs.
- Computational Complexity: Often simpler to compute, requiring only sums and averages of data.
- Dependence on Distribution: Less dependent on the exact distribution shape, only on the moments.
Maximum Likelihood Estimator (MLE):
- Principle: Chooses the parameter values that maximize the likelihood of observing the given data.
- Approach: Uses calculus (differentiation of the likelihood/log-likelihood function) and often numerical methods for complex models.
- Properties (Asymptotic):
- Consistency: MLEs are consistent.
- Unbiasedness: Asymptotically unbiased (bias tends to zero for large samples).
- Efficiency: Asymptotically efficient (achieve the CRLB for large samples).
- Asymptotic Normality: Asymptotically normally distributed, providing a basis for inference.
- Computational Complexity: Can be computationally intensive, especially for complex likelihood functions, sometimes requiring iterative algorithms.
- Dependence on Distribution: Requires precise knowledge of the underlying distribution's PDF/PMF.
Key Differences Summarized:
- Underlying Principle: MME matches moments; MLE maximizes probability of observed data.
- Optimality: MLEs possess stronger asymptotic optimality properties (consistency, asymptotic unbiasedness, asymptotic efficiency, asymptotic normality) compared to MMEs.
- Computational: MMEs are often simpler; MLEs can be more complex.
- Distribution Dependence: MLEs are distribution-specific; MMEs are less so.
In general, MLEs are preferred in most statistical applications due to their superior asymptotic properties, despite potentially higher computational cost.
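The contrast between the two methods shows up vividly in a standard illustrative case not taken from the text above: Uniform$(0, \theta)$. Since $E[X] = \theta/2$, the MME is $2\bar{x}$, while the MLE is the sample maximum; the sketch below compares their simulated MSEs (all numeric settings are illustrative):

```python
import random

# MME (2 * sample mean) vs MLE (sample maximum) for Uniform(0, theta),
# compared by simulated MSE. theta, n, and reps are illustrative.
random.seed(4)
theta, n, reps = 5.0, 50, 20_000

se_mme = se_mle = 0.0
for _ in range(reps):
    xs = [random.uniform(0, theta) for _ in range(n)]
    mme = 2 * sum(xs) / n          # method of moments: E[X] = theta/2
    mle = max(xs)                  # maximum likelihood: sample maximum
    se_mme += (mme - theta) ** 2
    se_mle += (mle - theta) ** 2
print(se_mme / reps, se_mle / reps)   # MLE is far more precise here
```

This matches the general pattern stated above: both estimators are consistent, but the distribution-specific MLE extracts much more information from the sample than the moment match does.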
Discuss a scenario where an estimator might be consistent but biased for finite sample sizes.
An estimator can indeed be consistent but biased for finite sample sizes. This highlights that consistency is an asymptotic property and doesn't guarantee desirable behavior in small samples.
Scenario: Estimating $\sigma^2$ using $\hat{\sigma}^2_n$ in a Normal distribution
Consider a random sample $X_1, \ldots, X_n$ from a Normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.
Let's use the following estimator for the variance:
$$\hat{\sigma}^2_n = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
- Bias for finite sample sizes:
It is known that $E[\hat{\sigma}^2_n] = \frac{n-1}{n} \sigma^2$. Since $\frac{n-1}{n} \neq 1$ for any finite $n$, $\hat{\sigma}^2_n$ is a biased estimator for $\sigma^2$.
The bias is $E[\hat{\sigma}^2_n] - \sigma^2 = -\frac{\sigma^2}{n}$.
- Consistency:
To check for consistency, we typically need to show that the bias tends to zero and $\mathrm{Var}(\hat{\sigma}^2_n) \to 0$ as $n \to \infty$.
  - Bias: As $n \to \infty$, $-\frac{\sigma^2}{n} \to 0$. So, it's asymptotically unbiased.
  - Variance: For a normal distribution, $\mathrm{Var}(\hat{\sigma}^2_n) = \frac{2(n-1)\sigma^4}{n^2}$. As $n \to \infty$, $\mathrm{Var}(\hat{\sigma}^2_n) \to 0$.
Since both the bias and variance tend to zero as $n \to \infty$, $\hat{\sigma}^2_n$ is a consistent estimator for $\sigma^2$.
Conclusion:
This example clearly shows that $\hat{\sigma}^2_n$ is biased for any finite sample size (it systematically underestimates $\sigma^2$), but as the sample size grows infinitely large, its bias disappears and its variance shrinks, making it a consistent estimator. This is a common situation for MLEs, which are often biased in finite samples yet consistent, asymptotically unbiased, and asymptotically efficient.
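The shrinking bias can be seen directly by simulating $E[\hat{\sigma}^2_n]$ for growing $n$. A minimal sketch with $\sigma^2 = 1$ (sample sizes and repetition count are illustrative choices); the averages should approach 1 from below, tracking $\frac{n-1}{n}$:

```python
import random

# Simulate E[sigma_hat_n^2] for several n to show the bias -sigma^2/n
# shrinking toward zero. Population N(0,1); sizes are illustrative.
random.seed(6)
reps = 20_000

def mean_estimate(n):
    acc = 0.0
    for _ in range(reps):
        xs = [random.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        acc += sum((x - xbar) ** 2 for x in xs) / n   # 1/n (MLE) version
    return acc / reps

means = {n: mean_estimate(n) for n in (5, 20, 100)}
print(means)   # averages approach sigma^2 = 1 from below as n grows
```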