Unit 5: Point Estimation
Introduction to Point Estimation
Point estimation is the process of finding a single value, or "point estimate," to serve as the best guess or approximation of an unknown population parameter (e.g., population mean μ, population proportion p, population variance σ²).
- Parameter (θ): A numerical characteristic of a population. It is a fixed, unknown constant.
- Estimator (θ̂): A rule or formula, based on sample data, used to estimate the parameter. Since it is a function of random sample data, an estimator is a random variable. It has a probability distribution called a sampling distribution.
- Estimate: A specific numerical value of an estimator, calculated from a particular sample.
Example:
- Parameter: The true average height of all adult males in a country, μ.
- Estimator: The formula for the sample mean, X̄ = (1/n) * ΣXᵢ.
- Estimate: We take a random sample of 100 males and find their average height to be 175 cm. Here, 175 cm is the point estimate of μ.
Properties of Good Estimators
How do we determine if an estimator is "good"? We evaluate its statistical properties. The three most important properties are unbiasedness, consistency, and efficiency.
1. Unbiased Estimator
An estimator should, on average, give the correct value of the parameter. It should not systematically overestimate or underestimate the parameter.
Definition:
An estimator θ̂ is an unbiased estimator of a parameter θ if the expected value (or mean) of its sampling distribution is equal to the true value of the parameter: E[θ̂] = θ.
Bias:
The bias of an estimator is defined as the difference between its expected value and the true parameter:
Bias(θ̂) = E[θ̂] - θ
For an unbiased estimator, the bias is zero.
Example 1: Sample Mean (X̄) for Population Mean (μ)
Let X₁, X₂, ..., Xₙ be a random sample from a population with mean μ. The sample mean is X̄ = (1/n) * ΣXᵢ.
Let's check its expected value:
E[X̄] = E[ (1/n) * ΣXᵢ ]
= (1/n) * E[ ΣXᵢ ] (by linearity of expectation)
= (1/n) * ΣE[Xᵢ] (by linearity of expectation)
= (1/n) * Σμ (since E[Xᵢ] = μ for all i)
= (1/n) * (nμ)
= μ
Since E[X̄] = μ, the sample mean X̄ is an unbiased estimator of the population mean μ.
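The derivation above can be checked with a quick simulation sketch (hypothetical numbers, using NumPy): we draw many samples, compute each sample mean, and confirm that the average of those means lands very close to the true μ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, trials = 10.0, 30, 100_000  # assumed values for illustration

# Draw `trials` samples of size n from a population with mean mu,
# and compute the sample mean of each.
sample_means = rng.normal(loc=mu, scale=2.0, size=(trials, n)).mean(axis=1)

# The average of the sample means should be very close to mu,
# illustrating E[X̄] = μ (unbiasedness).
print(sample_means.mean())
```

Any single sample mean will differ from μ, but averaged over many repetitions the estimator centers on the true value.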
Example 2: Sample Variance (S²) for Population Variance (σ²)
This is a critical example that explains the "n - 1" denominator in the sample variance formula.
Consider two potential estimators for σ²:
- Ŝ² = (1/n) * Σ(Xᵢ - X̄)² (denominator n)
- S² = (1/(n-1)) * Σ(Xᵢ - X̄)² (denominator n - 1)
It can be shown that the expected value of Ŝ² is:
E[Ŝ²] = ((n-1)/n) * σ²
Since ((n-1)/n) * σ² ≠ σ², this estimator is biased. Specifically, it systematically underestimates the true population variance.
Now, let's examine S²:
E[S²] = E[ (1/(n-1)) * Σ(Xᵢ - X̄)² ]
= (1/(n-1)) * E[ Σ(Xᵢ - X̄)² ]
Using the result that E[ Σ(Xᵢ - X̄)² ] = (n-1)σ², we get:
E[S²] = (1/(n-1)) * (n-1)σ²
= σ²
Since E[S²] = σ², the sample variance with the denominator n - 1 is an unbiased estimator of the population variance σ². This is why it is the standard formula used.
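The bias of the 1/n estimator shows up clearly in simulation. Here is a sketch (hypothetical numbers) using NumPy, where the `ddof` argument of `var` selects the denominator: `ddof=0` gives 1/n and `ddof=1` gives 1/(n-1).

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, trials = 4.0, 5, 200_000  # assumed values for illustration

samples = rng.normal(loc=0.0, scale=sigma2 ** 0.5, size=(trials, n))

# ddof=0 -> 1/n denominator (biased); ddof=1 -> 1/(n-1) (unbiased).
biased = samples.var(axis=1, ddof=0).mean()    # expect ≈ ((n-1)/n)·σ² = 3.2
unbiased = samples.var(axis=1, ddof=1).mean()  # expect ≈ σ² = 4.0
print(biased, unbiased)
```

With a small sample size like n = 5, the underestimation by the 1/n version is substantial (a factor of (n-1)/n = 0.8 on average).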
2. Consistent Estimator
A good estimator should get closer to the true parameter value as the sample size increases.
Definition:
An estimator θ̂ₙ (indexed by sample size n) is a consistent estimator of θ if, as the sample size approaches infinity, the estimator converges in probability to the true parameter value θ.
Formally: For any small number ε > 0,
lim (n→∞) P( |θ̂ₙ - θ| > ε ) = 0
Intuition:
As you collect more and more data, the probability that your estimator is significantly different from the true parameter becomes vanishingly small. The sampling distribution of the estimator becomes tightly concentrated around the true parameter θ.
Sufficient Conditions for Consistency:
A simpler way to check for consistency is to see if the bias and the variance of the estimator both approach zero as n → ∞.
An estimator θ̂ₙ is consistent if:
- lim (n→∞) E[θ̂ₙ] = θ (the estimator is asymptotically unbiased)
- lim (n→∞) Var(θ̂ₙ) = 0
Example: Sample Mean (X̄) for Population Mean (μ)
Let's check if X̄ is a consistent estimator for μ.
- Check Bias: We already proved that X̄ is unbiased, so E[X̄] = μ for every n. Therefore, lim (n→∞) E[X̄] = μ.
- Check Variance: For a random sample from a population with variance σ², the variance of the sample mean is:
Var(X̄) = Var( (1/n) * ΣXᵢ )
= (1/n²) * Var( ΣXᵢ )
= (1/n²) * ΣVar(Xᵢ) (since observations are independent)
= (1/n²) * (nσ²)
= σ²/n
Now, take the limit as n → ∞:
lim (n→∞) Var(X̄) = lim (n→∞) σ²/n = 0
Since both conditions are met, the sample mean is a consistent estimator of the population mean . This result is also known as the Weak Law of Large Numbers.
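The shrinking variance σ²/n can be seen empirically. A simulation sketch (hypothetical numbers, using NumPy): for increasing n, the spread of the sample means should track σ²/n and drop toward zero.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, trials = 5.0, 3.0, 10_000  # assumed values for illustration

# Empirical Var(X̄) for growing n: it should track σ²/n = 9/n
# and shrink toward zero, illustrating consistency.
emp_var = {}
for n in (10, 100, 1000):
    means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    emp_var[n] = means.var()
print(emp_var)
```

Each tenfold increase in sample size cuts the variance of X̄ by roughly a factor of ten, exactly as the formula σ²/n predicts.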
3. Efficient Estimator
If we have two different unbiased estimators for the same parameter, which one should we choose? We should choose the one with the smaller variance, as it will be more precise and more likely to be close to the true parameter value.
Relative Efficiency:
Given two unbiased estimators, θ̂₁ and θ̂₂, for a parameter θ, the relative efficiency of θ̂₁ with respect to θ̂₂ is:
eff(θ̂₁, θ̂₂) = Var(θ̂₂) / Var(θ̂₁)
If eff(θ̂₁, θ̂₂) > 1, then θ̂₁ is more efficient than θ̂₂.
Minimum Variance Unbiased Estimator (MVUE):
An unbiased estimator θ̂ is called the MVUE of θ if its variance is less than or equal to the variance of any other unbiased estimator of θ.
The Cramér-Rao Lower Bound (CRLB):
The CRLB provides a theoretical lower limit on the variance of any unbiased estimator. It gives us a benchmark for efficiency.
Var(θ̂) ≥ 1 / I(θ)
where I(θ) is the Fisher Information.
The Fisher Information for a random sample of size n is:
I(θ) = n * E[ ( ∂/∂θ ln f(X; θ) )² ]
where f(x; θ) is the probability density/mass function of the population.
An estimator is called efficient if it is unbiased and its variance meets the Cramér-Rao Lower Bound, that is, Var(θ̂) = 1 / I(θ). An efficient estimator is an MVUE.
Example: Efficiency of X̄ for the Mean of a Normal Distribution
Let X₁, X₂, ..., Xₙ be a random sample from a N(μ, σ²) distribution (with σ² known). Is X̄ an efficient estimator for μ?
- PDF and Log-PDF:
f(x; μ) = (1/√(2πσ²)) * exp( -(x - μ)² / (2σ²) )
ln f(x; μ) = -(1/2) * ln(2πσ²) - (x - μ)² / (2σ²)
- Derivative of Log-PDF:
∂/∂μ ln f(x; μ) = (x - μ) / σ²
- Calculate Fisher Information:
I(μ) = n * E[ ( (X - μ) / σ² )² ] = (n/σ⁴) * E[ (X - μ)² ]
By definition, E[ (X - μ)² ] is the variance, σ², so I(μ) = n * σ² / σ⁴ = n/σ².
- Find the CRLB:
CRLB = 1 / I(μ) = σ²/n
- Compare Estimator's Variance to CRLB:
We know that X̄ is unbiased and Var(X̄) = σ²/n.
Since Var(X̄) = σ²/n = CRLB, the sample mean is an efficient estimator for the mean of a normal distribution.
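Efficiency becomes concrete when X̄ is compared against another unbiased estimator of μ. A simulation sketch (hypothetical numbers, using NumPy): for normal data the sample median is also unbiased for μ, but its variance is roughly πσ²/(2n), about 57% larger than Var(X̄) = σ²/n, so the mean is the more efficient of the two.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, trials = 0.0, 1.0, 101, 50_000  # assumed values for illustration

samples = rng.normal(mu, sigma, size=(trials, n))
var_mean = samples.mean(axis=1).var()          # ≈ σ²/n (hits the CRLB)
var_median = np.median(samples, axis=1).var()  # ≈ πσ²/(2n), larger

print(var_mean, var_median, var_median / var_mean)
```

Both estimators center on μ, but the mean's sampling distribution is noticeably tighter; no unbiased estimator can beat σ²/n here, by the CRLB.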
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation is a general method for finding estimators for parameters of a distribution.
Principle:
The core idea is to find the parameter value(s) that maximize the probability (or "likelihood") of observing the actual data that was collected. We ask: "For which value of the parameter is our observed sample most likely to have occurred?"
Procedure:
- Likelihood Function, L(θ):
For an independent and identically distributed (i.i.d.) random sample x₁, x₂, ..., xₙ, the likelihood function is the joint probability of observing the data, viewed as a function of the parameter θ:
L(θ) = f(x₁; θ) * f(x₂; θ) * ... * f(xₙ; θ) = ∏ f(xᵢ; θ)
- Log-Likelihood Function, ℓ(θ):
Maximizing L(θ) is equivalent to maximizing its natural logarithm, ℓ(θ) = ln L(θ) = Σ ln f(xᵢ; θ), which is usually much easier mathematically because it converts products into sums.
- Maximization:
To find the maximum, take the derivative of the log-likelihood function with respect to the parameter θ, set it to zero, and solve for θ. This solution is the Maximum Likelihood Estimator, θ̂.
Example 1: MLE for the parameter p of a Bernoulli Distribution
Suppose we have a sample x₁, x₂, ..., xₙ from a Bernoulli trial, where xᵢ = 1 for a success and xᵢ = 0 for a failure. The PMF is f(x; p) = pˣ * (1 - p)^(1 - x) for x ∈ {0, 1}.
- Likelihood Function:
L(p) = ∏ p^(xᵢ) * (1 - p)^(1 - xᵢ) = p^(Σxᵢ) * (1 - p)^(n - Σxᵢ)
- Log-Likelihood Function:
ℓ(p) = (Σxᵢ) * ln p + (n - Σxᵢ) * ln(1 - p)
- Maximization:
dℓ/dp = (Σxᵢ)/p - (n - Σxᵢ)/(1 - p)
Set to zero: (Σxᵢ)/p = (n - Σxᵢ)/(1 - p) ⟹ p̂ = (1/n) * Σxᵢ
The MLE for the population proportion p is the sample proportion p̂ = x̄.
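The closed-form answer p̂ = x̄ can be sanity-checked numerically. A sketch (hypothetical data, using NumPy): evaluate the log-likelihood ℓ(p) on a grid of candidate p values and confirm the peak sits at the sample proportion.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.binomial(1, 0.3, size=200)  # Bernoulli sample, assumed true p = 0.3

# Closed-form MLE: the sample proportion.
p_hat = x.mean()

# Numerical check: evaluate ℓ(p) = (Σxᵢ)ln p + (n - Σxᵢ)ln(1 - p)
# on a grid and find where it is maximized.
grid = np.linspace(0.01, 0.99, 999)
loglik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
p_grid = grid[np.argmax(loglik)]

print(p_hat, p_grid)
```

The grid maximizer agrees with x̄ up to the grid spacing, confirming the calculus result above.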
Properties of Maximum Likelihood Estimators
MLEs are widely used because they have several desirable properties, especially for large sample sizes.
- Consistency: MLEs are consistent estimators.
- Asymptotic Normality: For large , the sampling distribution of an MLE is approximately normal.
- Asymptotic Efficiency: For large , MLEs are efficient, meaning they achieve the Cramér-Rao Lower Bound. They are "asymptotically MVUE".
- Invariance Property: If θ̂ is the MLE for θ, then for any function g(θ), the MLE for g(θ) is simply g(θ̂).
- Example: The MLE for the variance of a normal distribution (when μ is known) is σ̂² = (1/n) * Σ(xᵢ - μ)². By the invariance property, the MLE for the standard deviation σ is σ̂ = √σ̂² = √( (1/n) * Σ(xᵢ - μ)² ).
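The invariance property in the example above can be sketched directly (hypothetical numbers, using NumPy): compute the MLE of σ² with μ treated as known, then obtain the MLE of σ by simply taking the square root.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = 0.0                                 # μ assumed known, for illustration
x = rng.normal(mu, 2.0, size=10_000)     # sample with assumed true σ = 2

# MLE for the variance when μ is known: (1/n) Σ (xᵢ - μ)²
sigma2_hat = np.mean((x - mu) ** 2)

# Invariance: the MLE for σ = g(σ²) = √σ² is g(σ̂²) = √σ̂².
sigma_hat = np.sqrt(sigma2_hat)

print(sigma2_hat, sigma_hat)
```

No separate optimization is needed for σ; plugging the MLE of σ² into the transformation g is enough.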