Unit 5: Point Estimation
Introduction to Point Estimation
Point estimation is the process of finding a single value, or "point estimate," to serve as the best guess or approximation of an unknown population parameter (e.g., population mean μ, population proportion p, population variance σ²).
- Parameter (θ): A numerical characteristic of a population. It is a fixed, unknown constant.
- Estimator (θ̂): A rule or formula, based on sample data, used to estimate the parameter. Since it is a function of random sample data, an estimator is a random variable. It has a probability distribution called a sampling distribution.
- Estimate: A specific numerical value of an estimator, calculated from a particular sample.
Example:
- Parameter: The true average height of all adult males in a country, μ.
- Estimator: The formula for the sample mean, X̄ = (1/n) * ΣXᵢ.
- Estimate: We take a random sample of 100 males and find their average height to be 175 cm. Here, 175 cm is the point estimate of μ.
Properties of Good Estimators
How do we determine if an estimator is "good"? We evaluate its statistical properties. The three most important properties are unbiasedness, consistency, and efficiency.
1. Unbiased Estimator
An estimator should, on average, give the correct value of the parameter. It should not systematically overestimate or underestimate the parameter.
Definition:
An estimator θ̂ is an unbiased estimator of a parameter θ if the expected value (or mean) of its sampling distribution is equal to the true value of the parameter: E[θ̂] = θ.
Bias:
The bias of an estimator is defined as the difference between its expected value and the true parameter:
Bias(θ̂) = E[θ̂] - θ
For an unbiased estimator, the bias is zero.
Example 1: Sample Mean (X̄) for Population Mean (μ)
Let X₁, X₂, ..., Xₙ be a random sample from a population with mean μ. The sample mean is X̄ = (1/n) * ΣXᵢ.
Let's check its expected value:
E[X̄] = E[ (1/n) * ΣXᵢ ]
= (1/n) * E[ ΣXᵢ ] (by linearity of expectation)
= (1/n) * ΣE[Xᵢ] (by linearity of expectation)
= (1/n) * Σμ (since E[Xᵢ] = μ for all i)
= (1/n) * (nμ)
= μ
Since E[X̄] = μ, the sample mean X̄ is an unbiased estimator of the population mean μ.
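The derivation above can be checked with a quick simulation sketch (hypothetical numbers, using NumPy): we draw many samples, compute each sample mean, and confirm that the average of those means lands very close to the true μ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, trials = 10.0, 30, 100_000  # assumed values for illustration

# Draw `trials` samples of size n from a population with mean mu,
# and compute the sample mean of each.
sample_means = rng.normal(loc=mu, scale=2.0, size=(trials, n)).mean(axis=1)

# The average of the sample means should be very close to mu,
# illustrating E[X̄] = μ (unbiasedness).
print(sample_means.mean())
```

Any single sample mean will differ from μ, but averaged over many repetitions the estimator centers on the true value.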
Example 2: Sample Variance (S²) for Population Variance (σ²)
This is a critical example that explains the "n - 1" denominator in the sample variance formula.
Consider two potential estimators for σ²:
- Ŝ² = (1/n) * Σ(Xᵢ - X̄)² (denominator n)
- S² = (1/(n-1)) * Σ(Xᵢ - X̄)² (denominator n - 1)
It can be shown that the expected value of Ŝ² is:
E[Ŝ²] = ((n-1)/n) * σ²
Since ((n-1)/n) * σ² ≠ σ², this estimator is biased. Specifically, it systematically underestimates the true population variance.
Now, let's examine S²:
E[S²] = E[ (1/(n-1)) * Σ(Xᵢ - X̄)² ]
= (1/(n-1)) * E[ Σ(Xᵢ - X̄)² ]
Using the result that E[ Σ(Xᵢ - X̄)² ] = (n-1)σ², we get:
E[S²] = (1/(n-1)) * (n-1)σ²
= σ²
Since E[S²] = σ², the sample variance with the denominator n - 1 is an unbiased estimator of the population variance σ². This is why it is the standard formula used.
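The bias of the 1/n estimator shows up clearly in simulation. Here is a sketch (hypothetical numbers) using NumPy, where the `ddof` argument of `var` selects the denominator: `ddof=0` gives 1/n and `ddof=1` gives 1/(n-1).

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, trials = 4.0, 5, 200_000  # assumed values for illustration

samples = rng.normal(loc=0.0, scale=sigma2 ** 0.5, size=(trials, n))

# ddof=0 -> 1/n denominator (biased); ddof=1 -> 1/(n-1) (unbiased).
biased = samples.var(axis=1, ddof=0).mean()    # expect ≈ ((n-1)/n)·σ² = 3.2
unbiased = samples.var(axis=1, ddof=1).mean()  # expect ≈ σ² = 4.0
print(biased, unbiased)
```

With a small sample size like n = 5, the underestimation by the 1/n version is substantial (a factor of (n-1)/n = 0.8 on average).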
2. Consistent Estimator
A good estimator should get closer to the true parameter value as the sample size increases.
Definition:
An estimator θ̂ₙ (indexed by sample size n) is a consistent estimator of θ if, as the sample size approaches infinity, the estimator converges in probability to the true parameter value θ.
Formally: For any small number ε > 0,
lim (n→∞) P( |θ̂ₙ - θ| > ε ) = 0
Intuition:
As you collect more and more data, the probability that your estimator is significantly different from the true parameter becomes vanishingly small. The sampling distribution of the estimator becomes tightly concentrated around the true parameter θ.
Sufficient Conditions for Consistency:
A simpler way to check for consistency is to see if the bias and the variance of the estimator both approach zero as n → ∞.
An estimator θ̂ₙ is consistent if:
- lim (n→∞) E[θ̂ₙ] = θ (the estimator is asymptotically unbiased)
- lim (n→∞) Var(θ̂ₙ) = 0
Example: Sample Mean (X̄) for Population Mean (μ)
Let's check if X̄ is a consistent estimator for μ.
- Check Bias: We already proved that X̄ is unbiased, so E[X̄] = μ for every n. Therefore, lim (n→∞) E[X̄] = μ.
- Check Variance: For a random sample from a population with variance σ², the variance of the sample mean is:
Var(X̄) = Var( (1/n) * ΣXᵢ )
= (1/n²) * Var( ΣXᵢ )
= (1/n²) * ΣVar(Xᵢ) (since observations are independent)
= (1/n²) * (nσ²)
= σ²/n
Now, take the limit as n → ∞:
lim (n→∞) Var(X̄) = lim (n→∞) σ²/n = 0
Since both conditions are met, the sample mean is a consistent estimator of the population mean . This result is also known as the Weak Law of Large Numbers.
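The shrinking variance σ²/n can be seen empirically. A simulation sketch (hypothetical numbers, using NumPy): for increasing n, the spread of the sample means should track σ²/n and drop toward zero.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, trials = 5.0, 3.0, 10_000  # assumed values for illustration

# Empirical Var(X̄) for growing n: it should track σ²/n = 9/n
# and shrink toward zero, illustrating consistency.
emp_var = {}
for n in (10, 100, 1000):
    means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    emp_var[n] = means.var()
print(emp_var)
```

Each tenfold increase in sample size cuts the variance of X̄ by roughly a factor of ten, exactly as the formula σ²/n predicts.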
3. Efficient Estimator
If we have two different unbiased estimators for the same parameter, which one should we choose? We should choose the one with the smaller variance, as it will be more precise and more likely to be close to the true parameter value.
Relative Efficiency:
Given two unbiased estimators, θ̂₁ and θ̂₂, for a parameter θ, the relative efficiency of θ̂₁ with respect to θ̂₂ is:
eff(θ̂₁, θ̂₂) = Var(θ̂₂) / Var(θ̂₁)
If eff(θ̂₁, θ̂₂) > 1, then θ̂₁ is more efficient than θ̂₂.
Minimum Variance Unbiased Estimator (MVUE):
An unbiased estimator θ̂ is called the MVUE of θ if its variance is less than or equal to the variance of any other unbiased estimator of θ.
The Cramér-Rao Lower Bound (CRLB):
The CRLB provides a theoretical lower limit on the variance of any unbiased estimator. It gives us a benchmark for efficiency.
Var(θ̂) ≥ 1 / I(θ)
where I(θ) is the Fisher Information.
The Fisher Information for a random sample of size n is:
I(θ) = n * E[ ( ∂/∂θ ln f(X; θ) )² ]
where f(x; θ) is the probability density/mass function of the population.
An estimator is called efficient if it is unbiased and its variance meets the Cramér-Rao Lower Bound, that is, Var(θ̂) = 1 / I(θ). An efficient estimator is an MVUE.
Example: Efficiency of X̄ for the Mean of a Normal Distribution
Let X₁, X₂, ..., Xₙ be a random sample from a N(μ, σ²) distribution (with σ² known). Is X̄ an efficient estimator for μ?
- PDF and Log-PDF:
f(x; μ) = (1/√(2πσ²)) * exp( -(x - μ)² / (2σ²) )
ln f(x; μ) = -(1/2) * ln(2πσ²) - (x - μ)² / (2σ²)
- Derivative of Log-PDF:
∂/∂μ ln f(x; μ) = (x - μ) / σ²
- Calculate Fisher Information:
I(μ) = n * E[ ( (X - μ) / σ² )² ] = (n/σ⁴) * E[ (X - μ)² ]
By definition, E[ (X - μ)² ] is the variance, σ², so I(μ) = n * σ² / σ⁴ = n/σ².
- Find the CRLB:
CRLB = 1 / I(μ) = σ²/n
- Compare Estimator's Variance to CRLB:
We know that X̄ is unbiased and Var(X̄) = σ²/n.
Since Var(X̄) = σ²/n = CRLB, the sample mean is an efficient estimator for the mean of a normal distribution.
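Efficiency becomes concrete when X̄ is compared against another unbiased estimator of μ. A simulation sketch (hypothetical numbers, using NumPy): for normal data the sample median is also unbiased for μ, but its variance is roughly πσ²/(2n), about 57% larger than Var(X̄) = σ²/n, so the mean is the more efficient of the two.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, trials = 0.0, 1.0, 101, 50_000  # assumed values for illustration

samples = rng.normal(mu, sigma, size=(trials, n))
var_mean = samples.mean(axis=1).var()          # ≈ σ²/n (hits the CRLB)
var_median = np.median(samples, axis=1).var()  # ≈ πσ²/(2n), larger

print(var_mean, var_median, var_median / var_mean)
```

Both estimators center on μ, but the mean's sampling distribution is noticeably tighter; no unbiased estimator can beat σ²/n here, by the CRLB.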
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation is a general method for finding estimators for parameters of a distribution.
Principle:
The core idea is to find the parameter value(s) that maximize the probability (or "likelihood") of observing the actual data that was collected. We ask: "For which value of the parameter is our observed sample most likely to have occurred?"
Procedure:
- Likelihood Function, L(θ):
For an independent and identically distributed (i.i.d.) random sample x₁, x₂, ..., xₙ, the likelihood function is the joint probability of observing the data, viewed as a function of the parameter θ:
L(θ) = f(x₁; θ) * f(x₂; θ) * ... * f(xₙ; θ) = ∏ f(xᵢ; θ)
- Log-Likelihood Function, ℓ(θ):
Maximizing L(θ) is equivalent to maximizing its natural logarithm, ℓ(θ) = ln L(θ) = Σ ln f(xᵢ; θ), which is usually much easier mathematically because it converts products into sums.
- Maximization:
To find the maximum, take the derivative of the log-likelihood function with respect to the parameter θ, set it to zero, and solve for θ. This solution is the Maximum Likelihood Estimator, θ̂.
Example 1: MLE for the parameter p of a Bernoulli Distribution
Suppose we have a sample x₁, x₂, ..., xₙ from a Bernoulli trial, where xᵢ = 1 for a success and xᵢ = 0 for a failure. The PMF is f(x; p) = pˣ * (1 - p)^(1 - x) for x ∈ {0, 1}.
- Likelihood Function:
L(p) = ∏ p^(xᵢ) * (1 - p)^(1 - xᵢ) = p^(Σxᵢ) * (1 - p)^(n - Σxᵢ)
- Log-Likelihood Function:
ℓ(p) = (Σxᵢ) * ln p + (n - Σxᵢ) * ln(1 - p)
- Maximization:
dℓ/dp = (Σxᵢ)/p - (n - Σxᵢ)/(1 - p)
Set to zero: (Σxᵢ)/p = (n - Σxᵢ)/(1 - p) ⟹ p̂ = (1/n) * Σxᵢ
The MLE for the population proportion p is the sample proportion p̂ = x̄.
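The closed-form answer p̂ = x̄ can be sanity-checked numerically. A sketch (hypothetical data, using NumPy): evaluate the log-likelihood ℓ(p) on a grid of candidate p values and confirm the peak sits at the sample proportion.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.binomial(1, 0.3, size=200)  # Bernoulli sample, assumed true p = 0.3

# Closed-form MLE: the sample proportion.
p_hat = x.mean()

# Numerical check: evaluate ℓ(p) = (Σxᵢ)ln p + (n - Σxᵢ)ln(1 - p)
# on a grid and find where it is maximized.
grid = np.linspace(0.01, 0.99, 999)
loglik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
p_grid = grid[np.argmax(loglik)]

print(p_hat, p_grid)
```

The grid maximizer agrees with x̄ up to the grid spacing, confirming the calculus result above.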
Properties of Maximum Likelihood Estimators
MLEs are widely used because they have several desirable properties, especially for large sample sizes.
- Consistency: MLEs are consistent estimators.
- Asymptotic Normality: For large , the sampling distribution of an MLE is approximately normal.
- Asymptotic Efficiency: For large , MLEs are efficient, meaning they achieve the Cramér-Rao Lower Bound. They are "asymptotically MVUE".
- Invariance Property: If θ̂ is the MLE for θ, then for any function g(θ), the MLE for g(θ) is simply g(θ̂).
- Example: The MLE for the variance of a normal distribution (when μ is known) is σ̂² = (1/n) * Σ(xᵢ - μ)². By the invariance property, the MLE for the standard deviation σ is σ̂ = √σ̂² = √( (1/n) * Σ(xᵢ - μ)² ).
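The invariance property in the example above can be sketched directly (hypothetical numbers, using NumPy): compute the MLE of σ² with μ treated as known, then obtain the MLE of σ by simply taking the square root.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = 0.0                                 # μ assumed known, for illustration
x = rng.normal(mu, 2.0, size=10_000)     # sample with assumed true σ = 2

# MLE for the variance when μ is known: (1/n) Σ (xᵢ - μ)²
sigma2_hat = np.mean((x - mu) ** 2)

# Invariance: the MLE for σ = g(σ²) = √σ² is g(σ̂²) = √σ̂².
sigma_hat = np.sqrt(sigma2_hat)

print(sigma2_hat, sigma_hat)
```

No separate optimization is needed for σ; plugging the MLE of σ² into the transformation g is enough.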