Unit 4 - Notes

MTH302

Unit 4: Special Continuous Distributions

1. Exponential Distribution

The exponential distribution is used to model the time until a certain event occurs, such as the time until a part fails or the time between arrivals at a service counter. It is closely related to the Poisson process.

Definition and PDF/CDF

A continuous random variable X follows an exponential distribution with a rate parameter λ > 0 if its probability density function (PDF) is given by:

LATEX
f(x; \lambda) = 
\begin{cases} 
\lambda e^{-\lambda x} & \text{for } x \ge 0 \\
0 & \text{for } x < 0 
\end{cases}

  • Parameter λ (lambda): The rate parameter, representing the average number of events per unit of time.
  • Parameter β = 1/λ: An alternative parameterization where β is the scale parameter, representing the mean waiting time. The PDF can be written as f(x; β) = (1/β) * e^(-x/β).

The cumulative distribution function (CDF) gives the probability that the event has occurred by time x:

LATEX
F(x; \lambda) = P(X \le x) = 
\begin{cases} 
1 - e^{-\lambda x} & \text{for } x \ge 0 \\
0 & \text{for } x < 0 
\end{cases}

The probability that the waiting time is greater than x is called the survival function:

LATEX
P(X > x) = 1 - F(x) = e^{-\lambda x}
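These formulas are simple to evaluate directly. A minimal sketch in Python (the rate λ = 2 per hour is an illustrative assumption, not a value from these notes):

```python
import math

lam = 2.0  # assumed rate: 2 events per hour

def exp_pdf(x, lam):
    """Exponential PDF: lambda * e^(-lambda * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """Exponential CDF: P(X <= x) = 1 - e^(-lambda * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_survival(x, lam):
    """Survival function: P(X > x) = e^(-lambda * x) for x >= 0."""
    return math.exp(-lam * x) if x >= 0 else 1.0

# P(X <= 0.5) with lambda = 2 is 1 - e^(-1) ≈ 0.6321
print(round(exp_cdf(0.5, lam), 4))
```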

Key Properties

Memoryless Property: This is the most crucial property of the exponential distribution. It states that the probability of an event occurring in a future interval is independent of how much time has already passed. Mathematically:

LATEX
P(X > s + t | X > s) = P(X > t) \quad \text{for all } s, t \ge 0

  • Intuition: If a lightbulb has an exponential lifetime distribution and has already worked for 100 hours, the probability that it will work for another 50 hours is the same as the probability that a brand new bulb will work for 50 hours. The past has no bearing on its future lifetime.
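The lightbulb intuition can be checked numerically, since P(X > s + t | X > s) = P(X > s + t) / P(X > s). A sketch assuming a hypothetical bulb with rate λ = 0.01 failures per hour:

```python
import math

lam = 0.01  # assumed failure rate per hour (hypothetical bulb)

def survival(x):
    """P(X > x) = e^(-lambda * x)."""
    return math.exp(-lam * x)

s, t = 100.0, 50.0
# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = survival(s + t) / survival(s)
unconditional = survival(t)  # P(X > t) for a brand-new bulb

print(conditional, unconditional)  # the two probabilities coincide
```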

Mean, Variance, and MGF

For a random variable X following an exponential distribution with rate parameter λ:

  • Mean (Expected Value):

    LATEX
        E[X] = \mu = \frac{1}{\lambda}
        

    The average waiting time is the reciprocal of the rate. If events occur at a rate of 2 per hour (λ=2), the average waiting time between events is 1/2 hour.

  • Variance:

    LATEX
        Var(X) = \sigma^2 = \frac{1}{\lambda^2}
        

  • Standard Deviation:

    LATEX
        SD(X) = \sigma = \frac{1}{\lambda}
        

    Note that for the exponential distribution, the mean and the standard deviation are equal.

  • Moment Generating Function (MGF): (without proof)

    LATEX
        M_X(t) = E[e^{tX}] = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda
        
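A quick simulation check of the mean and variance formulas, using inverse-CDF sampling (the rate λ = 2 and the sample size are arbitrary choices):

```python
import math
import random

random.seed(0)
lam = 2.0       # assumed rate
n = 200_000     # number of simulated waiting times

# Inverse-CDF sampling: if U ~ Uniform(0, 1), then -ln(1 - U)/lambda ~ Exp(lambda)
samples = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n

print(round(mean, 2), round(var, 2))  # ≈ 1/lambda = 0.5 and 1/lambda^2 = 0.25
```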

Applications

  • Modeling the time until the next customer arrives in a queue.
  • Modeling the lifetime of electronic components that do not "age".
  • Modeling the time between radioactive decay events.

2. Gamma Distribution

The gamma distribution is a flexible two-parameter family of continuous probability distributions. It is a generalization of the exponential distribution and can be used to model the waiting time until a specified number of events occur.

The Gamma Function

The gamma distribution is defined using the gamma function, Γ(α), which is a generalization of the factorial function to non-integer values.

LATEX
\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1}e^{-x} \,dx \quad \text{for } \alpha > 0

  • Property: Γ(α) = (α-1)Γ(α-1)
  • For integer n: Γ(n) = (n-1)!
  • Special value: Γ(1/2) = √π
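Python's standard library exposes the gamma function as math.gamma, which makes these three properties easy to verify:

```python
import math

# Gamma(n) = (n-1)! for integer n
print(math.gamma(5))                      # 4! = 24.0

# Gamma(1/2) = sqrt(pi), so its square is pi
print(math.gamma(0.5) ** 2)               # ≈ 3.14159...

# Recurrence Gamma(a) = (a-1) * Gamma(a-1), so the ratio below is a-1 = 3.5
print(math.gamma(4.5) / math.gamma(3.5))  # ≈ 3.5
```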

Definition and PDF

A continuous random variable X follows a gamma distribution with a shape parameter α > 0 and a rate parameter β > 0 if its PDF is:

LATEX
f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \quad \text{for } x \ge 0

  • α (alpha): The shape parameter. It controls the shape of the distribution.
  • β (beta): The rate parameter (sometimes a scale parameter θ = 1/β is used). It controls the spread of the distribution.

Relationship to Other Distributions

  • Exponential Distribution: When α = 1, the gamma distribution becomes the exponential distribution with rate β.
    Gamma(1, β) = Exponential(β)
  • Chi-Squared (χ²) Distribution: When α = v/2 (where v is degrees of freedom) and β = 1/2, the gamma distribution becomes the chi-squared distribution with v degrees of freedom.
    Gamma(v/2, 1/2) = χ²(v)
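The exponential connection can also be seen by simulation: for integer α, the sum of α independent Exponential(β) waiting times is a Gamma(α, β) variable, so its sample mean and variance should approach α/β and α/β². A sketch with assumed values α = 3, β = 2:

```python
import math
import random

random.seed(1)
alpha, beta = 3, 2.0  # assumed integer shape and rate
n = 100_000           # number of simulated gamma variables

def exp_sample(rate):
    """Inverse-CDF sample from Exponential(rate)."""
    return -math.log(1.0 - random.random()) / rate

# For integer alpha, Gamma(alpha, beta) = sum of alpha iid Exp(beta) waits
samples = [sum(exp_sample(beta) for _ in range(alpha)) for _ in range(n)]

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n

print(round(mean, 2), round(var, 2))  # ≈ alpha/beta = 1.5 and alpha/beta^2 = 0.75
```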

Mean, Variance, and MGF

For X ~ Gamma(α, β):

  • Mean (Expected Value):

    LATEX
        E[X] = \mu = \frac{\alpha}{\beta}
        

  • Variance:

    LATEX
        Var(X) = \sigma^2 = \frac{\alpha}{\beta^2}
        

  • Moment Generating Function (MGF): (without proof)

    LATEX
        M_X(t) = \left( \frac{\beta}{\beta - t} \right)^\alpha \quad \text{for } t < \beta
        

Applications

  • Modeling the waiting time for the α-th event in a Poisson process with rate β.
  • Used in queuing theory and reliability analysis.
  • Modeling the size of insurance claims or loan defaults.

3. Normal Distribution

The normal distribution, also known as the Gaussian distribution or "bell curve," is the most important probability distribution in statistics due to its prevalence in nature and its central role in statistical theory (e.g., the Central Limit Theorem).

Definition and PDF

A continuous random variable X follows a normal distribution with mean μ and variance σ² if its PDF is:

LATEX
f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty

  • μ (mu): The mean, which is also the center of the distribution (median and mode).
  • σ (sigma): The standard deviation, which controls the spread or "width" of the curve.

Key Properties

  • Symmetry: The curve is symmetric about the mean μ.
  • Bell-Shaped: The curve has a single peak at x = μ.
  • Asymptotic: The curve approaches the horizontal axis as x approaches ±∞ but never touches it.
  • Total Area: The total area under the curve is equal to 1.
  • Empirical Rule (68-95-99.7 Rule):
    • Approximately 68% of the data falls within 1 standard deviation of the mean (μ ± σ).
    • Approximately 95% of the data falls within 2 standard deviations of the mean (μ ± 2σ).
    • Approximately 99.7% of the data falls within 3 standard deviations of the mean (μ ± 3σ).
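The empirical rule can be reproduced from the normal CDF, which is available in the Python standard library through the error function math.erf:

```python
import math

def phi_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Probability of falling within k standard deviations of the mean
for k in (1, 2, 3):
    p = phi_cdf(k) - phi_cdf(-k)
    print(k, round(p, 4))  # 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```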

The Standard Normal Distribution (Z-Distribution)

This is a special case of the normal distribution where the mean is 0 and the standard deviation is 1.
Z ~ N(μ=0, σ²=1)

The PDF simplifies to:

LATEX
\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}

Probabilities for the standard normal distribution are found using a Z-table, which typically provides P(Z ≤ z).

Standardization (Calculating Z-Scores)

Any normal random variable X ~ N(μ, σ²) can be transformed into a standard normal variable Z using the following formula:

LATEX
Z = \frac{X - \mu}{\sigma}

  • Purpose: This allows us to use a single Z-table to find probabilities for any normal distribution.
  • Interpretation: A Z-score measures how many standard deviations an observation X is away from the mean μ.
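A small worked sketch of standardization, with math.erf standing in for the Z-table (the values μ = 100, σ = 15, x = 130 are hypothetical):

```python
import math

def phi_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Hypothetical example: X ~ N(mu = 100, sigma = 15); find P(X <= 130)
mu, sigma, x = 100.0, 15.0, 130.0
z = (x - mu) / sigma  # z-score: x is 2 standard deviations above the mean

print(z, round(phi_cdf(z), 4))  # 2.0 and ≈ 0.9772
```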

Mean, Variance, and MGF

For X ~ N(μ, σ²):

  • Mean: E[X] = μ
  • Variance: Var(X) = σ²
  • Moment Generating Function (MGF): (without proof)
    LATEX
        M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}
        

4. Normal Approximation to the Binomial

For a large number of trials n, calculating binomial probabilities P(X=k) can be computationally intensive. The normal distribution provides an excellent approximation under certain conditions.

When to Use the Approximation

A Binomial distribution B(n, p) can be approximated by a Normal distribution N(μ, σ²) if the following conditions are met:

  • np ≥ 5 (or sometimes np ≥ 10)
  • n(1-p) ≥ 5 (or sometimes n(1-p) ≥ 10)

These conditions ensure that the binomial distribution is reasonably symmetric and not too skewed.

The Approximation Method

  1. Check Conditions: Verify that np and n(1-p) are both sufficiently large.
  2. Find Mean and Variance: Calculate the mean and variance of the binomial distribution.
    • Mean: μ = np
    • Variance: σ² = np(1-p)
  3. Define Normal Variable: Use these parameters to define the approximating normal random variable Y ~ N(μ = np, σ² = np(1-p)).
  4. Apply Continuity Correction: Adjust the discrete binomial value to a continuous range.
  5. Standardize and Find Probability: Calculate the Z-score(s) for the adjusted range and use the Z-table to find the probability.

Continuity Correction

Since we are approximating a discrete distribution (Binomial) with a continuous one (Normal), we need to account for the "gaps" between integer values. We adjust the discrete value k by ±0.5.

  Binomial Probability    Equivalent Continuous Range
  P(X = k)                P(k - 0.5 < Y < k + 0.5)
  P(X ≤ k)                P(Y ≤ k + 0.5)
  P(X < k)                P(Y ≤ k - 0.5)
  P(X ≥ k)                P(Y ≥ k - 0.5)
  P(X > k)                P(Y ≥ k + 0.5)

Step-by-Step Example

Problem: A fair coin is tossed 100 times. What is the probability of getting between 45 and 55 heads, inclusive?

  1. Binomial Setup: X ~ B(n=100, p=0.5). We want P(45 ≤ X ≤ 55).
  2. Check Conditions:
    • np = 100 * 0.5 = 50 (which is ≥ 5)
    • n(1-p) = 100 * 0.5 = 50 (which is ≥ 5)
      The approximation is valid.
  3. Find Mean and Variance:
    • μ = np = 50
    • σ² = np(1-p) = 100 * 0.5 * 0.5 = 25
    • σ = √25 = 5
      So, Y ~ N(μ=50, σ²=25).
  4. Apply Continuity Correction:
    • P(45 ≤ X ≤ 55) becomes P(45 - 0.5 < Y < 55 + 0.5) which is P(44.5 < Y < 55.5).
  5. Standardize and Calculate:
    • Z₁ = (44.5 - 50) / 5 = -1.1
    • Z₂ = (55.5 - 50) / 5 = 1.1
    • We need P(-1.1 < Z < 1.1) = P(Z < 1.1) - P(Z < -1.1).
    • Using a Z-table: P(Z < 1.1) = 0.8643 and P(Z < -1.1) = 0.1357.
    • Probability = 0.8643 - 0.1357 = 0.7286.

There is approximately a 72.86% chance of getting between 45 and 55 heads.
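This worked example can be checked in Python against the exact binomial sum; math.erf plays the role of the Z-table, so the erf-based answer agrees with the table result to about three decimal places:

```python
import math

n, p = 100, 0.5

# Exact binomial probability P(45 <= X <= 55)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(45, 56))

# Normal approximation with continuity correction: P(44.5 < Y < 55.5)
mu = n * p                           # 50
sigma = math.sqrt(n * p * (1 - p))   # 5

def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

approx = phi_cdf((55.5 - mu) / sigma) - phi_cdf((44.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```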


5. Central Limit Theorem (CLT)

The Central Limit Theorem is a fundamental theorem in statistics that describes the shape of the sampling distribution of the sample mean (X̄).

Statement of the Theorem (without proof)

Let X₁, X₂, ..., Xₙ be a random sample of size n taken from any population with a finite mean μ and a finite variance σ².

As the sample size n becomes large, the sampling distribution of the sample mean approaches a normal distribution with:

  • Mean: μ_X̄ = μ (the same as the population mean)
  • Variance: σ²_X̄ = σ²/n

The standard deviation of the sampling distribution, σ_X̄ = σ/√n, is called the standard error of the mean.

In short:

LATEX
\text{As } n \to \infty, \quad \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)
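A simulation sketch of the theorem, drawing repeated samples from a decidedly non-normal parent, Uniform(0, 1), which has μ = 0.5 and σ² = 1/12 (the sample size n = 30 and number of repetitions are arbitrary choices):

```python
import random

random.seed(2)
n, reps = 30, 20_000  # sample size and number of repeated samples

# Each entry is the mean of one sample of n draws from Uniform(0, 1)
means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in means) / reps

# CLT predicts: mean of X-bar ≈ mu = 0.5, and Var(X-bar) ≈ sigma^2 / n,
# so n * Var(X-bar) should recover sigma^2 = 1/12 ≈ 0.083
print(round(grand_mean, 3), round(var_of_means * n, 3))
```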

Key Implications

  1. Universality: The theorem holds regardless of the shape of the original population distribution (e.g., skewed, uniform, bimodal). This is its most powerful feature.
  2. Normality: The distribution of sample means (not the individual data points) will be approximately normal for large n.
  3. Foundation for Inference: The CLT allows us to make inferences about a population mean (like constructing confidence intervals or performing hypothesis tests) without knowing the population's distribution, provided we have a large enough sample.

Conditions for Application

  • The samples must be random and independent.
  • The sample size n must be "sufficiently large".
    • Rule of Thumb: If the parent population is not normal, n ≥ 30 is a widely accepted guideline.
    • If the parent population is already normal, the sampling distribution of X̄ is exactly normal for any sample size n, not just approximately.

Formula for the Sampling Distribution of the Mean

To find probabilities related to a sample mean X̄, we standardize it using its own mean and standard error:

LATEX
Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
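A closing sketch using this formula (the population values μ = 50, σ = 10 and the sample size n = 25 are hypothetical):

```python
import math

def phi_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Hypothetical example: population mu = 50, sigma = 10; sample of n = 25
mu, sigma, n = 50.0, 10.0, 25
se = sigma / math.sqrt(n)  # standard error of the mean = 10/5 = 2.0

# Find P(X-bar > 53) by standardizing the sample mean
xbar = 53.0
z = (xbar - mu) / se       # z = 1.5

print(z, round(1.0 - phi_cdf(z), 4))  # 1.5 and ≈ 0.0668
```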