Unit 4: Special Continuous Distributions
1. Exponential Distribution
The exponential distribution is used to model the time until a certain event occurs, such as the time until a part fails or the time between arrivals at a service counter. It is closely related to the Poisson process.
Definition and PDF/CDF
A continuous random variable X follows an exponential distribution with a rate parameter λ > 0 if its probability density function (PDF) is given by:
f(x; \lambda) =
\begin{cases}
\lambda e^{-\lambda x} & \text{for } x \ge 0 \\
0 & \text{for } x < 0
\end{cases}
- Parameter λ (lambda): the rate parameter, representing the average number of events per unit of time.
- Parameter β = 1/λ: an alternative parameterization where β is the scale parameter, representing the mean waiting time. The PDF can then be written as f(x; β) = (1/β) e^(-x/β).
The cumulative distribution function (CDF) gives the probability that the event has occurred by time x:
F(x; \lambda) = P(X \le x) =
\begin{cases}
1 - e^{-\lambda x} & \text{for } x \ge 0 \\
0 & \text{for } x < 0
\end{cases}
The probability that the waiting time is greater than x is called the survival function:
P(X > x) = 1 - F(x) = e^{-\lambda x}
Key Properties
Memoryless Property: This is the most crucial property of the exponential distribution. It states that the probability of an event occurring in a future interval is independent of how much time has already passed. Mathematically:
P(X > s + t | X > s) = P(X > t) \quad \text{for all } s, t \ge 0
- Intuition: If a lightbulb has an exponential lifetime distribution and has already worked for 100 hours, the probability that it will work for another 50 hours is the same as the probability that a brand new bulb will work for 50 hours. The past has no bearing on its future lifetime.
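The memoryless property can be checked numerically. The sketch below (in Python, with an arbitrary illustrative rate of λ = 0.01 failures per hour) compares the conditional survival probability with the unconditional one:

```python
import math

# Survival function of Exponential(lam): P(X > x) = e^(-lam * x)
def survival(x, lam):
    return math.exp(-lam * x)

lam = 0.01        # hypothetical failure rate: 0.01 per hour
s, t = 100.0, 50.0

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = survival(s + t, lam) / survival(s, lam)

# Memoryless property: the conditional probability equals P(X > t)
print(conditional)        # same as survival(t, lam)
print(survival(t, lam))
```

Algebraically this is just e^(-λ(s+t)) / e^(-λs) = e^(-λt), so the two printed values agree up to floating-point error.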
Mean, Variance, and MGF
For a random variable X following an exponential distribution with rate parameter λ:
- Mean (Expected Value):
E[X] = \mu = \frac{1}{\lambda}
The average waiting time is the reciprocal of the rate. If events occur at a rate of 2 per hour (λ = 2), the average waiting time between events is 1/2 hour.
- Variance:
Var(X) = \sigma^2 = \frac{1}{\lambda^2}
- Standard Deviation:
SD(X) = \sigma = \frac{1}{\lambda}
Note that for the exponential distribution, the mean and the standard deviation are equal.
- Moment Generating Function (MGF): (without proof)
M_X(t) = E[e^{tX}] = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda
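A Monte Carlo sketch (Python, with an arbitrary rate λ = 2) confirms that both the mean and the standard deviation come out near 1/λ:

```python
import random
import statistics

random.seed(0)
lam = 2.0  # illustrative rate: 2 events per hour

# random.expovariate(lam) draws from Exponential(lam)
samples = [random.expovariate(lam) for _ in range(200_000)]

print(statistics.mean(samples))   # ≈ 1/lam = 0.5
print(statistics.stdev(samples))  # ≈ 1/lam = 0.5 (mean equals sd)
```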
Applications
- Modeling the time until the next customer arrives in a queue.
- Modeling the lifetime of electronic components that do not "age".
- Modeling the time between radioactive decay events.
2. Gamma Distribution
The gamma distribution is a flexible two-parameter family of continuous probability distributions. It is a generalization of the exponential distribution and can be used to model the waiting time until a specified number of events occur.
The Gamma Function
The gamma distribution is defined using the gamma function, Γ(α), which is a generalization of the factorial function to non-integer values.
\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1}e^{-x} \,dx \quad \text{for } \alpha > 0
- Property: Γ(α) = (α-1)Γ(α-1)
- For integer n: Γ(n) = (n-1)!
- Special value: Γ(1/2) = √π
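Python's standard library exposes the gamma function as `math.gamma`, which makes these three properties easy to verify directly:

```python
import math

# math.gamma implements Γ(α)
print(math.gamma(5))        # Γ(5) = 4! = 24
print(math.gamma(0.5))      # Γ(1/2) = √π
print(math.sqrt(math.pi))

# Recurrence Γ(α) = (α - 1) Γ(α - 1), at an arbitrary non-integer point
a = 3.7
print(math.gamma(a), (a - 1) * math.gamma(a - 1))
```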
Definition and PDF
A continuous random variable X follows a gamma distribution with a shape parameter α > 0 and a rate parameter β > 0 if its PDF is:
f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \quad \text{for } x \ge 0
- α (alpha): The shape parameter. It controls the shape of the distribution.
- β (beta): The rate parameter (sometimes a scale parameter θ = 1/β is used). It controls the spread of the distribution.
Relationship to Other Distributions
- Exponential Distribution: When α = 1, the gamma distribution becomes the exponential distribution with rate β.
Gamma(1, β) = Exponential(β)
- Chi-Squared (χ²) Distribution: When α = v/2 (where v is the degrees of freedom) and β = 1/2, the gamma distribution becomes the chi-squared distribution with v degrees of freedom.
Gamma(v/2, 1/2) = χ²(v)
Mean, Variance, and MGF
For X ~ Gamma(α, β):
- Mean (Expected Value):
E[X] = \mu = \frac{\alpha}{\beta}
- Variance:
Var(X) = \sigma^2 = \frac{\alpha}{\beta^2}
- Moment Generating Function (MGF): (without proof)
M_X(t) = \left( \frac{\beta}{\beta - t} \right)^\alpha \quad \text{for } t < \beta
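These moments can be checked by simulation. One caveat: Python's `random.gammavariate` takes a shape and a *scale* parameter, so for rate β the second argument must be 1/β. A sketch with arbitrary parameters α = 3, β = 2:

```python
import random
import statistics

random.seed(1)
alpha, beta = 3.0, 2.0  # shape α and rate β (illustrative choices)

# NOTE: random.gammavariate(shape, scale); scale = 1/rate
samples = [random.gammavariate(alpha, 1.0 / beta) for _ in range(200_000)]

print(statistics.mean(samples))      # ≈ α/β = 1.5
print(statistics.variance(samples))  # ≈ α/β² = 0.75
```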
Applications
- Modeling the waiting time for the α-th event in a Poisson process with rate β.
- Used in queuing theory and reliability analysis.
- Modeling the size of insurance claims or loan defaults.
3. Normal Distribution
The normal distribution, also known as the Gaussian distribution or "bell curve," is the most important probability distribution in statistics due to its prevalence in nature and its central role in statistical theory (e.g., the Central Limit Theorem).
Definition and PDF
A continuous random variable X follows a normal distribution with mean μ and variance σ² if its PDF is:
f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \quad \text{for } -\infty < x < \infty
- μ (mu): The mean, which is also the center of the distribution (median and mode).
- σ (sigma): The standard deviation, which controls the spread or "width" of the curve.
Key Properties
- Symmetry: The curve is symmetric about the mean μ.
- Bell-Shaped: The curve has a single peak at x = μ.
- Asymptotic: The curve approaches the horizontal axis as x approaches ±∞ but never touches it.
- Total Area: The total area under the curve is equal to 1.
- Empirical Rule (68-95-99.7 Rule):
  - Approximately 68% of the data falls within 1 standard deviation of the mean (μ ± σ).
  - Approximately 95% of the data falls within 2 standard deviations of the mean (μ ± 2σ).
  - Approximately 99.7% of the data falls within 3 standard deviations of the mean (μ ± 3σ).
The Standard Normal Distribution (Z-Distribution)
This is a special case of the normal distribution where the mean is 0 and the standard deviation is 1.
Z ~ N(μ=0, σ²=1)
The PDF simplifies to:
\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}
Probabilities for the standard normal distribution are found using a Z-table, which typically provides P(Z ≤ z).
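In code, Z-table lookups can be replaced by the identity Φ(z) = (1 + erf(z/√2))/2, using the error function from Python's standard library:

```python
import math

# Standard normal CDF Φ(z) = P(Z ≤ z), via the error function
def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(phi(0.0))   # 0.5, by symmetry
print(phi(1.1))   # ≈ 0.8643, matching the Z-table entry
print(phi(-1.1))  # ≈ 0.1357, and phi(z) + phi(-z) = 1
```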
Standardization (Calculating Z-Scores)
Any normal random variable X ~ N(μ, σ²) can be transformed into a standard normal variable Z using the following formula:
Z = \frac{X - \mu}{\sigma}
- Purpose: This allows us to use a single Z-table to find probabilities for any normal distribution.
- Interpretation: A Z-score measures how many standard deviations an observation X is away from the mean μ.
Mean, Variance, and MGF
For X ~ N(μ, σ²):
- Mean: E[X] = μ
- Variance: Var(X) = σ²
- Moment Generating Function (MGF): (without proof)
M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}
4. Normal Approximation to the Binomial
For a large number of trials n, calculating binomial probabilities P(X=k) can be computationally intensive. The normal distribution provides an excellent approximation under certain conditions.
When to Use the Approximation
A Binomial distribution B(n, p) can be approximated by a Normal distribution N(μ, σ²) if the following conditions are met:
- np ≥ 5 (or sometimes np ≥ 10)
- n(1-p) ≥ 5 (or sometimes n(1-p) ≥ 10)
These conditions ensure that the binomial distribution is reasonably symmetric and not too skewed.
The Approximation Method
- Check Conditions: Verify that np and n(1-p) are both sufficiently large.
- Find Mean and Variance: Calculate the mean and variance of the binomial distribution.
  - Mean: μ = np
  - Variance: σ² = np(1-p)
- Define Normal Variable: Use these parameters to define the approximating normal random variable Y ~ N(μ = np, σ² = np(1-p)).
- Apply Continuity Correction: Adjust the discrete binomial value to a continuous range.
- Standardize and Find Probability: Calculate the Z-score(s) for the adjusted range and use the Z-table to find the probability.
Continuity Correction
Since we are approximating a discrete distribution (Binomial) with a continuous one (Normal), we need to account for the "gaps" between integer values. We adjust the discrete value k by ±0.5.
| Binomial Probability | Equivalent Continuous Range |
|---|---|
| P(X = k) | P(k - 0.5 < Y < k + 0.5) |
| P(X ≤ k) | P(Y ≤ k + 0.5) |
| P(X < k) | P(Y ≤ k - 0.5) |
| P(X ≥ k) | P(Y ≥ k - 0.5) |
| P(X > k) | P(Y ≥ k + 0.5) |
Step-by-Step Example
Problem: A fair coin is tossed 100 times. What is the probability of getting between 45 and 55 heads, inclusive?
- Binomial Setup: X ~ B(n=100, p=0.5). We want P(45 ≤ X ≤ 55).
- Check Conditions:
  - np = 100 * 0.5 = 50 (which is ≥ 5)
  - n(1-p) = 100 * 0.5 = 50 (which is ≥ 5)
  The approximation is valid.
- Find Mean and Variance:
  - μ = np = 50
  - σ² = np(1-p) = 100 * 0.5 * 0.5 = 25
  - σ = √25 = 5
  So, Y ~ N(μ=50, σ²=25).
- Apply Continuity Correction: P(45 ≤ X ≤ 55) becomes P(45 - 0.5 < Y < 55 + 0.5), which is P(44.5 < Y < 55.5).
- Standardize and Calculate:
  - Z₁ = (44.5 - 50) / 5 = -1.1
  - Z₂ = (55.5 - 50) / 5 = 1.1
  - We need P(-1.1 < Z < 1.1) = P(Z < 1.1) - P(Z < -1.1).
  - Using a Z-table: P(Z < 1.1) = 0.8643 and P(Z < -1.1) = 0.1357.
  - Probability = 0.8643 - 0.1357 = 0.7286.
There is approximately a 72.86% chance of getting between 45 and 55 heads.
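The example can be cross-checked in Python by computing the exact binomial sum alongside the continuity-corrected normal approximation:

```python
import math

n, p = 100, 0.5

# Exact binomial probability P(45 ≤ X ≤ 55)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(45, 56))

# Normal approximation with continuity correction: P(44.5 < Y < 55.5)
def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = phi((55.5 - mu) / sigma) - phi((44.5 - mu) / sigma)

print(exact)   # ≈ 0.7287
print(approx)  # ≈ 0.7287
```

The two values agree to about three decimal places, illustrating how good the approximation is when np and n(1-p) are this large.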
5. Central Limit Theorem (CLT)
The Central Limit Theorem is a fundamental theorem in statistics that describes the shape of the sampling distribution of the sample mean (X̄).
Statement of the Theorem (without proof)
Let X₁, X₂, ..., Xₙ be a random sample of size n taken from any population with a finite mean μ and a finite variance σ².
As the sample size n becomes large, the sampling distribution of the sample mean X̄ approaches a normal distribution with:
- Mean: μ_X̄ = μ (the same as the population mean)
- Variance: σ²_X̄ = σ²/n
The standard deviation of the sampling distribution, σ_X̄ = σ/√n, is called the standard error of the mean.
In short:
\text{As } n \to \infty, \quad \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)
Key Implications
- Universality: The theorem holds regardless of the shape of the original population distribution (e.g., skewed, uniform, bimodal). This is its most powerful feature.
- Normality: The distribution of sample means (not the individual data points) will be approximately normal for large n.
- Foundation for Inference: The CLT allows us to make inferences about a population mean (like constructing confidence intervals or performing hypothesis tests) without knowing the population's distribution, provided we have a large enough sample.
Conditions for Application
- The samples must be random and independent.
- The sample size n must be "sufficiently large".
  - Rule of Thumb: If the parent population is not normal, n ≥ 30 is a widely accepted guideline.
  - If the parent population is already normal, the sampling distribution of X̄ is exactly normal for any sample size n, not just approximately.
Formula for the Sampling Distribution of the Mean
To find probabilities related to a sample mean X̄, we standardize it using its own mean and standard error:
Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
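The CLT can be seen in a short simulation. Below, a strongly skewed parent population (Exponential with rate 1, so μ = 1 and σ = 1, an illustrative choice) is sampled repeatedly; the distribution of the sample means still centers at μ with spread close to the standard error σ/√n:

```python
import random
import statistics

random.seed(2)

n = 40            # sample size (meets the n ≥ 30 rule of thumb)
num_means = 20_000

# Each entry is the mean of one random sample of size n from Exp(1)
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_means)
]

print(statistics.mean(sample_means))   # ≈ μ = 1
print(statistics.stdev(sample_means))  # ≈ σ/√n = 1/√40 ≈ 0.158
```

A histogram of `sample_means` would look bell-shaped even though the parent exponential distribution is heavily right-skewed, which is exactly the universality claim above.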