Unit 1 - Notes

MTH302

Unit 1: Random Variables and Probability Distributions

1. Random Variables

1.1 Definition

A random variable is a variable whose value is a numerical outcome of a random phenomenon. It is a function that maps the outcomes of a random experiment to a set of real numbers. We typically denote random variables with uppercase letters (e.g., X, Y, Z) and their specific values with lowercase letters (e.g., x, y, z).

Example: Consider the experiment of flipping a coin twice.

  • The sample space (set of all possible outcomes) is S = {HH, HT, TH, TT}.
  • We can define a random variable X as the "number of heads".
  • X maps the outcomes to numbers:
    • X(TT) = 0
    • X(HT) = 1
    • X(TH) = 1
    • X(HH) = 2
  • The possible values for the random variable X are {0, 1, 2}.
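The mapping above can be sketched in code. This is a minimal illustration of the coin-flip example, not part of the original notes: it enumerates the sample space and applies the random variable X = "number of heads".

```python
from itertools import product

# Enumerate the sample space S = {HH, HT, TH, TT} for two coin flips.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

def X(outcome):
    """The random variable X: map an outcome string like 'HT' to its number of heads."""
    return outcome.count("H")

values = {outcome: X(outcome) for outcome in sample_space}
print(values)                         # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
print(sorted(set(values.values())))   # possible values of X: [0, 1, 2]
```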

1.2 Types of Random Variables

1.2.1 Discrete Random Variable

A random variable is discrete if its set of possible values is either finite or countably infinite. It can take on specific, separated values.

Examples:

  • The number of heads in three coin flips (values: {0, 1, 2, 3}).
  • The outcome of rolling a standard six-sided die (values: {1, 2, 3, 4, 5, 6}).
  • The number of defective items in a sample of 20 items (values: {0, 1, ..., 20}).
  • The number of emails you receive in an hour (values: {0, 1, 2, ...}, countably infinite).

1.2.2 Continuous Random Variable

A random variable is continuous if it can take on any value within a given range or interval. The number of possible values is uncountably infinite.

Examples:

  • The height of a student (can be any value in a range, e.g., 1.5m, 1.51m, 1.511m...).
  • The temperature of a room.
  • The time it takes to complete a task.
  • The exact weight of a bag of sugar.

2. Probability Distributions

A probability distribution is a mathematical function that describes the probability of different possible values of a random variable. The form of this function depends on whether the variable is discrete or continuous.

2.1 Discrete Probability Distributions

For a discrete random variable X, the probability distribution is described by a Probability Mass Function (PMF), denoted as p(x) or P(X=x).

Probability Mass Function (PMF)
The PMF gives the probability that the discrete random variable X is exactly equal to some value x.
p(x) = P(X = x)

Properties of a PMF:

  1. Non-negativity: p(x) ≥ 0 for all possible values of x.
  2. Summation to One: The sum of probabilities for all possible values of x must be equal to 1.
    Σ p(x) = 1 (where the sum is over all possible values of X)

Example: Let X be the number of heads in two fair coin flips.

  • Possible values for X are {0, 1, 2}.
  • P(X=0) = P(TT) = 1/4
  • P(X=1) = P(HT or TH) = P(HT) + P(TH) = 1/4 + 1/4 = 1/2
  • P(X=2) = P(HH) = 1/4

The PMF can be represented in a table:

x    p(x) = P(X=x)
0    1/4
1    1/2
2    1/4

Check properties:

  1. All p(x) are ≥ 0. (True)
  2. Σ p(x) = 1/4 + 1/2 + 1/4 = 1. (True)
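The property check can also be done mechanically. A short sketch, using exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Property 1: non-negativity.
assert all(p >= 0 for p in pmf.values())

# Property 2: probabilities sum to exactly 1.
assert sum(pmf.values()) == 1
```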

2.2 Continuous Probability Distributions

For a continuous random variable X, the probability distribution is described by a Probability Density Function (PDF), denoted as f(x).

Probability Density Function (PDF)
The PDF does not give the probability that X is equal to x. Instead, the area under the PDF curve between two points a and b gives the probability that X falls within that interval.
P(a ≤ X ≤ b) = ∫[a,b] f(x) dx

Key Point: For any continuous random variable, the probability of it taking on any single specific value is zero.
P(X = c) = ∫[c,c] f(x) dx = 0

Properties of a PDF:

  1. Non-negativity: f(x) ≥ 0 for all x.
  2. Total Area is One: The total area under the curve over its entire range must be equal to 1.
    ∫[-∞,∞] f(x) dx = 1

Example: Let X be a random variable with the following PDF (a uniform distribution):
f(x) = 0.5 for 0 ≤ x ≤ 2
f(x) = 0 otherwise

Check properties:

  1. f(x) is either 0.5 or 0, so it's always ≥ 0. (True)
  2. ∫[-∞,∞] f(x) dx = ∫[0,2] 0.5 dx = [0.5x] from 0 to 2 = (0.5 * 2) - (0.5 * 0) = 1 - 0 = 1. (True)

Calculating a probability:
What is the probability that X is between 0.5 and 1.5?
P(0.5 ≤ X ≤ 1.5) = ∫[0.5, 1.5] 0.5 dx = [0.5x] from 0.5 to 1.5 = (0.5 * 1.5) - (0.5 * 0.5) = 0.75 - 0.25 = 0.5
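The same probability can be approximated numerically. This sketch uses a simple midpoint Riemann sum rather than symbolic integration; for this uniform PDF the exact answer is 0.5, so the approximation should agree closely.

```python
# Uniform PDF: f(x) = 0.5 on [0, 2], and 0 elsewhere.
def f(x):
    return 0.5 if 0 <= x <= 2 else 0.0

def prob(a, b, n=100_000):
    """Approximate P(a <= X <= b) = integral of f from a to b (midpoint Riemann sum)."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

print(prob(0.5, 1.5))   # ≈ 0.5, matching the exact integral
print(prob(0, 2))       # ≈ 1.0, the total area under the PDF
```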


3. Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF), denoted F(x), gives the cumulative probability that the random variable X is less than or equal to a particular value x. It is defined for both discrete and continuous random variables.

F(x) = P(X ≤ x)

3.1 CDF for a Discrete Random Variable

The CDF is found by summing the probabilities of the PMF for all values less than or equal to x.
F(x) = P(X ≤ x) = Σ[t≤x] p(t)

Example (Two coin flips):

  • F(-1) = P(X ≤ -1) = 0
  • F(0) = P(X ≤ 0) = p(0) = 1/4
  • F(0.5) = P(X ≤ 0.5) = p(0) = 1/4
  • F(1) = P(X ≤ 1) = p(0) + p(1) = 1/4 + 1/2 = 3/4
  • F(1.9) = P(X ≤ 1.9) = p(0) + p(1) = 3/4
  • F(2) = P(X ≤ 2) = p(0) + p(1) + p(2) = 1/4 + 1/2 + 1/4 = 1
  • F(3) = P(X ≤ 3) = 1

The CDF for a discrete variable is a step function.
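The step-function behaviour is easy to see in code. A sketch for the two-coin-flip example, summing the PMF over all values t ≤ x:

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """CDF: F(x) = P(X <= x) = sum of p(t) over all t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)

# F jumps only at the possible values 0, 1, 2 and is flat in between.
for x in (-1, 0, 0.5, 1, 1.9, 2, 3):
    print(x, F(x))
```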

3.2 CDF for a Continuous Random Variable

The CDF is found by integrating the PDF from negative infinity up to x.
F(x) = P(X ≤ x) = ∫[-∞,x] f(t) dt

Example (Uniform distribution from earlier):
f(t) = 0.5 for 0 ≤ t ≤ 2, and 0 otherwise.

  • For x < 0: F(x) = ∫[-∞,x] 0 dt = 0
  • For 0 ≤ x ≤ 2: F(x) = ∫[-∞,x] f(t) dt = ∫[-∞,0] 0 dt + ∫[0,x] 0.5 dt = 0 + [0.5t] from 0 to x = 0.5x
  • For x > 2: F(x) = ∫[-∞,x] f(t) dt = ∫[0,2] 0.5 dt + ∫[2,x] 0 dt = 1 + 0 = 1

So the CDF is:
F(x) = 0 if x < 0
F(x) = 0.5x if 0 ≤ x ≤ 2
F(x) = 1 if x > 2
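The piecewise CDF translates directly into a short function. A sketch, which also demonstrates the interval-probability identity P(a < X ≤ b) = F(b) - F(a):

```python
# Piecewise CDF of the uniform distribution with f(x) = 0.5 on [0, 2].
def F(x):
    if x < 0:
        return 0.0
    if x <= 2:
        return 0.5 * x
    return 1.0

# P(0.5 < X <= 1.5) = F(1.5) - F(0.5) = 0.75 - 0.25 = 0.5
print(F(1.5) - F(0.5))
```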

3.3 Properties of a CDF

  1. 0 ≤ F(x) ≤ 1
  2. F(x) is a non-decreasing function (i.e., if a < b, then F(a) ≤ F(b)).
  3. lim (x→-∞) F(x) = 0 and lim (x→+∞) F(x) = 1.
  4. Relationship between distributions:
    • For continuous variables: f(x) = d/dx F(x). The PDF is the derivative of the CDF.
    • For discrete variables: p(x) is the size of the jump in F(x) at x, i.e., p(x) = F(x) - F(x⁻).
  5. Calculating Interval Probabilities: P(a < X ≤ b) = F(b) - F(a)

4. Moments of a Distribution

Moments are a set of quantitative measures that describe the shape of a probability distribution.

4.1 Moments about Origin (Raw Moments)

The r-th raw moment of a random variable X, denoted μ'_r, is the expected value of X^r.

μ'_r = E[X^r]

  • For a discrete random variable:
        μ'_r = E[X^r] = Σ [x^r * p(x)]
  • For a continuous random variable:
        μ'_r = E[X^r] = ∫[-∞,∞] [x^r * f(x)] dx

The first four raw moments:

  • First raw moment (r=1): μ'_1 = E[X]. This is the Mean of the distribution.
  • Second raw moment (r=2): μ'_2 = E[X^2]. Used to calculate the variance.
  • Third raw moment (r=3): μ'_3 = E[X^3]. Used to calculate skewness.
  • Fourth raw moment (r=4): μ'_4 = E[X^4]. Used to calculate kurtosis.
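The discrete formula above can be applied directly to the two-coin-flip PMF. A sketch computing the first four raw moments exactly:

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def raw_moment(r):
    """r-th raw moment: mu'_r = sum over x of x^r * p(x)."""
    return sum(x**r * p for x, p in pmf.items())

# mu'_1 = 1 (the mean), mu'_2 = 3/2, mu'_3 = 5/2, mu'_4 = 9/2
for r in range(1, 5):
    print(r, raw_moment(r))
```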

4.2 Moments about Mean (Central Moments)

The r-th central moment of a random variable X, denoted μ_r, is the expected value of (X - μ)^r, where μ = E[X].

μ_r = E[(X - μ)^r]

  • For a discrete random variable:
        μ_r = E[(X - μ)^r] = Σ [(x - μ)^r * p(x)]
  • For a continuous random variable:
        μ_r = E[(X - μ)^r] = ∫[-∞,∞] [(x - μ)^r * f(x)] dx

The first four central moments:

  • First central moment (r=1): μ_1 = E[X - μ] = E[X] - μ = μ - μ = 0. The first central moment is always zero.
  • Second central moment (r=2): μ_2 = E[(X - μ)^2]. This is the Variance of the distribution, denoted σ².
  • Third central moment (r=3): μ_3 = E[(X - μ)^3]. This is a measure of the asymmetry (skewness) of the distribution.
  • Fourth central moment (r=4): μ_4 = E[(X - μ)^4]. This is a measure of the "tailedness" (kurtosis) of the distribution.

4.3 Relationship between Raw and Central Moments

It is often easier to calculate raw moments first and then use them to find the central moments.

  • μ_1 = 0
  • μ_2 = E[(X-μ)^2] = E[X^2 - 2Xμ + μ^2] = E[X^2] - 2μE[X] + μ^2 = μ'_2 - 2μ(μ) + μ^2 = μ'_2 - μ^2
    μ_2 = μ'_2 - (μ'_1)^2
  • μ_3 = μ'_3 - 3μ'_2μ'_1 + 2(μ'_1)^3
  • μ_4 = μ'_4 - 4μ'_3μ'_1 + 6μ'_2(μ'_1)^2 - 3(μ'_1)^4
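These identities can be checked numerically by computing both sides from the same PMF. A sketch, again using the two-coin-flip example with exact fractions:

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def raw(r):
    """r-th raw moment mu'_r = E[X^r]."""
    return sum(x**r * p for x, p in pmf.items())

mu = raw(1)  # the mean

def central(r):
    """r-th central moment mu_r = E[(X - mu)^r]."""
    return sum((x - mu)**r * p for x, p in pmf.items())

# Verify the raw-to-central conversion formulas term by term.
assert central(2) == raw(2) - raw(1)**2
assert central(3) == raw(3) - 3*raw(2)*raw(1) + 2*raw(1)**3
assert central(4) == raw(4) - 4*raw(3)*raw(1) + 6*raw(2)*raw(1)**2 - 3*raw(1)**4
```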

5. Descriptive Measures of a Distribution

These are key values, derived from moments, that summarize the central tendency, dispersion, and shape of a distribution.

5.1 Mean (Measure of Central Tendency)

The mean or expected value E[X] is the average value of the random variable, weighted by its probability. It is the first raw moment.

  • Symbol: μ or E[X]
  • Formula: μ = μ'_1
  • Interpretation: The long-run average of the experiment; the "center of mass" of the distribution.

5.2 Variance and Standard Deviation (Measure of Dispersion)

Variance measures the spread or dispersion of the data points around the mean. It is the second central moment.

  • Symbol: σ² or Var(X)
  • Formula: σ² = μ_2 = E[(X - μ)²]
  • Computational Formula: σ² = E[X²] - (E[X])² = μ'_2 - (μ'_1)²
  • Interpretation: The average of the squared deviations from the mean. A larger variance means the data is more spread out.

Standard Deviation is the square root of the variance.

  • Symbol: σ or SD(X)
  • Formula: σ = √Var(X)
  • Interpretation: A measure of spread in the original units of the random variable.

5.3 Skewness (Measure of Asymmetry)

Skewness measures the asymmetry of the probability distribution about its mean. It depends on the third central moment.

  • Pearson's Moment Coefficient of Skewness:
        γ_1 = μ_3 / σ³ = μ_3 / (μ_2)^(3/2)
  • Interpretation:
    • γ_1 > 0 (Positive Skew): The distribution has a longer tail on the right side. The mass of the distribution is concentrated on the left. Mean > Median.
    • γ_1 < 0 (Negative Skew): The distribution has a longer tail on the left side. The mass of the distribution is concentrated on the right. Mean < Median.
    • γ_1 = 0 (Symmetric): The distribution is perfectly symmetric around the mean (e.g., Normal distribution). Mean = Median.
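The symmetric case can be confirmed on the two-coin-flip distribution, whose PMF is symmetric about its mean of 1. A sketch:

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin flips (symmetric about mu = 1).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf.items())
mu2 = sum((x - mu)**2 * p for x, p in pmf.items())   # variance
mu3 = sum((x - mu)**3 * p for x, p in pmf.items())   # third central moment

# gamma_1 = mu_3 / mu_2^(3/2); for a symmetric PMF, mu_3 = 0, so gamma_1 = 0.
gamma1 = float(mu3) / float(mu2) ** 1.5
print(gamma1)   # 0.0
```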

5.4 Kurtosis (Measure of "Tailedness")

Kurtosis measures the "tailedness" of a distribution: how much of its probability mass lies in the tails relative to the center. Although it is often described in terms of the sharpness of the peak, it is driven primarily by the weight of the tails. It depends on the fourth central moment.

  • Coefficient of Kurtosis:
        β_2 = μ_4 / (μ_2)² = μ_4 / σ⁴
  • The kurtosis of a normal distribution is β_2 = 3, so it is common to report Excess Kurtosis instead.
  • Excess Kurtosis:
        γ_2 = β_2 - 3
  • Interpretation (using excess kurtosis γ_2):
    • γ_2 > 0 (Leptokurtic, from the Greek leptos, "slender"): The distribution has heavier/fatter tails than a normal distribution, so outliers are more likely; it often appears more sharply peaked.
    • γ_2 < 0 (Platykurtic, from the Greek platys, "broad, flat"): The distribution has lighter/thinner tails than a normal distribution, so outliers are less likely; it often appears flatter.
    • γ_2 = 0 (Mesokurtic): The distribution has the same kurtosis as a normal distribution.
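As a closing check, the two-coin-flip distribution gives a concrete platykurtic example. A sketch computing β_2 and γ_2 from the central moments:

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf.items())
mu2 = sum((x - mu)**2 * p for x, p in pmf.items())   # variance = 1/2
mu4 = sum((x - mu)**4 * p for x, p in pmf.items())   # fourth central moment = 1/2

beta2 = mu4 / mu2**2   # (1/2) / (1/4) = 2
gamma2 = beta2 - 3     # excess kurtosis = -1: platykurtic (lighter tails than normal)
print(beta2, gamma2)
```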