Unit 5 - Notes

QTT201

Unit 5: Measures of Dispersion

1. Introduction to Measures of Dispersion

Measures of dispersion (also known as measures of variation or spread) are statistical values that describe the spread or variability of data points in a distribution. While measures of central tendency (mean, median, mode) locate the center of a distribution, they do not reveal how the data is spread out around that center.

Why are Measures of Dispersion Important in Business?

  • Consistency and Quality Control: In manufacturing, dispersion measures help monitor the consistency of a product. Low dispersion means high consistency.
  • Risk Assessment in Finance: In investments, standard deviation is a key measure of volatility and risk. A higher standard deviation for a stock's returns implies higher risk.
  • Comparing Datasets: They allow for the comparison of variability between two or more datasets, even if their means are similar. For example, comparing the consistency of sales performance between two different teams.

Measures of dispersion can be classified into two types:

  • Absolute Measures: Expressed in the same units as the original data (e.g., Rupees, kg, cm). Examples: Range, Quartile Deviation, Mean Deviation, Standard Deviation.
  • Relative Measures: Unit-free ratios or percentages used for comparing the variability of different datasets. Examples: Coefficient of Range, Coefficient of Quartile Deviation, Coefficient of Mean Deviation, Coefficient of Variation.

2. Range

The range is the simplest measure of dispersion. It is the difference between the highest (largest) and lowest (smallest) values in a dataset.

Formula

Range (R) = L - S

Where:

  • L = Largest value in the dataset
  • S = Smallest value in the dataset

Coefficient of Range

This is a relative measure used for comparison.

Coefficient of Range = (L - S) / (L + S)

Calculation

  • Ungrouped Data: Simply find the largest and smallest values and take their difference.
    • Example: For data [10, 15, 12, 25, 18], L=25, S=10. Range = 25 - 10 = 15.
  • Discrete Series: The range is the difference between the largest and smallest variable values (x), not the frequencies.
  • Continuous Series: The range is the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.
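As a quick check of the ungrouped example above, here is a minimal Python sketch of the range and its coefficient. The function name `range_and_coefficient` is illustrative, not part of the notes. For a continuous series, the sketch assumes you pass the lower limit of the lowest class and the upper limit of the highest class instead of raw values.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for a list of numbers."""
    largest = max(values)     # L
    smallest = min(values)    # S
    r = largest - smallest    # Range = L - S
    coeff = r / (largest + smallest)  # (L - S) / (L + S); undefined if L + S == 0
    return r, coeff

data = [10, 15, 12, 25, 18]
r, coeff = range_and_coefficient(data)
print(r)                # 15
print(round(coeff, 4))  # 0.4286  (15 / 35)

# Continuous series with hypothetical classes 0-10, ..., 40-50: use the outer class limits
print(range_and_coefficient([0, 50]))  # (50, 1.0)
```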

Merits and Demerits

Merits:

  • Simple to understand and easy to calculate.
  • Provides a quick, though rough, idea of the data's spread.

Demerits:

  • Affected by Outliers: It is based only on the two extreme values, so a single outlier can drastically alter the range.
  • Ignores the Distribution: It does not provide any information about the distribution of values between the two extremes.
  • Not suitable for further mathematical treatment.
  • Cannot be calculated for open-ended distributions.

3. Quartile Deviation (or Semi-Interquartile Range)

Quartile Deviation is a measure of dispersion based on the upper and lower quartiles. It is half the spread of the middle 50% of the data (the interquartile range), which makes it much less sensitive to outliers than the range.

First, we define Quartiles:

  • First Quartile (Q1): The value that separates the lowest 25% of the data from the rest.
  • Third Quartile (Q3): The value that separates the lowest 75% of the data from the rest (or the highest 25%).

Interquartile Range (IQR): The range of the middle 50% of the data.

IQR = Q3 - Q1

Formula for Quartile Deviation (QD)

Quartile Deviation (QD) = (Q3 - Q1) / 2

Coefficient of Quartile Deviation

This is a relative measure.

Coefficient of QD = (Q3 - Q1) / (Q3 + Q1)

Calculation

For Ungrouped and Discrete Series:

  1. Arrange the data in ascending order.
  2. Calculate Q1 and Q3 positions:
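    • Q1 = value of the ((N+1)/4)th item
    • Q3 = value of the (3 * (N+1)/4)th item
    • N is the total number of observations (or total frequency Σf).
    • For ungrouped data, if a position is fractional (e.g., 1.5), interpolate between the two neighbouring items.
    • For a discrete series, compute cumulative frequencies and take the x value whose c.f. first equals or exceeds the position.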
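You can sketch the positional rule above in Python for ungrouped data. When a position such as 1.5 is fractional, this sketch interpolates linearly between the neighbouring sorted items, which is the usual textbook convention. The helper name `positional_quartile` is hypothetical. Python's `statistics.quantiles(data, n=4)` with its default `method='exclusive'` also places quartiles at p(N+1) positions, so for data like this it should serve as a cross-check.

```python
import statistics

def positional_quartile(values, k):
    """Qk for k = 1 or 3, using the (k(N+1)/4)th-item rule.

    A fractional position is handled by linear interpolation between the
    two neighbouring sorted items (assumed convention).
    """
    data = sorted(values)
    n = len(data)
    pos = k * (n + 1) / 4          # 1-based position of Qk
    pos = min(max(pos, 1), n)      # keep the position inside the data (assumed for tiny samples)
    whole = int(pos)               # whole part of the position
    frac = pos - whole             # fractional part, used for interpolation
    if whole == n:                 # position is the last item
        return float(data[-1])
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])

data = [10, 15, 12, 25, 18]        # sorted: 10, 12, 15, 18, 25
q1 = positional_quartile(data, 1)  # position 1.5 -> 11.0
q3 = positional_quartile(data, 3)  # position 4.5 -> 21.5
qd = (q3 - q1) / 2                 # Quartile Deviation = 5.25
coeff_qd = (q3 - q1) / (q3 + q1)   # 10.5 / 32.5 ≈ 0.3231

print(q1, q3, qd, round(coeff_qd, 4))
print(statistics.quantiles(data, n=4))  # [11.0, 15.0, 21.5]
```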

For Continuous Series:

  1. Calculate cumulative frequencies (c.f.).
  2. Find the Q1 and Q3 class intervals:
    • Q1 Class is the first class whose c.f. equals or exceeds N/4.
    • Q3 Class is the first class whose c.f. equals or exceeds 3N/4.
  3. Apply the interpolation formula:
        Qk = L + [ ((k*N/4) - cf) / f ] * i

    Where:
    • k = 1 for Q1, 3 for Q3
    • L = Lower limit of the quartile class
    • N = Total frequency (Σf)
    • cf = Cumulative frequency of the class preceding the quartile class
    • f = Frequency of the quartile class
    • i = Class width of the quartile class
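The interpolation formula can be sketched in Python as follows. The frequency distribution is hypothetical and was chosen only to walk through the steps. Classes are given as (lower, upper) pairs, and `grouped_quartile` is an illustrative name.

```python
def grouped_quartile(classes, freqs, k):
    """Qk = L + ((k*N/4 - cf) / f) * i for a continuous series.

    classes: list of (lower, upper) class limits in ascending order
    freqs:   frequency of each class
    k:       1 for Q1, 3 for Q3 (k = 2 gives the median)
    """
    n = sum(freqs)                     # N = Σf
    target = k * n / 4                 # k*N/4
    cf = 0                             # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:           # first class whose c.f. reaches k*N/4
            width = upper - lower      # i
            return lower + (target - cf) / f * width
        cf += f
    raise ValueError("target position lies beyond the total frequency")

# Hypothetical distribution, N = 40
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]              # c.f.: 5, 13, 25, 35, 40

q1 = grouped_quartile(classes, freqs, 1)  # 10 + (10 - 5) / 8 * 10 = 16.25
q3 = grouped_quartile(classes, freqs, 3)  # 30 + (30 - 25) / 10 * 10 = 35.0
print(q1, q3, (q3 - q1) / 2)              # QD = 9.375
```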

Merits and Demerits

Merits:

  • Not affected by extreme values (outliers).
  • Better than range as it considers the middle 50% of the data.
  • Can be calculated for open-ended distributions.

Demerits:

  • Ignores the first 25% and last 25% of the data.
  • Not based on all observations in the dataset.
  • Not amenable to further algebraic manipulation.

4. Mean Deviation (or Average Deviation)

Mean Deviation is the arithmetic mean of the absolute deviations of the observations from a measure of central tendency (mean, median, or mode). It gives a better measure of spread than range or QD because it is based on all observations.

The deviation is taken as an absolute value (ignoring the negative signs) because the sum of deviations from the arithmetic mean is always zero (Σ(x - x̄) = 0).

Formulas

1. For Ungrouped Data:

Mean Deviation from Mean (MD_x̄) = Σ|x - x̄| / n
Mean Deviation from Median (MD_M) = Σ|x - M| / n

Where n is the number of observations.

2. For Discrete/Continuous Data:

Mean Deviation from Mean (MD_x̄) = Σf|x - x̄| / N
Mean Deviation from Median (MD_M) = Σf|x - M| / N

Where:

  • N = Σf
  • x is the value (for discrete) or midpoint (for continuous) of the class.

Note: Mean deviation is minimized when calculated from the median.

Coefficient of Mean Deviation

Coefficient of MD (from Mean) = MD_x̄ / x̄
Coefficient of MD (from Median) = MD_M / M
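Here is a minimal Python sketch of mean deviation using the ungrouped data from the Range example ([10, 15, 12, 25, 18]). If you pass frequencies, it computes the Σf|x - centre| / N form for a discrete series (use class midpoints for a continuous one). The function name is hypothetical. The numbers also show that the deviation about the median is smaller than the deviation about the mean.

```python
import statistics

def mean_deviation(values, centre, freqs=None):
    """Σf|x - centre| / Σf; with no frequencies every f is 1 (ungrouped data)."""
    if freqs is None:
        freqs = [1] * len(values)
    total_f = sum(freqs)  # N
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / total_f

data = [10, 15, 12, 25, 18]
mean = statistics.mean(data)    # 16
med = statistics.median(data)   # 15

md_mean = mean_deviation(data, mean)  # 22 / 5 = 4.4
md_med = mean_deviation(data, med)    # 21 / 5 = 4.2

print(md_mean, md_med)
print(md_mean / mean, md_med / med)   # coefficients of MD: ≈ 0.275 and ≈ 0.28
```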

Merits and Demerits

Merits:

  • Simple to understand.
  • Based on all observations in the dataset.
  • Less affected by extreme values compared to standard deviation.

Demerits:

  • Ignores algebraic signs: Taking absolute values gets around the sign problem artificially, which makes mean deviation unsuitable for further algebraic treatment.
  • Can be complex to compute if the mean or median is a fraction.
  • It is less commonly used in practice compared to standard deviation.

5. Standard Deviation and Variance

Standard Deviation is the most important and widely used measure of dispersion. It measures the typical distance of the data points from the mean of the dataset. Strictly, it is the square root of the average squared deviation from the mean.

Variance (σ² or s²): The average of the squared deviations from the arithmetic mean. It is expressed in squared units.
Standard Deviation (σ or s): The positive square root of the variance. It is expressed in the original units of the data, making it more interpretable than variance.

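  • Population symbols: Variance σ², Standard Deviation σ, Mean μ
  • Sample symbols: Variance s², Standard Deviation s, Mean x̄
  • The formulas below divide by N (population). The sample variance s² is usually computed with n - 1 in the denominator instead of n, which gives an unbiased estimate of the population variance.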
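Python's standard `statistics` module provides both versions. `pvariance`/`pstdev` divide by N (the population formulas used in this unit), and `variance`/`stdev` divide by n - 1 (the usual sample estimator). Here is a quick comparison on the data from the Range example:

```python
import statistics

data = [10, 15, 12, 25, 18]  # mean 16, Σ(x - mean)² = 138

pop_var = statistics.pvariance(data)  # 138 / 5 = 27.6  (σ²)
samp_var = statistics.variance(data)  # 138 / 4 = 34.5  (s²)

print(pop_var, statistics.pstdev(data))  # σ² and σ
print(samp_var, statistics.stdev(data))  # s² and s
```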

Formulas

Data Type            | Variance (σ²)          | Standard Deviation (σ)
Ungrouped Data       | σ² = Σ(x - μ)² / N     | σ = √[ Σ(x - μ)² / N ]
Discrete/Continuous  | σ² = Σf(x - μ)² / N    | σ = √[ Σf(x - μ)² / N ]

Calculation Methods

1. Direct Method (Using Actual Mean):

  • Calculate the actual mean (x̄).
  • Calculate the deviations from the mean (d = x - x̄).
  • Square the deviations (d²).
  • Multiply by frequency if applicable (fd²).
  • Sum them up (Σd² or Σfd²) and divide by N.
  • Take the square root.
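The direct-method steps above map onto a short Python function. This is a sketch: `sd_direct` is an illustrative name, and it returns the population σ (dividing by N), consistent with the table of formulas.

```python
import math

def sd_direct(values, freqs=None):
    """Population standard deviation by the direct method."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every f = 1
    n = sum(freqs)                 # N
    mean = sum(f * x for x, f in zip(values, freqs)) / n                # actual mean x̄
    sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))  # Σfd², d = x - x̄
    return math.sqrt(sum_fd2 / n)

data = [10, 15, 12, 25, 18]
print(sd_direct(data))  # √(138 / 5) = √27.6 ≈ 5.2536
```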

2. Shortcut Method (Using Assumed Mean):
This method is useful when the actual mean is a fraction, because deviations taken from it are tedious to compute by hand.

σ = √[ (Σfd²) / N - (Σfd / N)² ]

Where:

  • A = Assumed Mean
  • d = x - A (deviation from assumed mean)
  • N = Σf (for ungrouped data, take every f = 1, so N is the number of observations)
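Here is a sketch of the shortcut formula in Python. The class midpoints and frequencies are hypothetical, and `sd_shortcut` is an illustrative name. Algebraically, the result does not depend on the assumed mean A, so changing A should only change rounding noise.

```python
import math

def sd_shortcut(values, freqs, assumed_mean):
    """σ = √[ Σfd²/N - (Σfd/N)² ] with d = x - A."""
    n = sum(freqs)                                         # N = Σf
    d = [x - assumed_mean for x in values]                 # deviations from A
    sum_fd = sum(f * di for f, di in zip(freqs, d))        # Σfd
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))  # Σfd²
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Hypothetical class midpoints and frequencies, with A = 25
mids = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]
print(sd_shortcut(mids, freqs, 25))  # Σfd = 20, Σfd² = 5800 → √(145 - 0.25) ≈ 12.031
```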

3. Step-Deviation Method (for Continuous Series with Equal Class Intervals):
This simplifies calculations further.

σ = √[ (Σfd'²) / N - (Σfd' / N)² ] * i

Where:

  • d' = (x - A) / i (step-deviation)
  • A = Assumed Mean (usually the midpoint of a central class)
  • i = Common class interval width
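The step-deviation method can be sketched in Python for equal-width classes. The distribution (classes 0-10 through 40-50) is hypothetical, and `sd_step_deviation` is an illustrative name. Dividing by i keeps the d' values small, and multiplying by i at the end restores the original units.

```python
import math

def sd_step_deviation(midpoints, freqs, assumed_mean, width):
    """σ = √[ Σfd'²/N - (Σfd'/N)² ] × i with d' = (x - A) / i."""
    n = sum(freqs)                                           # N = Σf
    d_step = [(x - assumed_mean) / width for x in midpoints] # d'
    sum_fd = sum(f * d for f, d in zip(freqs, d_step))       # Σfd'
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, d_step))  # Σfd'²
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width

# Hypothetical classes 0-10, 10-20, ..., 40-50; A = 25, i = 10
mids = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]
print(sd_step_deviation(mids, freqs, 25, 10))  # d' = -2..2: √(58/40 - (2/40)²) × 10 ≈ 12.031
```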

Key Properties of Standard Deviation

  1. Always Non-Negative: SD is always zero or positive. σ = 0 only if all observations are identical.
  2. Independent of Change of Origin: If a constant is added to or subtracted from all values in the dataset, the standard deviation remains unchanged.
  3. Dependent on Change of Scale: If all values in the dataset are multiplied or divided by a constant, the standard deviation is also multiplied or divided by the absolute value of that constant.
  4. Combined Standard Deviation: For two groups with sizes N1, N2, means x̄1, x̄2, and standard deviations σ1, σ2, the combined standard deviation (σ12) is:
        σ12 = √[ (N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2) ]

    where d1 = x̄1 - x̄12, d2 = x̄2 - x̄12, and x̄12 = (N1x̄1 + N2x̄2) / (N1 + N2) is the combined mean.
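Properties 2 to 4 can be checked numerically. In the sketch below, `combined_sd` is a hypothetical helper that takes only the summary figures (sizes, means, population SDs). Its result is compared against the SD of the pooled raw data. The two groups are made-up numbers.

```python
import math
import statistics

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined (population) SD of two groups from their sizes, means and SDs."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)  # combined mean x̄12
    d1 = mean1 - mean12
    d2 = mean2 - mean12
    return math.sqrt((n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2))

# Two small hypothetical groups
a = [10, 12, 14]
b = [20, 22, 24, 26]

sd12 = combined_sd(len(a), statistics.mean(a), statistics.pstdev(a),
                   len(b), statistics.mean(b), statistics.pstdev(b))
print(sd12, statistics.pstdev(a + b))  # the two values should agree

# Properties 2 and 3: shifting leaves σ unchanged; scaling by c multiplies it by |c|
print(statistics.pstdev([x + 100 for x in a]), statistics.pstdev(a))
print(statistics.pstdev([-3 * x for x in a]), 3 * statistics.pstdev(a))
```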

Merits and Demerits

Merits:

  • Rigorously Defined: It has a precise mathematical definition.
  • Based on All Observations: It uses every value in the dataset.
  • Amenable to Algebraic Treatment: It is the foundation for many advanced statistical techniques (e.g., correlation, regression, hypothesis testing).
  • Less affected by sampling fluctuations compared to other measures.

Demerits:

  • More complex to calculate and understand than other measures.
  • Gives more weight to extreme values: Squaring the deviations makes outliers have a disproportionately large effect on the final value.

6. Coefficient of Variation (CV)

The Coefficient of Variation is a relative measure of dispersion. It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.

Its primary use is to compare the variability, consistency, or uniformity of two or more datasets, especially when their means are different or they are measured in different units.

Formula

CV = (Standard Deviation / Mean) * 100
CV = (σ / x̄) * 100

Interpretation

  • Lower CV: Indicates greater consistency, uniformity, or stability in the data.
  • Higher CV: Indicates greater variability, less consistency, or more dispersion.

Example Application in Business:

  • Comparing Investments: An investment with a lower CV is considered less risky for the level of return it generates.
  • Comparing Employee Performance: A salesperson with a lower CV in their monthly sales figures is more consistent than one with a higher CV, even if their average sales are similar.
  • Quality Control: A manufacturing process with a lower CV produces more uniform products.
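To illustrate the salesperson comparison, here is a Python sketch with two hypothetical sets of monthly sales figures (in thousand Rupees). Both average 50, but their spreads differ. It uses the population SD, consistent with σ in the formula. CV is undefined when the mean is zero.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (σ / x̄) × 100, using the population SD."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Hypothetical monthly sales for two salespeople, both averaging 50
sales_a = [50, 52, 48, 51, 49]
sales_b = [30, 70, 45, 60, 45]

cv_a = coefficient_of_variation(sales_a)  # σ = √2 ≈ 1.414 → CV ≈ 2.83%
cv_b = coefficient_of_variation(sales_b)  # σ = √190 ≈ 13.78 → CV ≈ 27.57%
print(round(cv_a, 2), round(cv_b, 2))     # lower CV → salesperson A is more consistent
```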

7. Skewness

Skewness is a measure of the asymmetry of a probability distribution about its mean. It describes the shape of the distribution.

  • Symmetrical Distribution (Zero Skewness): The distribution is perfectly balanced on both sides of the center. The right and left tails are mirror images. In this case, Mean = Median = Mode.
  • Positively Skewed Distribution (Right-Skewed): The tail on the right side of the distribution is longer or fatter than the left side. The bulk of the data is concentrated on the left. Typically, Mean > Median > Mode.
  • Negatively Skewed Distribution (Left-Skewed): The tail on the left side is longer or fatter than the right side. The bulk of the data is concentrated on the right. Typically, Mean < Median < Mode.

Measures of Skewness

These are coefficients that give a numerical value to the degree and direction of skewness.

1. Karl Pearson's Coefficient of Skewness (Skp)
This measure is based on the relationship between the mean, median, and mode.

  • Primary Formula (based on Mode):
        Skp = (Mean - Mode) / Standard Deviation
  • Alternative Formula (based on Median): Used when the mode is ill-defined. It relies on the empirical relationship Mode ≈ 3 * Median - 2 * Mean. Substituting this into the primary formula gives Mean - Mode ≈ 3 * (Mean - Median), hence:
        Skp = 3 * (Mean - Median) / Standard Deviation

Interpretation of Skp:

  • Skp = 0: Symmetrical distribution.
  • Skp > 0: Positively skewed distribution.
  • Skp < 0: Negatively skewed distribution.
  • Generally, if |Skp| > 1, the skewness is considered high. If 0.5 < |Skp| < 1, it's moderate.
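As a worked example, this Python sketch applies the median-based formula to the data from the Range example, [10, 15, 12, 25, 18]. Every value occurs once, so the mode is ill-defined. `pearson_skewness` is an illustrative name, and the SD is the population σ.

```python
import statistics

def pearson_skewness(values):
    """Skp = 3 × (Mean - Median) / σ  (median-based form)."""
    mean = statistics.mean(values)
    med = statistics.median(values)
    sd = statistics.pstdev(values)  # population SD
    return 3 * (mean - med) / sd

data = [10, 15, 12, 25, 18]         # mean 16, median 15, σ = √27.6 ≈ 5.254
print(round(pearson_skewness(data), 3))  # ≈ 0.571 → moderate positive skew
```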

2. Bowley's Coefficient of Skewness (Skb)
This measure is based on quartiles and is useful for open-ended distributions or when outliers are present.

Skb = (Q3 + Q1 - 2 * Median) / (Q3 - Q1)

Interpretation of Skb:

  • The value always lies between -1 and +1.
  • Skb = 0: Symmetrical distribution (Median is equidistant from Q1 and Q3).
  • Skb > 0: Positively skewed distribution (Median is closer to Q1).
  • Skb < 0: Negatively skewed distribution (Median is closer to Q3).
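Bowley's coefficient can be sketched in Python with `statistics.quantiles`. Its default `method='exclusive'` places the quartiles at (N+1)/4-type positions with linear interpolation, which matches the positional rule in Section 3 for data like this. The function name is illustrative.

```python
import statistics

def bowley_skewness(values):
    """Skb = (Q3 + Q1 - 2 × Median) / (Q3 - Q1)."""
    q1, med, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return (q3 + q1 - 2 * med) / (q3 - q1)

data = [10, 15, 12, 25, 18]               # Q1 = 11, Median = 15, Q3 = 21.5
print(round(bowley_skewness(data), 3))    # 2.5 / 10.5 ≈ 0.238 → positive skew
```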