Unit 5 - Notes
Unit 5: Measures of Dispersion
1. Introduction to Measures of Dispersion
Measures of dispersion (also known as measures of variation or spread) are statistical values that describe the spread or variability of data points in a distribution. While measures of central tendency (mean, median, mode) locate the center of a distribution, they do not reveal how the data is spread out around that center.
Why are Measures of Dispersion Important in Business?
- Consistency and Quality Control: In manufacturing, dispersion measures help monitor the consistency of a product. Low dispersion means high consistency.
- Risk Assessment in Finance: In investments, standard deviation is a key measure of volatility and risk. A higher standard deviation for a stock's returns implies higher risk.
- Comparing Datasets: They allow for the comparison of variability between two or more datasets, even if their means are similar. For example, comparing the consistency of sales performance between two different teams.
Measures of dispersion can be classified into two types:
- Absolute Measures: Expressed in the same units as the original data (e.g., Rupees, kg, cm). Examples: Range, Quartile Deviation, Mean Deviation, Standard Deviation.
- Relative Measures: Unit-free ratios or percentages used for comparing the variability of different datasets. Examples: Coefficient of Range, Coefficient of Variation, etc.
2. Range
The range is the simplest measure of dispersion. It is the difference between the highest (largest) and lowest (smallest) value in a dataset.
Formula
Range (R) = L - S
Where:
L = Largest value in the dataset
S = Smallest value in the dataset
Coefficient of Range
This is a relative measure used for comparison.
Coefficient of Range = (L - S) / (L + S)
Calculation
- Ungrouped Data: Simply find the largest and smallest values and take their difference.
  - Example: For the data [10, 15, 12, 25, 18], L = 25 and S = 10, so Range = 25 - 10 = 15.
- Discrete Series: The range is the difference between the largest and smallest variable values (x), not the frequencies.
- Continuous Series: The range is the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.
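The ungrouped example above can be checked with a minimal Python sketch that also computes the coefficient of range. The helper name `range_and_coefficient` is hypothetical. The sketch assumes raw (ungrouped) numeric data.

```python
def range_and_coefficient(data):
    """Return (range, coefficient of range) for ungrouped data."""
    largest = max(data)    # L
    smallest = min(data)   # S
    r = largest - smallest                                # Range = L - S
    coeff = (largest - smallest) / (largest + smallest)  # (L - S) / (L + S)
    return r, coeff

r, coeff = range_and_coefficient([10, 15, 12, 25, 18])
print(r, round(coeff, 4))  # 15 0.4286
```

For a discrete series, the same function would be applied to the x values only, ignoring the frequencies.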
Merits and Demerits
Merits:
- Simple to understand and easy to calculate.
- Provides a quick, though rough, idea of the data's spread.
Demerits:
- Affected by Outliers: It is based only on the two extreme values, so a single outlier can drastically alter the range.
- Ignores the Distribution: It does not provide any information about the distribution of values between the two extremes.
- Not suitable for further mathematical treatment.
- Cannot be calculated for open-ended distributions.
3. Quartile Deviation (or Semi-Interquartile Range)
Quartile Deviation is a measure of dispersion based on the upper and lower quartiles. It is half the range of the middle 50% of the data (half the interquartile range), which makes it much less sensitive to outliers than the range.
First, we define Quartiles:
- First Quartile (Q1): The value that separates the lowest 25% of the data from the rest.
- Third Quartile (Q3): The value that separates the lowest 75% of the data from the highest 25%.
Interquartile Range (IQR): The range of the middle 50% of the data.
IQR = Q3 - Q1
Formula for Quartile Deviation (QD)
Quartile Deviation (QD) = (Q3 - Q1) / 2
Coefficient of Quartile Deviation
This is a relative measure.
Coefficient of QD = (Q3 - Q1) / (Q3 + Q1)
Calculation
For Ungrouped and Discrete Series:
- Arrange the data in ascending order.
- Calculate Q1 and Q3 positions:
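  Q1 = value of the ((N+1)/4)th item
  Q3 = value of the (3 * (N+1)/4)th item
  where N is the total number of observations (or the total frequency Σf for a discrete series, in which case the cumulative frequencies are used to locate these items).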
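The positional rule above can be sketched in Python as follows. This is a minimal illustration, assuming that a fractional position such as the 1.5th item is resolved by linear interpolation between the neighbouring items, which is a common textbook convention. The helper names `value_at_position` and `quartile_deviation` are hypothetical.

```python
import math

def value_at_position(sorted_data, pos):
    """Value of the pos-th item (1-based); fractional positions are interpolated."""
    # Assumption: clamp to the first/last item so very small samples stay in range.
    pos = min(max(pos, 1), len(sorted_data))
    whole = math.floor(pos)
    frac = pos - whole
    below = sorted_data[whole - 1]          # convert 1-based position to 0-based index
    if frac == 0:
        return below
    above = sorted_data[whole]
    return below + frac * (above - below)   # move frac of the way to the next item

def quartile_deviation(data):
    """Return Q1, Q3, QD and the coefficient of QD using the (N+1)/4 rule."""
    xs = sorted(data)                       # arrange in ascending order
    n = len(xs)
    q1 = value_at_position(xs, (n + 1) / 4)
    q3 = value_at_position(xs, 3 * (n + 1) / 4)
    qd = (q3 - q1) / 2                      # QD = (Q3 - Q1) / 2
    coeff = (q3 - q1) / (q3 + q1)           # coefficient of QD
    return q1, q3, qd, coeff

q1, q3, qd, coeff = quartile_deviation([10, 15, 12, 25, 18])
print(q1, q3, qd)  # 11.0 21.5 5.25
```

Python's `statistics.quantiles(data, n=4)` uses the same (n + 1)p positions under its default `method='exclusive'`. Other software may follow different quartile conventions and give slightly different values.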
For Continuous Series:
- Calculate cumulative frequencies (c.f.).
- Find the Q1 and Q3 classes:
  - Q1 class is the first class whose c.f. equals or exceeds N/4.
  - Q3 class is the first class whose c.f. equals or exceeds 3N/4.
- Apply the interpolation formula:
Qk = L + [ ((k*N/4) - cf) / f ] * i
Where:
k = 1 for Q1, 3 for Q3
L = Lower limit of the quartile class
N = Total frequency (Σf)
cf = Cumulative frequency of the class preceding the quartile class
f = Frequency of the quartile class
i = Class width of the quartile class
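The continuous-series steps can be expressed as a short Python sketch of the interpolation formula. The function `grouped_quartile` and the frequency table are hypothetical and for illustration only. Classes are assumed to be given as (lower limit, upper limit, frequency) in ascending order.

```python
def grouped_quartile(classes, k):
    """Qk for a continuous series by interpolation.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    k: 1 for Q1, 3 for Q3.
    """
    n = sum(f for _, _, f in classes)   # N = Σf
    target = k * n / 4                  # k*N/4
    cf = 0                              # c.f. of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= target:            # first class whose c.f. reaches k*N/4
            i = upper - lower           # class width
            return lower + (target - cf) / f * i
        cf += f
    raise ValueError("k*N/4 exceeds the total frequency")

# Hypothetical frequency table, for illustration only (N = 40).
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]
q1 = grouped_quartile(table, 1)   # 10 + (10 - 5) / 8 * 10 = 16.25
q3 = grouped_quartile(table, 3)   # 30 + (30 - 25) / 10 * 10 = 35.0
print(q1, q3, (q3 - q1) / 2)      # 16.25 35.0 9.375
```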
Merits and Demerits
Merits:
- Not affected by extreme values (outliers).
- Better than range as it considers the middle 50% of the data.
- Can be calculated for open-ended distributions.
Demerits:
- Ignores the first 25% and last 25% of the data.
- Not based on all observations in the dataset.
- Not amenable to further algebraic manipulation.
4. Mean Deviation (or Average Deviation)
Mean Deviation is the arithmetic mean of the absolute deviations of the observations from a measure of central tendency (mean, median, or mode). It gives a better measure of spread than range or QD because it is based on all observations.
The deviation is taken as an absolute value (ignoring the negative signs) because the sum of deviations from the arithmetic mean is always zero (Σ(x - x̄) = 0).
Formulas
1. For Ungrouped Data:
Mean Deviation from Mean (MD_x̄) = Σ|x - x̄| / n
Mean Deviation from Median (MD_M) = Σ|x - M| / n
Where n is the number of observations.
2. For Discrete/Continuous Data:
Mean Deviation from Mean (MD_x̄) = Σf|x - x̄| / N
Mean Deviation from Median (MD_M) = Σf|x - M| / N
Where:
N = Σf
x = the value (for a discrete series) or the midpoint of the class (for a continuous series)
Note: Mean deviation is minimized when calculated from the median.
Coefficient of Mean Deviation
Coefficient of MD (from Mean) = MD_x̄ / x̄
Coefficient of MD (from Median) = MD_M / M
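The mean deviation formulas and their coefficients can be computed in a minimal Python sketch. It reuses the ungrouped data from the range example. The helper `mean_deviation` is hypothetical. With no frequencies supplied, every f is taken as 1, which reduces Σf|x - centre| / N to the ungrouped formula. For a continuous series, `values` would be the class midpoints.

```python
import statistics

def mean_deviation(values, center, freqs=None):
    """Σf|x - center| / N; with no frequencies every f is 1 (ungrouped data)."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)                                              # N = Σf
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n

data = [10, 15, 12, 25, 18]
mean = statistics.mean(data)       # 16
median = statistics.median(data)   # 15
md_mean = mean_deviation(data, mean)      # 22 / 5 = 4.4
md_median = mean_deviation(data, median)  # 21 / 5 = 4.2
print(md_mean, md_median)
print(md_mean / mean, md_median / median)  # coefficients of MD
```

The median-based value is the smaller of the two here, consistent with the note that mean deviation is minimized when taken about the median.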
Merits and Demerits
Merits:
- Simple to understand.
- Based on all observations in the dataset.
- Less affected by extreme values than standard deviation.
Demerits:
- Ignores algebraic signs: Taking absolute values discards the signs of the deviations, which makes the measure unsuitable for further algebraic treatment.
- Can be complex to compute if the mean or median is a fraction.
- It is less commonly used in practice than standard deviation.
5. Standard Deviation and Variance
Standard Deviation is the most important and widely used measure of dispersion. It measures the typical distance of the data points from the mean of the dataset.
Variance (σ² or s²): The average of the squared deviations from the arithmetic mean. It is expressed in squared units. (When estimating from a sample, s² is often computed with n - 1 rather than n in the denominator.)
Standard Deviation (σ or s): The positive square root of the variance. It is expressed in the original units of the data, making it more interpretable than variance.
- Population symbols: Variance σ², Standard Deviation σ, Mean μ
- Sample symbols: Variance s², Standard Deviation s, Mean x̄
Formulas
| Data Type | Variance (σ²) | Standard Deviation (σ) |
|---|---|---|
| Ungrouped Data | σ² = Σ(x - μ)² / N | σ = √[ Σ(x - μ)² / N ] |
| Discrete/Continuous | σ² = Σf(x - μ)² / N | σ = √[ Σf(x - μ)² / N ] |
Calculation Methods
1. Direct Method (Using Actual Mean):
- Calculate the actual mean (x̄).
- Calculate the deviations from the mean (d = x - x̄).
- Square the deviations (d²).
- Multiply by frequency if applicable (fd²).
- Sum them up (Σd² or Σfd²) and divide by N.
- Take the square root.
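The direct-method steps can be followed in a minimal Python sketch for a discrete series. The series is hypothetical, and its actual mean is x̄ = 305 / 20 = 15.25. The helper `sd_direct` is also hypothetical. The result is cross-checked against `statistics.pstdev`, which divides by N like the formulas in the table.

```python
import math
import statistics

def sd_direct(values, freqs):
    """Population SD by the direct method: √(Σfd² / N), with d = x - x̄."""
    n = sum(freqs)                                                       # N = Σf
    mean = sum(f * x for x, f in zip(values, freqs)) / n                 # actual mean x̄
    sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))   # Σfd²
    return math.sqrt(sum_fd2 / n)                                        # square root

# Hypothetical discrete series, for illustration only.
x = [5, 10, 15, 20, 25]
f = [2, 5, 6, 4, 3]
sigma = sd_direct(x, f)
print(round(sigma, 4))

# Cross-check: expand the series into raw observations and use pstdev.
raw = [xi for xi, fi in zip(x, f) for _ in range(fi)]
print(round(statistics.pstdev(raw), 4))
```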
2. Shortcut Method (Using Assumed Mean):
This method is useful when the actual mean is a decimal, making calculations tedious.
σ = √[ (Σfd²) / N - (Σfd / N)² ]
Where:
A = Assumed Mean
d = x - A (deviation from assumed mean)
N = Σf
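The shortcut formula can be sketched in Python using a hypothetical discrete series whose actual mean (15.25) is a decimal. With A = 15, the sums are Σfd = 5 and Σfd² = 725. This gives σ² = 725/20 - (5/20)² = 36.1875. The helper `sd_shortcut` is hypothetical.

```python
import math

def sd_shortcut(values, freqs, assumed_mean):
    """σ = √[Σfd²/N - (Σfd/N)²] with d = x - A (A = assumed mean)."""
    n = sum(freqs)                                         # N = Σf
    d = [x - assumed_mean for x in values]                 # deviations from A
    sum_fd = sum(f * di for f, di in zip(freqs, d))        # Σfd
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))  # Σfd²
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Hypothetical discrete series; A = 15 keeps every d a whole number.
x = [5, 10, 15, 20, 25]
f = [2, 5, 6, 4, 3]
print(sd_shortcut(x, f, 15))
```

Any choice of A gives the same σ, because the correction term (Σfd/N)² removes the offset between A and the actual mean. A value of A near the centre simply keeps the numbers small.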
3. Step-Deviation Method (for Continuous Series with Equal Class Intervals):
This simplifies calculations further.
σ = √[ (Σfd'²) / N - (Σfd' / N)² ] * i
Where:
d' = (x - A) / i (step-deviation)
A = Assumed Mean (usually the midpoint of a central class)
i = Common class interval width
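The step-deviation method can be sketched in Python for hypothetical classes of equal width 5 (2.5-7.5, 7.5-12.5, ..., 22.5-27.5), represented by their midpoints. With A = 15 and i = 5, the step deviations d' are -2, -1, 0, 1, 2, so Σfd' = 1 and Σfd'² = 29. The helper `sd_step_deviation` is hypothetical.

```python
import math

def sd_step_deviation(midpoints, freqs, assumed_mean, width):
    """σ = √[Σfd'²/N - (Σfd'/N)²] × i, with d' = (x - A) / i."""
    n = sum(freqs)                                              # N = Σf
    d_step = [(x - assumed_mean) / width for x in midpoints]    # d'
    sum_fd = sum(f * d for f, d in zip(freqs, d_step))          # Σfd'
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, d_step))     # Σfd'²
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width   # rescale by i

# Hypothetical class midpoints and frequencies, for illustration only.
mid = [5, 10, 15, 20, 25]
f = [2, 5, 6, 4, 3]
print(sd_step_deviation(mid, f, assumed_mean=15, width=5))
```

Multiplying by i at the end undoes the division by i in d'. This reflects the change-of-scale property of standard deviation.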
Key Properties of Standard Deviation
- Always Non-Negative: SD is always zero or positive; σ = 0 only if all observations are identical.
- Independent of Change of Origin: If a constant is added to or subtracted from all values in the dataset, the standard deviation remains unchanged.
- Dependent on Change of Scale: If all values in the dataset are multiplied or divided by a constant, the standard deviation is also multiplied or divided by the absolute value of that constant.
- Combined Standard Deviation: For two groups with sizes N1, N2, means x̄1, x̄2, and standard deviations σ1, σ2, the combined standard deviation (σ12) is:
σ12 = √[ (N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2) ]
where d1 = x̄1 - x̄12, d2 = x̄2 - x̄12, and x̄12 = (N1x̄1 + N2x̄2) / (N1 + N2) is the combined mean.
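The properties listed above can be checked numerically in a minimal Python sketch on two hypothetical groups. It covers change of origin, change of scale and the combined SD formula. The helper `combined_sd` is hypothetical and assumes population SDs (division by N), as in the formulas above. The combined value computed from the group summaries should match the SD of the pooled raw data, up to floating-point rounding.

```python
import math
import statistics

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and population SDs."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)   # combined mean x̄12
    d1 = mean1 - mean12
    d2 = mean2 - mean12
    num = n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2
    return math.sqrt(num / (n1 + n2))

g1 = [10, 15, 12, 25, 18]   # hypothetical group 1
g2 = [20, 22, 24]           # hypothetical group 2

sigma12 = combined_sd(len(g1), statistics.mean(g1), statistics.pstdev(g1),
                      len(g2), statistics.mean(g2), statistics.pstdev(g2))
print(sigma12, statistics.pstdev(g1 + g2))   # the two values should agree

# Change of origin and scale, checked on group 1.
sd = statistics.pstdev(g1)
print(math.isclose(statistics.pstdev([x + 100 for x in g1]), sd))      # origin: unchanged
print(math.isclose(statistics.pstdev([-3 * x for x in g1]), 3 * sd))   # scale: times |c|
```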
Merits and Demerits
Merits:
- Rigorously Defined: It has a precise mathematical definition.
- Based on All Observations: It uses every value in the dataset.
- Amenable to Algebraic Treatment: It is the foundation for many advanced statistical techniques (e.g., correlation, regression, hypothesis testing).
- Less affected by sampling fluctuations than other measures.
Demerits:
- More complex to calculate and understand than other measures.
- Gives more weight to extreme values: Squaring the deviations makes outliers have a disproportionately large effect on the final value.
6. Coefficient of Variation (CV)
The Coefficient of Variation is a relative measure of dispersion. It is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.
Its primary use is to compare the variability, consistency, or uniformity of two or more datasets, especially when their means are different or they are measured in different units.
Formula
CV = (Standard Deviation / Mean) * 100
CV = (σ / x̄) * 100
Interpretation
- Lower CV: Indicates greater consistency, uniformity, or stability in the data.
- Higher CV: Indicates greater variability, less consistency, or more dispersion.
Example Application in Business:
- Comparing Investments: An investment with a lower CV is considered less risky for the level of return it generates.
- Comparing Employee Performance: A salesperson with a lower CV in their monthly sales figures is more consistent than one with a higher CV, even if their average sales are similar.
- Quality Control: A manufacturing process with a lower CV produces more uniform products.
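The salesperson comparison can be illustrated with a minimal Python sketch using made-up monthly sales figures. Both salespeople have the same mean of 50. The helper `coefficient_of_variation` is hypothetical. It uses the population SD, matching CV = (σ / x̄) * 100, and assumes a positive mean, since the ratio is not meaningful otherwise.

```python
import statistics

def coefficient_of_variation(data):
    """CV = (σ / x̄) × 100, with σ the population SD; assumes a positive mean."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

# Hypothetical monthly sales (both with mean 50), for illustration only.
a = [50, 52, 48, 51, 49]
b = [30, 70, 45, 60, 45]
cv_a = coefficient_of_variation(a)   # σ = √2 ≈ 1.41, so CV ≈ 2.83%
cv_b = coefficient_of_variation(b)   # σ = √190 ≈ 13.78, so CV ≈ 27.57%
print(f"A: {cv_a:.2f}%  B: {cv_b:.2f}%")
```

Salesperson A has the lower CV and is therefore the more consistent performer, even though the average sales are identical.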
7. Skewness
Skewness is a measure of the asymmetry of a probability distribution about its mean. It describes the shape of the distribution.
- Symmetrical Distribution (Zero Skewness): The distribution is perfectly balanced on both sides of the center. The right and left tails are mirror images. In this case, Mean = Median = Mode.
- Positively Skewed Distribution (Right-Skewed): The tail on the right side of the distribution is longer or fatter than the left side. The bulk of the data is concentrated on the left. In this case, typically Mean > Median > Mode.
- Negatively Skewed Distribution (Left-Skewed): The tail on the left side is longer or fatter than the right side. The bulk of the data is concentrated on the right. In this case, typically Mean < Median < Mode.
Measures of Skewness
These are coefficients that give a numerical value to the degree and direction of skewness.
1. Karl Pearson's Coefficient of Skewness (Skp)
This measure is based on the relationship between the mean, median, and mode.
- Primary Formula (based on Mode):
Skp = (Mean - Mode) / Standard Deviation
- Alternative Formula (based on Median): Used when the mode is ill-defined. It relies on the empirical relationship Mode ≈ 3 * Median - 2 * Mean, which gives:
Skp = 3 * (Mean - Median) / Standard Deviation
Interpretation of Skp:
- Skp = 0: Symmetrical distribution.
- Skp > 0: Positively skewed distribution.
- Skp < 0: Negatively skewed distribution.
- Generally, if |Skp| > 1, the skewness is considered high; if 0.5 < |Skp| < 1, it is moderate.
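The median-based formula can be sketched in Python for the ungrouped data used earlier. In that data every value occurs once, so the mode is ill-defined. The helper `pearson_skewness_median` is hypothetical and assumes the population SD.

```python
import statistics

def pearson_skewness_median(data):
    """Skp = 3 (Mean - Median) / SD, using the population SD."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    return 3 * (mean - median) / statistics.pstdev(data)

data = [10, 15, 12, 25, 18]   # no repeated values, so no clear mode
skp = pearson_skewness_median(data)
print(round(skp, 3))          # mean 16 > median 15, so Skp > 0 (about 0.571)
```

By the rule of thumb above, a value of about 0.57 indicates moderate positive skewness.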
2. Bowley's Coefficient of Skewness (Skb)
This measure is based on quartiles and is useful for open-ended distributions or when outliers are present.
Skb = (Q3 + Q1 - 2 * Median) / (Q3 - Q1)
Interpretation of Skb:
- The value always lies between -1 and +1.
- Skb = 0: Symmetrical distribution (Median is equidistant from Q1 and Q3).
- Skb > 0: Positively skewed distribution (Median is closer to Q1).
- Skb < 0: Negatively skewed distribution (Median is closer to Q3).
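Bowley's coefficient can be computed in a minimal Python sketch. It takes Q1, the median and Q3 from `statistics.quantiles(data, n=4)`. That function's default `method='exclusive'` places the cut points at the (N+1)/4, 2(N+1)/4 and 3(N+1)/4 positions with linear interpolation. The helper `bowley_skewness` is hypothetical, and the sketch assumes Q3 > Q1.

```python
import statistics

def bowley_skewness(data):
    """Skb = (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    # Default 'exclusive' method: cut points at the (N+1)p positions, interpolated.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * median) / (q3 - q1)

data = [10, 15, 12, 25, 18]              # Q1 = 11, Median = 15, Q3 = 21.5
print(round(bowley_skewness(data), 4))   # (21.5 + 11 - 30) / 10.5, about 0.2381
```

The positive value agrees with the interpretation above: the median (15) is closer to Q1 (11) than to Q3 (21.5).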