Unit 2 - Notes

INT234

Unit 2: SUPERVISED LEARNING: REGRESSION

1. Introduction to Regression Analysis

Regression analysis is a fundamental supervised learning technique used for predictive modeling. Its primary goal is to investigate the relationship between a dependent variable (target) and one or more independent variables (predictors/features). In regression, the target variable is always continuous (numerical).


2. Simple Linear Regression (SLR)

Definition

Simple Linear Regression is the most basic form of regression analysis. It models the relationship between a single independent variable ($x$) and a dependent variable ($y$) by fitting a straight line to the observed data.

The Equation

The mathematical representation of the population model is:

$y = \beta_0 + \beta_1 x + \epsilon$

Where:

  • $y$: The dependent variable (what we want to predict).
  • $x$: The independent variable (input).
  • $\beta_0$: The Y-intercept (the value of $y$ when $x = 0$).
  • $\beta_1$: The Slope (the change in $y$ for a one-unit change in $x$).
  • $\epsilon$: The Error term (residuals), representing the variability in $y$ not explained by the linear relationship.

The Prediction Function

When the model is trained, we produce a prediction equation:

$\hat{y} = b_0 + b_1 x$

  • $\hat{y}$: The predicted value.
  • $b_0, b_1$: The estimated coefficients derived from the data.
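
For illustration, here is a minimal sketch of fitting and using a simple linear regression with scikit-learn; the hours-studied/exam-score data is hypothetical and chosen only to make the output easy to read.

```python
# Minimal sketch: fit y-hat = b0 + b1*x with scikit-learn on hypothetical data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied (x) vs. exam score (y)
X = np.array([[1], [2], [3], [4], [5]])   # shape (n_samples, n_features)
y = np.array([52, 58, 65, 70, 78])

model = LinearRegression().fit(X, y)

print("Intercept b0:", model.intercept_)
print("Slope b1:", model.coef_[0])
print("Prediction for x = 6:", model.predict([[6]])[0])
```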

3. Ordinary Least Squares (OLS) Estimation

Concept

OLS is the optimization method used to estimate the unknown parameters ($\beta_0$ and $\beta_1$) in a linear regression model. The goal of OLS is to find the line of "best fit."

Mechanism

The "best fit" line is defined as the line that minimizes the sum of the squared vertical differences (residuals) between the observed values and the predicted values.

The Cost Function (Sum of Squared Errors - SSE):

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

OLS uses calculus (partial derivatives) to find the values of $b_0$ and $b_1$ that make SSE as small as possible.
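
As a sketch of what this minimization produces, the closed-form estimates that come from setting the partial derivatives of SSE to zero can be computed directly with NumPy (the data below is hypothetical):

```python
# Sketch of OLS "by hand" using the closed-form solution:
#   b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2),  b0 = y_mean - b1 * x_mean
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 65.0, 70.0, 78.0])

x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

residuals = y - (b0 + b1 * x)
sse = np.sum(residuals ** 2)     # the quantity OLS minimizes

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.3f}")
```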

Key Assumptions of OLS

For OLS to provide valid statistical inferences, the following assumptions must hold:

  1. Linearity: The relationship between $x$ and $y$ is linear.
  2. Independence: Observations are independent of each other.
  3. Homoscedasticity: The variance of the residual terms is constant at every level of $x$.
  4. Normality: The error terms are normally distributed (important for hypothesis testing/confidence intervals, less critical for pure prediction).

4. Correlations

Pearson Correlation Coefficient ($r$)

Correlation measures the strength and direction of the linear relationship between two continuous variables.

  • Range: $-1 \le r \le +1$
  • $r = +1$: Perfect positive linear relationship.
  • $r = -1$: Perfect negative linear relationship.
  • $r = 0$: No linear relationship.

Relationship to Regression

  • Correlation quantifies association; Regression quantifies prediction.
  • In Simple Linear Regression, the coefficient of determination ($R^2$) is the square of the Pearson correlation coefficient ($R^2 = r^2$).
  • Warning: Correlation does not imply causation. A high correlation between $x$ and $y$ does not mean $x$ causes $y$.
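
A quick sketch of computing $r$ with NumPy and checking the $R^2 = r^2$ identity on hypothetical data:

```python
# Sketch: Pearson correlation with NumPy; for SLR, r squared equals R^2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 65.0, 70.0, 78.0])

r = np.corrcoef(x, y)[0, 1]       # Pearson correlation coefficient
print("r   =", round(r, 4))
print("r^2 =", round(r ** 2, 4))  # equals the R^2 of a simple linear regression on the same data
```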

5. Multiple Linear Regression (MLR)

Definition

MLR extends simple linear regression to include two or more independent variables. It accounts for the fact that a dependent variable is often influenced by multiple factors simultaneously.

The Equation

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$

Where:

  • $x_1, x_2, \dots, x_n$: Multiple distinct independent variables.
  • $\beta_1, \beta_2, \dots, \beta_n$: Partial regression coefficients. $\beta_i$ represents the change in $y$ for a one-unit change in $x_i$, holding all other variables constant.
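
A minimal sketch of fitting an MLR model with two hypothetical predictors in scikit-learn; each fitted coefficient is read as a partial effect:

```python
# Sketch: multiple linear regression with two hypothetical predictors
# (hours studied, hours slept) predicting an exam score.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 8], [2, 7], [3, 6], [4, 8], [5, 5]])  # columns: x1, x2
y = np.array([52, 58, 65, 74, 75])

model = LinearRegression().fit(X, y)

# Each coefficient is a partial effect: the change in y for a one-unit
# change in that feature, holding the other feature constant.
print("b0:", model.intercept_)
print("b1, b2:", model.coef_)
```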

Multicollinearity

A specific challenge in MLR is multicollinearity, which occurs when independent variables are highly correlated with each other.

  • Effect: It makes it difficult to determine the individual effect of each independent variable.
  • Detection: Variance Inflation Factor (VIF).
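
A sketch of VIF-based detection using statsmodels; the feature values below are hypothetical, and the "VIF above roughly 5-10 signals a problem" threshold is a common rule of thumb rather than a hard rule.

```python
# Sketch: detecting multicollinearity with the Variance Inflation Factor (statsmodels).
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

X = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],   # nearly an exact multiple of x1 -> very high VIF
    "x3": [5, 3, 6, 2, 7, 4],
})
X_const = add_constant(X)          # VIF is computed on a design matrix that includes an intercept

for i, col in enumerate(X_const.columns):
    # The row for "const" can be ignored; the feature rows are what matter.
    print(col, variance_inflation_factor(X_const.values, i))
```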

6. Polynomial Regression

Definition

Polynomial regression is a form of regression analysis used when the relationship between the independent and dependent variables is non-linear (curved).

The Concept

Although it models a non-linear relationship between $x$ and $y$, it is still considered a linear model because the equation is linear in its coefficients. We transform the input features by raising them to a power (e.g., adding $x^2$ and $x^3$ terms) and then estimate the coefficients exactly as in ordinary linear regression.

The Equation (Degree 2)

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$

Bias-Variance Tradeoff

  • Underfitting: Using a linear model (Degree 1) on curved data.
  • Overfitting: Using a very high degree polynomial (e.g., Degree 15) which passes through every data point but fails to generalize to new data.
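
A short sketch of degree-2 polynomial regression in scikit-learn, built as a feature transform followed by ordinary least squares; the curved data is hypothetical:

```python
# Sketch: degree-2 polynomial regression = polynomial feature transform + OLS.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([2.1, 4.9, 10.2, 17.1, 25.8, 37.0])   # roughly quadratic in x

# PolynomialFeatures adds the x^2 term; the model remains linear in its
# coefficients, which is why ordinary least squares can still fit it.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)

print("Prediction at x = 7:", poly_model.predict([[7]])[0])
```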

7. Logistic Regression

Definition

Despite its name, Logistic Regression is a classification algorithm, not a regression algorithm in the traditional sense. It is used to predict a discrete outcome (e.g., Yes/No, 0/1, True/False).

Why "Regression"?

It is called regression because it estimates the probability of an event occurring using a regression-like formula, which is then mapped to a class.

The Sigmoid Function (Logistic Function)

Linear regression produces values from $-\infty$ to $+\infty$. To map this to a probability (0 to 1), we wrap the linear equation $z = b_0 + b_1 x_1 + \dots + b_n x_n$ in a Sigmoid function:

$p = \sigma(z) = \dfrac{1}{1 + e^{-z}}$

  • S-Curve: The output forms an "S" shape.
  • Decision Boundary: A threshold (usually 0.5) is applied to the probability to classify the result.
    • If $p \ge 0.5$: Class 1
    • If $p < 0.5$: Class 0
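
A minimal sketch of logistic regression as a binary classifier in scikit-learn; the single feature and pass/fail labels are hypothetical:

```python
# Sketch: binary classification with logistic regression (hours studied -> pass/fail).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 = fail, 1 = pass

clf = LogisticRegression().fit(X, y)

# predict_proba returns the sigmoid output (probability of class 1);
# predict applies the default 0.5 threshold to that probability.
print("P(pass | 2.2 hours):", clf.predict_proba([[2.2]])[0, 1])
print("Predicted class:", clf.predict([[2.2]])[0])
```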

8. Evaluating Model Performance

Evaluating how well a regression model predicts the target variable is crucial. We compare the Predicted values ($\hat{y}$) against the Actual values ($y$).

A. Mean Absolute Error (MAE)

The average of the absolute differences between predictions and actual values:

$MAE = \dfrac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$

  • Interpretation: Represents the average magnitude of errors.
  • Pros: Robust to outliers compared to MSE.
  • Cons: Not differentiable at 0 (harder for some optimization algorithms).

B. Mean Squared Error (MSE)

The average of the squared differences between predictions and actual values:

$MSE = \dfrac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

  • Interpretation: Measures the variance of the residuals.
  • Pros: Heavily penalizes large errors (squaring makes big errors much bigger). Differentiable (good for gradient descent).
  • Cons: Not in the original unit of the target variable (e.g., if predicting dollars, MSE is "dollars squared").

C. Root Mean Squared Error (RMSE)

The square root of the MSE:

$RMSE = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

  • Interpretation: Represents the standard deviation of the residuals.
  • Pros: It is in the same unit as the target variable, making it highly interpretable. Like MSE, it penalizes large errors.

D. R-squared ($R^2$) Score

Also known as the Coefficient of Determination. It represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

$R^2 = 1 - \dfrac{SS_{res}}{SS_{tot}} = 1 - \dfrac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$

  • Where $\bar{y}$ is the mean of the observed data.
  • Range: Usually $0$ to $1$.
    • $R^2 = 1$: Model explains 100% of the variance (perfect fit).
    • $R^2 = 0$: Model explains none of the variance (equivalent to guessing the mean).
  • Limitation: $R^2$ always increases as you add more variables, even if they are irrelevant. (Adjusted $R^2$ is used to counter this in advanced MLR).
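
All four metrics can be computed directly; here is a sketch using scikit-learn's metrics module on hypothetical predictions:

```python
# Sketch: computing MAE, MSE, RMSE and R^2 with scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.5, 7.0, 11.0])   # hypothetical model predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                         # same units as the target variable
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```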

Summary Cheat Sheet

Metric | Formula | Concept / Use Case
SLR | $\hat{y} = b_0 + b_1 x$ | One input, one output; linear relationship.
MLR | $\hat{y} = b_0 + b_1 x_1 + \dots + b_n x_n$ | Multiple inputs, one output.
Logistic | $p = \frac{1}{1 + e^{-z}}$ | Sigmoid function; binary classification (0 or 1).
MAE | $\frac{1}{n}\sum \lvert y_i - \hat{y}_i \rvert$ | Average absolute error; use when outliers shouldn't be penalized excessively.
RMSE | $\sqrt{\frac{1}{n}\sum (y_i - \hat{y}_i)^2}$ | Standard metric; penalizes large errors; interpretable units.
$R^2$ | $1 - \frac{SS_{res}}{SS_{tot}}$ | Goodness of fit (0 to 1).