Unit 6 - Notes

INT423

Unit 6: Recommender Systems

1. Introduction to Recommender Systems

Recommender systems are algorithms designed to suggest relevant items to users (movies to watch, articles to read, products to buy, or people to follow). They are critical in modern digital ecosystems because they combat information overload, filtering the most relevant items out of large volumes of dynamically generated content.

Two main approaches dominate the field:

  1. Content-based filtering
  2. Collaborative filtering

2. Content-Based Filtering

Core Concept

Content-based filtering recommends items that are similar to those a user has liked in the past. It relies heavily on the properties (features) of the items and the preferences of the user.

Mechanism

  • Item Profile: Each item i is represented by a feature vector x^(i). For a movie, features might include Genre (Action, Romance), Director, Actors, and Year.
  • User Profile: A profile of the user’s preferences is built based on the items they have rated highly. This is often represented as a parameter vector θ^(j) (theta).
  • Prediction: The system predicts whether user j will like item i by calculating the similarity between the user profile and the item profile.

Mathematical Formulation

If we have:

  • x^(i): Feature vector for item i.
  • θ^(j): Parameter vector for user j (representing their preferences).

The predicted rating for user j on item i is often calculated as the dot product:

  predicted rating = (θ^(j))^T x^(i)
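The dot-product prediction can be sketched numerically. The feature values and user parameters below are made up for illustration, not learned from real data:

```python
import numpy as np

# Item profile: [action, romance] feature scores for one movie (illustrative).
x_item = np.array([0.9, 0.1])

# User parameter vector theta, learned from the user's past ratings;
# this user strongly prefers action (illustrative values).
theta_user = np.array([4.5, 0.3])

# Predicted rating = dot product of user parameters and item features.
predicted_rating = theta_user @ x_item
print(round(predicted_rating, 2))  # 4.08
```

A high dot product means the item's features align with the user's learned preferences, so it gets recommended.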

Advantages

  • Independence: Does not need data from other users.
  • Transparency: Recommendations can be explained (e.g., "Recommended because you watched The Matrix").
  • New Items: Capable of recommending new items that have not yet been rated, provided they have metadata (features).

Disadvantages

  • Limited Novelty: Users are only recommended items similar to what they have seen before (the "filter bubble").
  • Feature Engineering: Requires meaningful feature extraction, which can be difficult for unstructured data like images or audio without deep learning.

3. Collaborative Filtering (CF) Algorithm

Core Concept

Collaborative Filtering relies on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. It focuses on interactions rather than item features.

The Algorithm (Matrix Factorization Approach)

In modern Machine Learning, CF is often implemented using Matrix Factorization (learning latent features). Unlike content-based filtering, where we are given the features x^(i), in CF we learn both the item features x^(i) and the user parameters θ^(j) simultaneously.

Notations:

  • n_u: Number of users.
  • n_m: Number of items.
  • r(i, j): Equals 1 if user j has rated item i, and 0 otherwise.
  • y^(i,j): The rating given by user j to item i (defined only where r(i, j) = 1).

Optimization Objective (Cost Function)

The goal is to minimize the squared error between the predicted rating (θ^(j))^T x^(i) and the actual rating y^(i,j), summed over all pairs with r(i, j) = 1, plus regularization terms to prevent overfitting.

Cost Function J:

  J(x^(1), ..., x^(n_m), θ^(1), ..., θ^(n_u)) =
      (1/2) · Σ over (i,j) with r(i,j)=1 of ((θ^(j))^T x^(i) − y^(i,j))²
      + (λ/2) · Σ over i, k of (x_k^(i))²
      + (λ/2) · Σ over j, k of (θ_k^(j))²

Learning Process (Gradient Descent)

We randomly initialize x and θ to small values, then iteratively update them with gradient descent to minimize J. This allows the algorithm to learn, for example, that Feature 1 represents "Action" and Feature 2 represents "Romance" without being explicitly told, based solely on rating patterns.
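The learning loop can be sketched on a tiny synthetic ratings matrix. The data, latent dimension k, and hyperparameters below are all illustrative choices, not values from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ratings: rows = items, cols = users (two "taste clusters").
Y = np.array([[5, 5, 0, 0],
              [5, 4, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 4, 5]], dtype=float)
# R[i, j] = 1 if user j rated item i (zeros above are unrated, not rated 0).
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])

n_items, n_users, k = 4, 4, 2
X = rng.normal(scale=0.1, size=(n_items, k))      # latent item features (learned)
Theta = rng.normal(scale=0.1, size=(n_users, k))  # user parameters (learned)
lam, alpha = 0.1, 0.05                            # regularization, learning rate

for _ in range(2000):
    E = (X @ Theta.T - Y) * R          # error only where a rating exists
    X_grad = E @ Theta + lam * X       # dJ/dX
    Theta_grad = E.T @ X + lam * Theta # dJ/dTheta
    X -= alpha * X_grad
    Theta -= alpha * Theta_grad

pred = X @ Theta.T
print(np.round(pred, 1))               # observed ratings are recovered closely
```

Both X and Theta are updated from the same error term, which is what lets CF discover latent features from rating patterns alone.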


4. Binary Labels: Favs, Likes, and Clicks

Not all recommendation data comes in the form of explicit 1-5 star ratings. Much of the web operates on binary or implicit data.

Implicit vs. Explicit Feedback

  1. Explicit Feedback: Direct input from the user (e.g., Netflix "Thumbs Up/Down", Amazon 5-star rating).
  2. Implicit Feedback: Inferred behavior (e.g., Clicks, Purchase history, Page view duration, Song skip vs. listen).

Handling Binary Data (1 vs 0)

In binary scenarios:

  • 1 (Positive): A user engaged (clicked, liked, bought).
  • 0 (Unobserved): This is ambiguous. It could mean the user dislikes the item, or simply hasn't seen it yet.

Strategies for Binary Classification in CF:

  1. Logistic Regression Output: Instead of predicting a continuous number (rating), predict the probability that a user engages with an item: P(y^(i,j) = 1).
    • Sigmoid function applied to the dot product: g((θ^(j))^T x^(i)), where g(z) = 1 / (1 + e^(−z)).
  2. Negative Sampling: Since "0" classes dominate the dataset (a user clicks only a tiny fraction of available items), algorithms treat all interactions as positive samples and randomly select a small subset of non-interacted items as negative samples to train the model efficiently.
  3. Confidence Weights: Treat "clicks" as a rating of 1 but with low confidence, and "purchases" as a rating of 1 with high confidence.
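The first two strategies can be sketched as follows; the parameter values, item counts, and interaction set are all illustrative:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative learned vectors for one user and one item.
theta_user = np.array([1.2, -0.4])
x_item = np.array([0.8, 0.1])

# Probability that the user engages (clicks/likes/buys) with the item.
p_engage = sigmoid(theta_user @ x_item)

# Negative sampling: treat interacted items as positives and sample a small
# number of non-interacted items as negatives, instead of using every zero.
rng = np.random.default_rng(1)
n_items = 10_000
interacted = {3, 17, 256}                              # items this user engaged with
candidates = np.setdiff1d(np.arange(n_items), list(interacted))
negatives = rng.choice(candidates, size=5, replace=False)

print(round(float(p_engage), 3), sorted(negatives.tolist()))
```

Sampling only a handful of negatives per positive keeps training tractable when the unobserved class vastly outnumbers the observed one.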

5. Mean Normalization

The Cold Start Problem

Consider a new user who has not rated any movies.

  • In standard CF, the algorithm tries to minimize the error for this user. Since there are no ratings with r(i, j) = 1, the regularization term dominates.
  • The algorithm minimizes J by setting θ^(new) = 0.
  • Result: The predicted rating (θ^(new))^T x^(i) = 0 for every movie. The user gets no recommendations (or meaningless ones).

The Solution: Mean Normalization

We normalize the data so that the average rating for every item is 0.

  1. Calculate Average: Compute the average rating μ_i for every item i (ignoring missing ratings).
  2. Subtract Mean: Create a new dataset where y_norm^(i,j) = y^(i,j) − μ_i.
  3. Train: Learn parameters x and θ using the normalized ratings y_norm.
  4. Predict: For the final prediction, add the mean back: (θ^(j))^T x^(i) + μ_i.

Outcome

For a new user with parameter vector θ = 0, the prediction becomes 0 + μ_i = μ_i.
Therefore, a new user is recommended the items with the highest average ratings across the platform, which is a much more sensible starting point than predicting 0 for everything.
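The normalization steps can be sketched on a small matrix (NaN marks a missing rating; the values are illustrative):

```python
import numpy as np

# Tiny ratings matrix: rows = items, cols = users; NaN = unrated.
Y = np.array([[5.0, 4.0, np.nan],
              [1.0, np.nan, 2.0],
              [np.nan, 3.0, 3.0]])

mu = np.nanmean(Y, axis=1)        # per-item average, ignoring missing ratings
Y_norm = Y - mu[:, None]          # train CF on this zero-mean data

# For a brand-new user, regularization drives theta to zero, so adding
# the mean back makes the prediction fall back to the item average.
theta_new = np.zeros(2)
x_item = np.zeros(2)              # placeholder latent features for item 0
pred_new_user = theta_new @ x_item + mu[0]
print(pred_new_user)              # 4.5 (the average rating of item 0)
```

The new user is thus ranked the same as the platform-wide averages, a sensible default until they rate something.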


6. Collaborative Filtering vs. Content-Based Filtering

  • Data Source: CF uses user-item interactions (ratings, clicks); CB uses item features (attributes) and user profiles.
  • Feature Learning: CF learns latent features automatically; CB requires manual feature engineering.
  • Cold Start (Item): CF is problematic (cannot recommend a new item until someone rates it); CB handles it well (can recommend new items immediately based on attributes).
  • Cold Start (User): Both are problematic; CF needs user history to find similarities, and CB needs user history to build a preference profile.
  • Serendipity: CF is high (can recommend items outside the user's usual scope, e.g., "People who like Sci-Fi also liked this cooking show"); CB is low (tends to recommend "more of the same").
  • Complexity: CF is computationally expensive with large matrices (the n_u × n_m rating matrix); CB is generally lighter and parallelizable.

7. Hybrid Recommender Systems

Hybrid systems combine Collaborative and Content-based methods to exploit the advantages of each and mitigate their specific disadvantages (specifically the Cold Start problem and Sparsity).

Common Hybrid Techniques:

  1. Weighted: The scores of several recommendation components are combined numerically (e.g., final score = α · score_CF + (1 − α) · score_CB).
  2. Switching: The system switches between recommendation techniques depending on the situation (e.g., use Content-based for new users, switch to Collaborative once they have enough ratings).
  3. Cascade: One recommender refines the recommendations given by another (e.g., CF generates a top-100 list, and CB re-ranks them based on specific user interests).
  4. Feature Augmentation: The output of one technique is used as an input feature for another. For example, using learned latent factors from CF as features in a Content-based model.
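The weighted and switching techniques can be sketched together; the mixing weight, score values, and rating threshold below are all hypothetical:

```python
def hybrid_score(score_cf: float, score_cb: float, alpha: float = 0.7) -> float:
    """Weighted hybrid: alpha * CF score + (1 - alpha) * CB score."""
    return alpha * score_cf + (1 - alpha) * score_cb

def recommend_score(score_cf: float, score_cb: float,
                    n_user_ratings: int, min_ratings: int = 10) -> float:
    """Switching hybrid: fall back to content-based for cold-start users."""
    if n_user_ratings < min_ratings:
        return score_cb                    # too little history: trust CB alone
    return hybrid_score(score_cf, score_cb)

print(round(hybrid_score(4.0, 2.0), 1))    # 3.4
print(recommend_score(4.0, 2.0, 3))        # 2.0 (cold-start user, CB only)
```

Switching directly addresses the cold-start weakness of CF noted in the comparison above.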

8. Recommendation Strategies and Use Cases

Strategies

  • Top-N Recommendations: Presenting a ranked list of the items the user is most likely to enjoy.
  • "More Like This" (Item-to-Item): When a user is viewing an item, suggest similar items (often calculated via pre-computed item similarity matrices).
  • Personalized Ranking: Re-ordering a standard feed based on user probabilities (e.g., Social Media feeds).
  • Diversity/Novelty: Deliberately injecting diverse items to prevent boredom and explore user interests.
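The "More Like This" strategy can be sketched with a precomputed cosine-similarity matrix; the item feature vectors below are made up for illustration:

```python
import numpy as np

# Illustrative item feature vectors (rows = items).
items = np.array([[1.0, 0.0, 0.5],
                  [0.9, 0.1, 0.4],
                  [0.0, 1.0, 0.2]])

# Precompute the item-item cosine-similarity matrix once.
unit = items / np.linalg.norm(items, axis=1, keepdims=True)
sim = unit @ unit.T            # sim[i, j] = cosine similarity of items i and j

# At serving time, "More Like This" is just a lookup + sort.
viewed = 0
ranked = np.argsort(-sim[viewed])  # most similar first (index 0 is the item itself)
print(ranked.tolist())             # item 1 resembles item 0 more than item 2 does
```

Precomputing the similarity matrix offline is what makes item-to-item lookups cheap at request time.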

Real-World Use Cases

1. E-Commerce (Amazon)

  • Goal: Conversion (Purchase).
  • Method: "Item-to-Item Collaborative Filtering." Because user bases are massive and change frequently, Amazon calculates similarity between items (which are static).
  • Label: Purchases, items in cart, recently viewed.

2. Video Streaming (Netflix/YouTube)

  • Goal: Retention and Watch Time.
  • Method: Deep Learning Hybrids. Uses watch history (CF) combined with video metadata and thumbnails (CB).
  • Label: Percentage of video watched, "Add to List."

3. Social Media (TikTok/Instagram)

  • Goal: Engagement (Time on App).
  • Method: Highly aggressive ranking algorithms focusing on implicit feedback.
  • Label: Loop rate (rewatching), shares, completion rate.

4. News Aggregation

  • Goal: Click-through rate (CTR).
  • Method: Content-based heavy (using NLP to analyze article text) combined with trending topics.
  • Challenge: The lifespan of a news item is very short, making pure CF difficult (by the time enough people read it to create a CF pattern, the news is old).