Unit 6 - Notes
INT344
Unit 6: Building Models / Case Studies
1. Question Answering: Transfer Learning with State-Of-The-Art Models
Question Answering (QA) is a sub-field of Information Retrieval and NLP concerned with building systems that automatically answer questions posed by humans in a natural language.
Transfer Learning in NLP
Before the transformer era, NLP models were trained from scratch for specific tasks. Transfer learning revolutionized this by allowing models to be pre-trained on massive datasets (like Wikipedia) to learn the structure of language, and then fine-tuned on smaller, task-specific datasets (like SQuAD - Stanford Question Answering Dataset).
Key Advantages:
- Reduced Training Time: Fine-tuning takes significantly less time than training from scratch.
- Performance on Low-Resource Data: Models can perform well even with limited labeled QA data because they already "understand" language.
- State-Of-The-Art (SOTA) Evolution:
- ELMo (2018): Contextualized word embeddings.
- BERT (2018): Bidirectional transformer; redefined SOTA for QA.
- RoBERTa/ALBERT: Optimized versions of BERT.
- T5/GPT-3: Generative models that formulate answers rather than just extracting them.
2. BERT and T5 for Question Answering
BERT (Bidirectional Encoder Representations from Transformers)
BERT is an Encoder-only transformer architecture. It is designed to pre-train deep bidirectional representations from unlabeled text.
- Mechanism for QA (Extractive QA):
- BERT treats QA as a span selection problem.
- Input: [CLS] Question [SEP] Passage [SEP]
- Output: Two vectors over the passage tokens: one giving each token's probability of being the Start Position of the answer, and one for the End Position (a minimal sketch follows this list).
- Pros: Highly accurate for factoid questions where the answer exists verbatim in the text.
- Cons: Cannot generate answers that are not explicitly present in the context.
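The span-selection mechanism can be made concrete with a short sketch. This is a minimal illustration, reusing the deepset/roberta-base-squad2 checkpoint (a BERT-style encoder fine-tuned on SQuAD 2.0) that also appears in the pipeline snippet of Section 3; the question and passage strings are invented for the example.

```python
# Minimal extractive-QA sketch: pick the most likely start/end token positions
# and decode that span from the passage.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What is the capital of France?"
passage = "France is a country in Europe. Its capital is Paris."

# The tokenizer joins question and passage with the model's special separator
# tokens (the [CLS] Question [SEP] Passage [SEP] pattern described above).
inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Most likely start and end positions define the answer span.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits)
answer_ids = inputs["input_ids"][0][start : end + 1]
print(tokenizer.decode(answer_ids, skip_special_tokens=True))
```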
T5 (Text-to-Text Transfer Transformer)
T5 allows the use of the same model, loss function, and hyperparameters across all NLP tasks by treating every problem as a text-to-text problem.
- Mechanism for QA (Generative QA):
- T5 uses an Encoder-Decoder architecture.
- Input: question: What is the capital of France? context: France is a country in Europe...
- Output: Paris (generated token by token); a minimal sketch follows this list.
- Pros: Can generate abstractive answers; handles boolean (True/False) questions easily; unified framework.
- Cons: Slower inference time due to auto-regressive decoding; risk of hallucination (generating plausible but incorrect facts).
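A minimal generative-QA sketch, assuming the public t5-small checkpoint; the question:/context: prompt format follows the text-to-text convention above, and the small model's answer quality is illustrative only.

```python
# Generative QA with T5: the encoder reads the prompt, the decoder generates
# the answer auto-regressively (token by token).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompt = ("question: What is the capital of France? "
          "context: France is a country in Europe. Its capital is Paris.")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```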
3. Model for Answering Questions (Architecture Design)
Building a complete QA system usually involves more than just a language model. The standard architecture for Open-Domain QA is the Retriever-Reader pipeline.
The Retriever-Reader Pipeline
- The Retriever (Document Selection):
- Purpose: Scans a massive knowledge base (e.g., all of Wikipedia) to find relevant documents.
- Traditional Method: TF-IDF / BM25 (Keyword matching).
- Modern Method: Dense Passage Retrieval (DPR). Uses a dual-encoder architecture to embed questions and documents into the same vector space. Relevance is scored with a dot product (equivalent to cosine similarity when the embeddings are normalized); a retriever sketch appears after the Reader code snippet below.
- The Reader (Answer Extraction/Generation):
- Purpose: Processes the documents found by the Retriever to find the specific answer.
- Model: BERT (for extractive) or T5/BART (for generative).
- Process: The Reader takes the top documents + the Question and outputs the final answer.
Conceptual Code Snippet (Hugging Face Transformers)
from transformers import pipeline

model_name = "deepset/roberta-base-squad2"

# Load an extractive QA pipeline (the Reader component)
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'Model conversion allows using models trained in PyTorch with TensorFlow.'
}
res = nlp(QA_input)
print(res)
# Example output: {'score': 0.98, 'start': 0, 'end': 15, 'answer': 'Model conversion'}
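The snippet above covers the Reader. For completeness, below is a minimal sketch of the Retriever side using DPR dual encoders, assuming the public facebook/dpr-* checkpoints; a production system would pre-compute the passage embeddings and store them in a vector index (e.g., FAISS) rather than encoding them on the fly.

```python
# Minimal Retriever sketch: DPR dual encoders embed the question and candidate
# passages into the same vector space; relevance = dot product.
import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question = "Why is model conversion important?"
passages = [
    "Model conversion allows using models trained in PyTorch with TensorFlow.",
    "The Eiffel Tower is located in Paris.",
]

with torch.no_grad():
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output                # (1, 768)
    p_emb = c_enc(**c_tok(passages, return_tensors="pt", padding=True)).pooler_output  # (2, 768)

# Score every passage against the question and hand the best one to the Reader.
scores = q_emb @ p_emb.T
print(passages[int(scores.argmax())])
```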
4. Chatbots: Unique Challenges
While QA systems answer single queries, chatbots must maintain a continuous dialogue. This introduces specific challenges that standard transformer models struggle with.
Key Challenges
- Long-Term Context Memory:
- Standard Transformers have a fixed context window (usually 512 or 1024 tokens).
- As the conversation grows, early parts of the chat are truncated, causing the bot to "forget" the user's name or initial intent.
- Consistency and Persona:
- Models often contradict themselves (e.g., saying "I live in New York" then "I live in Paris" later).
- Lack of a consistent personality profile.
- Generic Responses:
- Models tend to play it safe to minimize loss, resulting in dull responses like "I don't know" or "That's interesting."
- Evaluation Metrics:
- Standard metrics like BLEU (used for translation) correlate poorly with human judgment of conversation quality. A chatbot can answer correctly but rudely or vaguely.
5. Transformer Models: Challenges and Solutions
The standard Transformer architecture (like BERT/GPT-2) faces computational limits when applied to long sequences (like long chat logs or books).
The Quadratic Complexity Problem
The core issue is the Self-Attention Mechanism.
- For a sequence of length n, every token attends to every other token.
- This results in O(n²) complexity for both time and memory.
- Example: Doubling the context length quadruples the memory usage (see the quick check below). This makes processing long chat histories prohibitively expensive.
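A quick back-of-the-envelope check of this growth, assuming a single attention head storing float32 scores (real models multiply this by the number of heads and layers):

```python
# The attention score matrix has one entry per (query, key) pair, i.e. n * n
# entries. Doubling the sequence length n quadruples its size.
for n in [512, 1024, 2048, 4096]:
    entries = n * n
    megabytes = entries * 4 / 1e6   # float32 = 4 bytes per score
    print(f"seq_len={n:5d}  score entries={entries:>12,}  ~{megabytes:7.1f} MB")
```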
Solutions and Optimizations
To address the bottleneck, several variations have been proposed:
- Sparse Attention (e.g., Longformer, BigBird):
- Instead of attending to all tokens, tokens only attend to a local window of neighbors and a few global tokens.
- Reduces complexity to roughly O(n × w), where w is the fixed window size, i.e., linear in the sequence length (a toy attention-mask sketch follows this list).
- Recurrence (e.g., Transformer-XL):
- Caches hidden states from previous segments to preserve long-term dependencies without recomputing.
- Low-Rank Factorization (e.g., Linformer):
- Approximates the attention matrix using lower-rank matrices.
- Hashing (e.g., Reformer):
- Uses Locality Sensitive Hashing (LSH) to approximate attention.
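To make the sparse-attention idea concrete, here is a toy sketch of a local-window-plus-global-token mask; it is a simplified illustration, not the exact Longformer or BigBird pattern.

```python
# Toy sparse-attention mask: each token attends only to a local window of
# radius w plus one global token (position 0), instead of all n tokens.
import numpy as np

n, w = 12, 2
mask = np.zeros((n, n), dtype=bool)
for i in range(n):
    lo, hi = max(0, i - w), min(n, i + w + 1)
    mask[i, lo:hi] = True      # local window around token i
mask[:, 0] = True              # every token attends to the global token
mask[0, :] = True              # the global token attends to everything

print(mask.astype(int))
print("dense pairs:", n * n, "  sparse pairs:", int(mask.sum()))
```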
6. Chatbot using a Reformer Model
The Reformer (introduced by Google Research) is known as the "Efficient Transformer." It is specifically designed to handle very long context windows (up to 64,000 tokens) on a single GPU, making it ideal for maintaining long conversational contexts in chatbots.
Key Innovations of the Reformer
1. Locality Sensitive Hashing (LSH) Attention
- Problem: In standard attention, we compute a similarity score (the dot product q · k) for every query-key pair. Most pairs result in a low score (irrelevant).
- Solution: LSH groups vectors that are similar into "buckets" using hash functions.
- Mechanism: The model only computes attention between items falling in the same bucket (similar items).
- Impact: Changes complexity from O(n²) to O(n log n) (a toy bucketing sketch follows this list).
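A toy sketch of the bucketing idea using random-projection hashing; the real Reformer uses multiple hashing rounds and chunked attention, so this only illustrates the grouping step.

```python
# Toy LSH bucketing: random hyperplanes hash similar vectors into the same
# bucket, so attention only needs to be computed inside each bucket.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(16, 64))        # 16 query/key vectors of dimension 64
planes = rng.normal(size=(64, 4))          # 4 random hyperplanes -> up to 16 buckets

bits = (vectors @ planes) > 0                        # sign pattern of each projection
buckets = bits.astype(int) @ (2 ** np.arange(4))     # pack the bits into a bucket id

for b in np.unique(buckets):
    members = np.where(buckets == b)[0]
    print(f"bucket {b:2d}: tokens {members.tolist()}")  # attend only within a bucket
```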
2. Reversible Residual Layers (RevNets)
- Problem: To perform backpropagation, standard models must store the activations (values) of every layer in memory. For deep models, this consumes massive RAM.
- Solution: In a reversible network, the input to a layer can be calculated from its output.
- Standard residual layer: y = x + F(x), so the input x (the activation) must be kept in memory for the backward pass. Reversible architecture: the input is split into (x1, x2) and the layer computes y1 = x1 + F(x2), y2 = x2 + G(y1); the inputs can be recovered exactly as x2 = y2 − G(y1) and x1 = y1 − F(x2) (see the numeric sketch after this list).
- Impact: The model does not need to store activations for all layers. It recomputes them on the fly during the backward pass. This trades a small amount of compute time for massive memory savings.
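A small numeric sketch of the reversibility property, where F and G stand in for the attention and feed-forward sub-layers:

```python
# Reversible residual block: the inputs (x1, x2) can be reconstructed exactly
# from the outputs (y1, y2), so per-layer activations need not be stored.
import numpy as np

rng = np.random.default_rng(0)
W_f, W_g = rng.normal(size=(2, 8, 8))

def F(x):
    return np.tanh(x @ W_f)   # stand-in for the attention sub-layer

def G(x):
    return np.tanh(x @ W_g)   # stand-in for the feed-forward sub-layer

x1, x2 = rng.normal(size=(2, 8))

# Forward pass of one reversible block.
y1 = x1 + F(x2)
y2 = x2 + G(y1)

# Backward pass recomputes the inputs from the outputs alone.
x2_rec = y2 - G(y1)
x1_rec = y1 - F(x2_rec)
print(np.allclose(x1, x1_rec), np.allclose(x2, x2_rec))   # True True
```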
Implementation in Chatbots
Using a Reformer for a chatbot allows the system to feed the entire conversation history (thousands of turns) into the model without truncation.
Workflow:
- Input: Concatenate full User/Bot history.
- LSH Attention: Efficiently attends to relevant past parts of the conversation (e.g., recalling the user's name mentioned 500 turns ago).
- Generation: Produces a response that is contextually aware of the entire interaction, solving the "Consistency and Memory" challenge.
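A minimal sketch of this workflow, assuming the publicly released google/reformer-crime-and-punishment checkpoint (a generic Reformer language model, not a dialogue-tuned chatbot) and a recent version of Hugging Face transformers; it only illustrates that the full, untruncated history can be fed in and continued.

```python
# Sketch: feed the entire concatenated User/Bot history into a Reformer LM and
# generate a continuation. LSH attention keeps this tractable for long inputs.
from transformers import ReformerModelWithLMHead, ReformerTokenizer

model_name = "google/reformer-crime-and-punishment"   # generic LM, for illustration only
tokenizer = ReformerTokenizer.from_pretrained(model_name)
model = ReformerModelWithLMHead.from_pretrained(model_name)

history = [
    "User: My name is Anna.",
    "Bot: Nice to meet you, Anna.",
    # ... thousands of further turns could be appended here without truncation ...
    "User: Do you remember my name?",
    "Bot:",
]
input_ids = tokenizer("\n".join(history), return_tensors="pt").input_ids

output_ids = model.generate(input_ids, do_sample=True, max_new_tokens=20)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```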
Summary Table: Transformer vs. Reformer
| Feature | Standard Transformer | Reformer |
|---|---|---|
| Attention Complexity | O(n²) (Quadratic) | O(n log n) (Log-linear) |
| Memory Usage | High (stores all activations) | Low (Reversible layers) |
| Max Context Length | ~512 - 2,048 tokens | ~64,000+ tokens |
| Best Use Case | Short QA, Sentence classification | Long documents, Long chat history |