1

Explain the fundamental shift introduced by the Transformer architecture compared to traditional Recurrent Neural Networks (RNNs) for Sequence-to-Sequence tasks.

2

Derive and explain the mathematical formulation of Scaled Dot-Product Self-Attention.

3

What is Multi-Head Attention, and why is it preferred over a single attention function?

4

Describe the mathematical formulation of Positional Encoding and explain why it is necessary in the Transformer architecture.

5

Explain the structure of a single Transformer Encoder block in detail.

6

How does the Transformer Decoder block differ from the Encoder block, and what is the role of Masked Self-Attention?

7

Compare and contrast Byte-Pair Encoding (BPE) and WordPiece tokenization methods.

8

Explain the concept of Transfer Learning in NLP and how Pretrained Language Models utilize this paradigm.

9

Describe the architecture of BERT and elaborate on its Masked Language Modeling (MLM) pretraining objective.

10

What is Next Sentence Prediction (NSP) in BERT, and why was it introduced?

11

Discuss the architecture of GPT and its pretraining objective, Causal Language Modeling.

12

Explain the text-to-text framework introduced by the T5 model.

13

Outline the process of fine-tuning a pretrained BERT model for a Text Classification task.

14

How is fine-tuning adapted for Named Entity Recognition (NER) using a Transformer model?

15

Detail the architecture modifications and loss formulation for fine-tuning BERT on Extractive Question Answering tasks (e.g., SQuAD).

16

What is the Hugging Face Transformers library, and what core abstractions does it provide for NLP practitioners?

17

Distinguish between Encoder-only, Decoder-only, and Encoder-Decoder architectures in the context of pretrained language models, providing an example for each.

18

Explain the computational bottleneck of the self-attention mechanism with respect to sequence length.

19

What role do Residual Connections and Layer Normalization play in the Transformer block?

20

Describe the position-wise Feed-Forward Network within a Transformer block and its significance.

Unit 5 - Subjective Questions