1. What is the primary concept behind Transfer Learning in the context of Question Answering (QA)?
A. Training a model from scratch on a small QA dataset
B. Using a model pre-trained on a large corpus and fine-tuning it on a QA dataset
C. Translating questions from one language to another before answering
D. Using hard-coded rules to extract answers from text
Correct Answer: Using a model pre-trained on a large corpus and fine-tuning it on a QA dataset
Explanation: Transfer learning involves taking knowledge gained from solving one problem (pre-training on a large corpus) and applying it to a different but related problem (fine-tuning on specific QA data).
2. Which of the following is considered a State-Of-The-Art (SOTA) approach for Natural Language Processing tasks like Question Answering?
A. Hidden Markov Models
B. Transformer-based models
C. Support Vector Machines
D. Naive Bayes Classifiers
Correct Answer: Transformer-based models
Explanation: Transformer-based models, such as BERT, T5, and GPT, currently achieve State-Of-The-Art results in most NLP tasks, including Question Answering.
3. In the context of BERT, what does the acronym stand for?
A. Binary Encoded Recurrent Transformers
B. Bidirectional Encoder Representations from Transformers
C. Basic Entity Recognition Technique
D. Bigram Encoding for Robust Text
Correct Answer: Bidirectional Encoder Representations from Transformers
Explanation: BERT stands for Bidirectional Encoder Representations from Transformers, highlighting its architecture and bidirectional nature.
4. How does BERT typically approach the Question Answering task (specifically Extractive QA)?
A. It generates new text to answer the question
B. It translates the question into a database query
C. It predicts the start and end token indices of the answer span within the context passage
D. It retrieves a whole document that might contain the answer
Correct Answer: It predicts the start and end token indices of the answer span within the context passage
Explanation: For extractive QA, BERT is trained to identify the specific span of text in the provided passage that answers the question by predicting the start and end positions.
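To make this concrete, here is a minimal sketch of extractive QA using the Hugging Face transformers pipeline; the DistilBERT checkpoint named below is just one example of a BERT-style model fine-tuned on SQuAD, not something prescribed by the question.

```python
# Minimal extractive QA sketch (Hugging Face transformers); the checkpoint is
# one example of a BERT-style model fine-tuned on SQuAD.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = "BERT was introduced by researchers at Google in 2018."
question = "Who introduced BERT?"

result = qa(question=question, context=context)
# The pipeline returns the extracted span and its character offsets in the context.
print(result["answer"], result["start"], result["end"])
```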
5. What pre-training objective allows BERT to understand the relationship between two sentences, which is useful for QA contexts?
A. Masked Language Modeling (MLM)
B. Next Sentence Prediction (NSP)
C. Sequence-to-Sequence generation
D. Locality Sensitive Hashing
Correct Answer: Next Sentence Prediction (NSP)
Explanation: Next Sentence Prediction (NSP) trains BERT to predict whether a second sentence follows the first, helping the model understand context and relationships between question and passage.
6T5 distinguishes itself from BERT by using which unifying framework for all NLP tasks?
A.Text-to-Text
B.Token-Classification
C.Regression-Analysis
D.Image-to-Text
Correct Answer: Text-to-Text
Explanation:T5 (Text-to-Text Transfer Transformer) converts every NLP problem, including translation, classification, and QA, into a text-to-text format where both input and output are text strings.
7In the T5 model, how is a specific task (like Question Answering) triggered?
A.By changing the model architecture
B.By using a specific task prefix in the input text
C.By using a separate encoder for each task
D.By manually adjusting the learning rate
Correct Answer: By using a specific task prefix in the input text
Explanation:T5 uses prefixes like 'question:', 'translate English to German:', or 'summarize:' added to the input to tell the model which task to perform.
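As a rough illustration of the prefix mechanism, the sketch below feeds T5 a 'translate English to German:' prefix through the Hugging Face transformers library; 't5-small' is simply one publicly available checkpoint used here as an assumption.

```python
# T5 task-prefix sketch: the task is selected by the text prefix alone,
# with no change to the model architecture.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "translate English to German: The house is wonderful."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```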
8. Which Transformer architecture does T5 use?
Correct Answer: The standard Encoder-Decoder architecture
Explanation: T5 uses the standard Encoder-Decoder Transformer architecture, unlike BERT (Encoder-only) or GPT (Decoder-only).
9. When building a Chatbot, what is the 'Context Window' challenge?
A. The model cannot process images
B. The model has a limit on the amount of previous conversation history it can remember
C. The model generates text too slowly
D. The model cannot understand multiple languages
Correct Answer: The model has a limit on the amount of previous conversation history it can remember
Explanation: Transformers have a fixed maximum sequence length (context window). If a conversation exceeds this, the model 'forgets' the earliest parts of the dialogue.
10. What is 'Hallucination' in the context of Chatbots and QA models?
A. The model failing to produce any output
B. The model generating factually incorrect information confidently
C. The model crashing due to memory overflow
D. The model copying the user's input exactly
Correct Answer: The model generating factually incorrect information confidently
Explanation: Hallucination refers to the phenomenon where a generative model produces text that is fluent and confident but factually wrong or nonsensical.
11. Which of the following is a major computational challenge faced by standard Transformer models like BERT?
A. Linear dependence on sequence length
B. Quadratic memory and time complexity relative to sequence length
C. Inability to handle numerical data
D. Requirement of labeled data only
Correct Answer: Quadratic memory and time complexity relative to sequence length
Explanation: The self-attention mechanism in standard Transformers requires O(N^2) time and memory, making it expensive to process very long sequences.
12. The Reformer model is designed to address which specific limitation of the Transformer?
A. Low accuracy on short sentences
B. Efficiency and memory usage on long sequences
C. The inability to do translation
D. The need for tokenization
Correct Answer: Efficiency and memory usage on long sequences
Explanation: The Reformer is specifically engineered to handle long context windows efficiently by reducing the memory footprint and computational complexity of the attention mechanism.
13. What technique does the Reformer use to approximate the attention mechanism efficiently?
A. Locality Sensitive Hashing (LSH)
B. Global Average Pooling
C. Recurrent connections
D. Convolutional layers
Correct Answer: Locality Sensitive Hashing (LSH)
Explanation: Reformer uses Locality Sensitive Hashing (LSH) attention to group similar vectors together, allowing it to compute attention only among relevant tokens, reducing complexity from O(N^2) to O(N log N).
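The toy NumPy sketch below illustrates the bucketing idea behind LSH attention: vectors hashed to the same bucket by random hyperplanes are the only candidates that attend to each other. It is an illustration of the principle, not the Reformer's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tokens, n_hyperplanes = 16, 8, 4

vectors = rng.normal(size=(n_tokens, d))        # token query/key vectors
planes = rng.normal(size=(n_hyperplanes, d))    # random hyperplanes for hashing

# Hash = the sign pattern of the projections, read as a bucket id.
signs = (vectors @ planes.T) > 0
bucket_ids = signs.dot(1 << np.arange(n_hyperplanes))

for bucket in np.unique(bucket_ids):
    members = np.where(bucket_ids == bucket)[0]
    print(f"bucket {bucket}: tokens {members.tolist()} attend among themselves")
```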
14. In a Reformer model, what is the purpose of Reversible Residual Layers?
A. To reverse the text direction for bidirectional context
B. To allow the model to run on CPUs only
C. To store activations only once, allowing recalculation during backpropagation to save memory
D. To increase the number of parameters without increasing size
Correct Answer: To store activations only once, allowing recalculation during backpropagation to save memory
Explanation: Reversible layers allow the activations of the previous layer to be recomputed from the outputs of the next layer, meaning the model doesn't need to store all intermediate activations in memory for backpropagation.
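The toy sketch below shows the arithmetic that makes reversible residual layers possible: the inputs can be reconstructed exactly from the outputs, so activations need not be stored for backpropagation. F and G are placeholders standing in for the attention and feed-forward blocks.

```python
import numpy as np

def F(x): return np.tanh(x)      # placeholder for the attention block
def G(x): return 0.5 * x         # placeholder for the feed-forward block

x1, x2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])

# Forward pass of a reversible block (as in RevNets / the Reformer):
y1 = x1 + F(x2)
y2 = x2 + G(y1)

# Inverse pass: recover the inputs from the outputs alone.
x2_rec = y2 - G(y1)
x1_rec = y1 - F(x2_rec)

assert np.allclose(x1, x1_rec) and np.allclose(x2, x2_rec)
```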
15. Which special token is used in BERT to separate the question from the context passage?
A. [CLS]
B. [MASK]
C. [SEP]
D. [PAD]
Correct Answer: [SEP]
Explanation: The [SEP] token is used in BERT to distinguish between sentence A (e.g., Question) and sentence B (e.g., Context/Passage).
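The short sketch below (Hugging Face transformers, with 'bert-base-uncased' assumed as an example checkpoint) shows where [CLS] and [SEP] land when a question/context pair is encoded.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encoding = tokenizer("Who wrote Hamlet?",
                     "Hamlet is a tragedy written by William Shakespeare.")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# -> ['[CLS]', 'who', 'wrote', 'hamlet', '?', '[SEP]', 'hamlet', 'is', ..., '[SEP]']
```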
16. In T5's pre-training strategy, what is 'span corruption'?
A. Deleting random words and asking the model to predict the sentiment
B. Replacing spans of text with a unique sentinel token and training the model to reconstruct the missing span
C. Corrupting the embeddings with noise
D. Shuffling the order of sentences in a paragraph
Correct Answer: Replacing spans of text with a unique sentinel token and training the model to reconstruct the missing span
Explanation: T5 is pre-trained by masking contiguous spans of text with sentinel tokens (e.g., <extra_id_0>) and asking the decoder to generate the text meant to fill those slots.
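A hand-written example of what span corruption looks like, mirroring the example from the T5 paper; this is an illustration only, not the actual pre-processing code.

```python
original = "Thank you for inviting me to your party last week."

# Encoder input: contiguous spans replaced by sentinel tokens.
corrupted_input = "Thank you <extra_id_0> me to your party <extra_id_1> week."
# Decoder target: the dropped spans, each introduced by its sentinel token.
target_output = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"

print(corrupted_input)
print(target_output)
```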
17. A Chatbot built using a Retrieval-Based model differs from a Generative model because:
A. It creates new sentences word-by-word
B. It selects the best response from a predefined database of responses
C. It uses voice recognition
D. It requires no training data
Correct Answer: It selects the best response from a predefined database of responses
Explanation: Retrieval-based chatbots pick an existing response from a database based on similarity, whereas generative models (like T5 or GPT) construct new responses token by token.
18. What is the 'Consistency' challenge in Chatbots?
A. The bot replying with the same answer to every question
B. The bot contradicting its own previous statements or persona
C. The bot using consistent grammar
D. The bot consistently answering correctly
Correct Answer: The bot contradicting its own previous statements or persona
Explanation: Consistency refers to the bot maintaining a stable persona (e.g., age, name) and not contradicting facts it stated earlier in the conversation.
19. Why is the SQuAD (Stanford Question Answering Dataset) important for QA models?
A. It provides the code for the models
B. It is a standardized benchmark dataset for evaluating reading comprehension and QA systems
C. It is a database of all possible questions in English
D. It is used to translate questions
Correct Answer: It is a standardized benchmark dataset for evaluating reading comprehension and QA systems
Explanation: SQuAD is a widely used benchmark dataset consisting of questions posed on Wikipedia articles, used to train and evaluate Extractive QA models.
20. In BERT, what represents the aggregate representation of the entire sequence, often used for classification?
A. The output embedding of the [SEP] token
B. The output embedding of the [CLS] token
C. The average of all token embeddings
D. The embedding of the last token
Correct Answer: The output embedding of the [CLS] token
Explanation: The [CLS] token is added to the start of every sequence in BERT, and its final hidden state is used as the aggregate representation for classification tasks.
21. Which of the following is a solution to the 'Blandness' problem (generic responses like 'I don't know') in generative chatbots?
A. Decreasing the model size
B. Adjusting the temperature or decoding strategy (e.g., Nucleus Sampling)
C. Removing the attention mechanism
D. Training on less data
Correct Answer: Adjusting the temperature or decoding strategy (e.g., Nucleus Sampling)
Explanation: Beam search often leads to repetitive or bland text. Techniques like Top-K or Nucleus (Top-P) sampling, and adjusting the temperature, encourage more diverse and interesting responses.
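The NumPy sketch below shows the core of nucleus (top-p) sampling on a tiny next-token distribution: keep the smallest set of tokens whose cumulative probability reaches p, renormalise, and sample from that set. Real decoders apply the same idea over the full vocabulary at every step; the token list and probabilities here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array(["i", "don't", "know", "pizza", "robots"])
probs = np.array([0.40, 0.25, 0.15, 0.12, 0.08])

def nucleus_sample(probs, p=0.9):
    order = np.argsort(probs)[::-1]                        # most probable first
    cumulative = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cumulative, p) + 1]  # smallest set with mass >= p
    renorm = probs[nucleus] / probs[nucleus].sum()         # renormalise inside the nucleus
    return rng.choice(nucleus, p=renorm)

print(tokens[nucleus_sample(probs, p=0.9)])
```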
22. How does the Reformer model handle the issue of large embedding tables for vocabulary?
A. It removes the vocabulary
B. It uses character-level embeddings only
C. It uses very small vocabulary sizes
D. This is not a specific focus of the Reformer (it focuses on attention and depth memory)
Correct Answer: This is not a specific focus of the Reformer (it focuses on attention and depth memory)
Explanation: While some models use specific embedding tricks, the Reformer's primary innovations are LSH Attention and Reversible Layers to solve the sequence length and depth memory bottlenecks.
23. What is 'Fine-Tuning' in the context of building a QA model with BERT?
A. Adjusting the hyperparameters of the model manually
B. Updating the weights of a pre-trained BERT model using a specific QA dataset
C. Cleaning the text data before input
D. Designing the neural network architecture from scratch
Correct Answer: Updating the weights of a pre-trained BERT model using a specific QA dataset
Explanation: Fine-tuning involves taking the pre-trained weights (which know language structure) and continuing the training process on the specific QA data to specialize the model.
24. In T5, what dataset was primarily used for pre-training?
Correct Answer: The C4 dataset (Colossal Clean Crawled Corpus)
Explanation: T5 was pre-trained on the C4 dataset, a massive, cleaned version of the Common Crawl web scrape.
25. What is a major advantage of the T5 'Text-to-Text' framework over BERT's approach?
A. It is faster to train
B. It allows the same model and loss function to be used for generation, translation, and classification
C. It requires no GPU
D. It uses fewer parameters
Correct Answer: It allows the same model and loss function to be used for generation, translation, and classification
Explanation: By treating everything as text generation, T5 simplifies the modeling pipeline, allowing a single model architecture and objective to handle diverse tasks without task-specific heads.
26. When training a chatbot, what is 'Teacher Forcing'?
A. A human teacher corrects the bot's answers
B. Using the ground-truth previous token as input during training instead of the model's own generated output
C. Forcing the model to stop training early
D. Using a larger model to teach a smaller model
Correct Answer: Using the ground-truth previous token as input during training instead of the model's own generated output
Explanation: Teacher forcing speeds up training by feeding the correct previous token to the decoder at each step, rather than the token the model actually predicted.
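A framework-agnostic toy sketch of the idea: at each decoder step the ground-truth prefix is fed in, regardless of what the model predicted at the previous step. The target sequence below is invented purely for illustration.

```python
target = ["<bos>", "the", "answer", "is", "paris", "<eos>"]

for step in range(1, len(target)):
    decoder_input = target[:step]     # ground-truth prefix, not the model's own output
    expected_next = target[step]      # token the model is trained to predict
    print(f"input={decoder_input} -> predict '{expected_next}'")
```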
27. Which challenge for Transformer models relates to the maximum number of tokens they can process at once?
A. Vanishing Gradient
B. Sequence Length Limit
C. Overfitting
D. Bias-variance tradeoff
Correct Answer: Sequence Length Limit
Explanation: Standard Transformers (like BERT) usually have a hard limit (e.g., 512 tokens) due to the positional encoding design and the quadratic cost of attention.
28. In the Reformer, what happens if two vectors fall into the same hash bucket during LSH?
A. They are deleted
B. They are considered for attention calculation with each other
C. They are merged into one vector
D. They are moved to a different layer
Correct Answer: They are considered for attention calculation with each other
Explanation: LSH sorts vectors into buckets based on similarity. Only vectors within the same bucket (or adjacent chunks) attend to each other, approximating the full attention mechanism.
29. What is 'Abstractive' Question Answering?
A. Highlighting the answer in the text
B. Generating an answer that may contain words not present in the context passage
C. Ignoring the context passage
D. Answering yes/no questions only
Correct Answer: Generating an answer that may contain words not present in the context passage
Explanation: Abstractive QA involves understanding the context and generating a summary or answer in natural language, rephrasing content rather than just extracting a direct span.
30. Which evaluation metric is commonly used for measuring the overlap between a chatbot's generated response and a reference response?
A. Accuracy
B. BLEU or ROUGE
C. F1-Score (for classification)
D. Mean Squared Error
Correct Answer: BLEU or ROUGE
Explanation: BLEU and ROUGE are metrics based on n-gram overlap between the generated text and reference text, commonly used in translation, summarization, and chatbot evaluation.
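A minimal sketch of n-gram overlap scoring with NLTK's sentence-level BLEU; real chatbot evaluation would typically average over a whole test set and report corpus-level BLEU or ROUGE. The sentences below are made up for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing avoids zero scores when some higher-order n-grams have no overlap.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```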
31. Does BERT process text sequentially from left-to-right?
A. Yes, strictly left-to-right
B. No, it processes the entire sequence simultaneously (bidirectionally)
C. Yes, but also right-to-left in separate layers
D. No, it processes random words first
Correct Answer: No, it processes the entire sequence simultaneously (bidirectionally)
Explanation: Unlike RNNs or GPT (left-to-right), BERT's Transformer encoder reads the entire sequence at once, allowing it to learn context from both sides of a token.
32. What is the 'Safety' challenge in open-domain chatbots?
A. Preventing the model from being deleted
B. Preventing the generation of toxic, biased, or offensive content
C. Ensuring the model saves data securely
D. Preventing users from hacking the server
Correct Answer: Preventing the generation of toxic, biased, or offensive content
Explanation: Since chatbots are trained on internet data, a major challenge is ensuring they do not output hate speech, bias, or harmful instructions.
33. How does the Reformer save memory regarding the 'Q, K, V' matrices in Attention?
A. It eliminates the Value matrix
B. It uses shared Query (Q) and Key (K) spaces
C. It compresses them using JPEG
D. It stores them on the hard drive
Correct Answer: It uses shared Query (Q) and Key (K) spaces
Explanation: The Reformer simplifies attention by assuming Q = K (shared keys and queries), reducing the number of matrices needed and facilitating the LSH scheme.
34. When using BERT for QA, what does the model output for every token in the passage?
A. A probability of being the next word
B. A probability score for being the 'Start' and 'End' of the answer
C. A sentiment score
D. A translation of the token
Correct Answer: A probability score for being the 'Start' and 'End' of the answer
Explanation: BERT outputs two logits for each token: one representing the likelihood of it being the start of the answer span, and one for being the end.
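The sketch below shows the lower-level view of the same mechanism used by the pipeline shown earlier: two logits per token, argmax-ed into a start/end span (Hugging Face transformers; the checkpoint is again just one SQuAD-fine-tuned example).

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

inputs = tokenizer("Who wrote Hamlet?",
                   "Hamlet is a tragedy written by William Shakespeare.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = int(outputs.start_logits.argmax())   # most likely start position
end = int(outputs.end_logits.argmax())       # most likely end position
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```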
35. Which model would be most appropriate for summarising a very long book into a short paragraph?
A. Standard BERT (limit 512 tokens)
B. Reformer (or Longformer)
C. A simple RNN
D. Naive Bayes
Correct Answer: Reformer (or Longformer)
Explanation: Because a book far exceeds the token limit of standard transformers, a Reformer (which handles long sequences efficiently) is the most appropriate choice.
36. In the context of T5, what does 'Transfer Learning' specifically refer to?
A. Transferring files between computers
B. Transferring the weights from a pre-trained general model to a specific downstream task
C. Transferring data from training set to test set
D. Transferring the style of one author to another
Correct Answer: Transferring the weights from a pre-trained general model to a specific downstream task
Explanation: It refers to the methodology of pre-training on a generic task (unsupervised) and then fine-tuning on a specific task (supervised) like QA.
37. What is a 'Persona-based' Chatbot?
A. A bot that asks for personal information
B. A bot conditioned on a specific profile (e.g., 'I am a doctor') to improve consistency
C. A bot that changes personality every turn
D. A bot used only for personnel management
Correct Answer: A bot conditioned on a specific profile (e.g., 'I am a doctor') to improve consistency
Explanation: To solve consistency issues, bots are often conditioned on a 'persona' profile provided in the input context.
38. Why is 'Masked Language Modeling' (MLM) harder than standard left-to-right language modeling?
A. It isn't; it is easier
B. The model must deduce the missing word based on bidirectional context rather than just previous words
C. It requires more labeled data
D. It uses images
Correct Answer: The model must deduce the missing word based on bidirectional context rather than just previous words
Explanation: MLM requires the model to understand the full context (words before and after) to fill in the blank, creating deeper contextual representations.
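A minimal fill-mask sketch (Hugging Face transformers, with 'bert-base-uncased' assumed as the example checkpoint): the model must use context on both sides of [MASK] to rank candidate fillers.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# Each prediction is a candidate token for the [MASK] slot with its probability.
for prediction in fill("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```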
39. What is the typical size of the vocabulary in models like BERT or T5?
A. 26 letters
B. Around 30,000 to 50,000 tokens (WordPieces/SentencePieces)
C. 1 million words
D. 100 words
Correct Answer: Around 30,000 to 50,000 tokens (WordPieces/SentencePieces)
Explanation: These models use sub-word tokenization algorithms (like WordPiece for BERT) resulting in vocabulary sizes usually between 30k and 50k.
40. In a Chatbot, 'Multi-turn' capability means:
A. The bot spins around
B. The bot can handle a conversation with back-and-forth exchanges while maintaining context
C. The bot can answer multiple questions at once
D. The bot can speak multiple languages
Correct Answer: The bot can handle a conversation with back-and-forth exchanges while maintaining context
Explanation: Multi-turn dialogue systems maintain state across several user-system exchanges, unlike single-turn systems (like simple FAQs).
41. What is the primary trade-off when using a Reformer model with Reversible Layers?
A. Higher memory usage for faster speed
B. Slightly higher computational cost (re-computing) for significantly lower memory usage
Correct Answer: Slightly higher computational cost (re-computing) for significantly lower memory usage
Explanation: Reversible layers require re-computing the forward pass during the backward pass. This trades a small increase in compute time for a massive reduction in memory.
42. T5 typically uses which type of positional encoding?
A. Absolute sinusoidal embeddings
B. Relative position embeddings
C. No positional encoding
D. GPS coordinates
Correct Answer: Relative position embeddings
Explanation: T5 uses relative position embeddings, where the encoding depends on the offset between the key and query, rather than their absolute position in the sequence.
43. Which of these is NOT a standard challenge for Transformers?
A. Data hunger (need large datasets)
B. Computational cost (GPU requirements)
C. Inability to handle sequential data
D. Interpretability (black-box nature)
Correct Answer: Inability to handle sequential data
Explanation: Transformers are explicitly designed to handle sequential data (text). The other options are valid challenges.
44. For a Chatbot, 'Slot Filling' refers to:
A. Filling the memory with data
B. Extracting specific parameters (e.g., date, time, location) from user input
C. The time slot when the bot is active
D. Putting coins in a machine
Correct Answer: Extracting specific parameters (e.g., date, time, location) from user input
Explanation: In task-oriented chatbots, slot filling is the process of identifying specific details required to fulfill a user's intent (e.g., extracting 'tomorrow' for a 'Date' slot).
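A toy rule-based sketch of slot filling for a booking intent; production systems usually use a sequence-labelling model rather than hand-written regexes, so the patterns and utterance below are purely illustrative assumptions.

```python
import re

utterance = "Book a table for 4 people tomorrow at 7pm in Berlin"

matches = {
    "party_size": re.search(r"for (\d+) people", utterance),
    "time": re.search(r"at (\d{1,2}(?::\d{2})?\s?(?:am|pm))", utterance, re.I),
    "city": re.search(r"in ([A-Z][a-z]+)$", utterance),
}
slots = {name: m.group(1) for name, m in matches.items() if m}
print(slots)   # e.g. {'party_size': '4', 'time': '7pm', 'city': 'Berlin'}
```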
45. The 'Chunking' strategy in Reformer helps in:
A. Breaking words into letters
B. Processing long sequences by breaking them into fixed-size chunks to apply LSH attention
C. Deleting parts of the sentence
D. Grouping users into chunks
Correct Answer: Processing long sequences by breaking them into fixed-size chunks to apply LSH attention
Explanation: To make LSH efficient, the Reformer chunks the sequence and sorts vectors within/across chunks to compute attention locally.
46. Which model introduced the 'Masked Language Model' concept?
A. T5
B. BERT
C. LSTM
D. Reformer
Correct Answer: BERT
Explanation: While the concept existed in literature (Cloze task), BERT popularized Masked Language Modeling (MLM) as the core pre-training objective for Transformers.
47. When fine-tuning T5 for QA, the target output is:
A. The index of the start and end words
B. A '1' or '0' (binary classification)
C. The raw text string of the answer
D. A vector representation
Correct Answer: The raw text string of the answer
Explanation: Since T5 is text-to-text, the target label for QA is simply the text of the correct answer.
48. Why might a Reformer be less suitable than BERT for short-sequence tasks?
A. Reformer cannot handle text
B. The overhead of LSH and reversible layers adds complexity not needed for short sequences
C. Reformer is only for images
D. Reformer has low accuracy
Correct Answer: The overhead of LSH and reversible layers adds complexity not needed for short sequences
Explanation: Reformer's optimizations (LSH, Reversibility) are designed for long sequences. For short sequences, standard Attention is efficient enough, and Reformer might introduce unnecessary overhead.
49. What is 'Zero-Shot' learning in the context of QA models?
A. The model answering questions without any specific fine-tuning on that QA dataset
B. The model failing 0 times
C. The model training with zero data
D. The model taking zero seconds to reply
Correct Answer: The model answering questions without any specific fine-tuning on that QA dataset
Explanation: Zero-shot learning refers to a pre-trained model performing a task (like QA) without having seen any explicit examples of that task during training/fine-tuning.
50. In Chatbot development, what is the role of the 'Temperature' parameter during generation?
A. It controls the heat of the GPU
B. It controls the randomness of predictions (Low = deterministic, High = creative/random)
C. It sets the mood of the bot
D. It determines the length of the sentence
Correct Answer: It controls the randomness of predictions (Low = deterministic, High = creative/random)
Explanation: Temperature scales the logits before the softmax. Lower temperature makes the distribution sharper (more confident/repetitive), while higher temperature flattens it (more diverse/random).
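A small NumPy demonstration of temperature scaling: the same logits become near-deterministic at low temperature and flatter (more random) at high temperature. The logit values are made up for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())    # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([4.0, 2.0, 1.0])
for t in (0.5, 1.0, 2.0):
    print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
```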