1. What is the primary concept behind Transfer Learning in the context of Question Answering (QA)?
A. Training a model from scratch on a small QA dataset
B. Using a model pre-trained on a large corpus and fine-tuning it on a QA dataset
C. Translating questions from one language to another before answering
D. Using hard-coded rules to extract answers from text
Correct Answer: Using a model pre-trained on a large corpus and fine-tuning it on a QA dataset
Explanation: Transfer learning involves taking knowledge gained from solving one problem (pre-training on a large corpus) and applying it to a different but related problem (fine-tuning on specific QA data).
2. Which of the following is considered a State-Of-The-Art (SOTA) approach for Natural Language Processing tasks like Question Answering?
A. Hidden Markov Models
B. Transformer-based models
C. Support Vector Machines
D. Naive Bayes Classifiers
Correct Answer: Transformer-based models
Explanation: Transformer-based models, such as BERT, T5, and GPT, currently achieve State-Of-The-Art results in most NLP tasks, including Question Answering.
3. In the context of BERT, what does the acronym stand for?
A. Binary Encoded Recurrent Transformers
B. Bidirectional Encoder Representations from Transformers
C. Basic Entity Recognition Technique
D. Bigram Encoding for Robust Text
Correct Answer: Bidirectional Encoder Representations from Transformers
Explanation: BERT stands for Bidirectional Encoder Representations from Transformers, highlighting its architecture and bidirectional nature.
4. How does BERT typically approach the Question Answering task (specifically Extractive QA)?
A. It generates new text to answer the question
B. It translates the question into a database query
C. It predicts the start and end token indices of the answer span within the context passage
D. It retrieves a whole document that might contain the answer
Correct Answer: It predicts the start and end token indices of the answer span within the context passage
Explanation: For extractive QA, BERT is trained to identify the specific span of text in the provided passage that answers the question by predicting the start and end positions.
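To make this concrete, here is a minimal sketch of extractive QA using the Hugging Face transformers pipeline; the DistilBERT checkpoint named below is just one example of a BERT-style model fine-tuned on SQuAD, not something prescribed by the question.

```python
# Minimal extractive QA sketch (Hugging Face transformers); the checkpoint is
# one example of a BERT-style model fine-tuned on SQuAD.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = "BERT was introduced by researchers at Google in 2018."
question = "Who introduced BERT?"

result = qa(question=question, context=context)
# The pipeline returns the extracted span and its character offsets in the context.
print(result["answer"], result["start"], result["end"])
```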
5. What pre-training objective allows BERT to understand the relationship between two sentences, which is useful for QA contexts?
A. Masked Language Modeling (MLM)
B. Next Sentence Prediction (NSP)
C. Sequence-to-Sequence generation
D. Locality Sensitive Hashing
Correct Answer: Next Sentence Prediction (NSP)
Explanation: Next Sentence Prediction (NSP) trains BERT to predict whether a second sentence follows the first, helping the model understand context and relationships between question and passage.
6T5 distinguishes itself from BERT by using which unifying framework for all NLP tasks?
A.Text-to-Text
B.Token-Classification
C.Regression-Analysis
D.Image-to-Text
Correct Answer: Text-to-Text
Explanation:T5 (Text-to-Text Transfer Transformer) converts every NLP problem, including translation, classification, and QA, into a text-to-text format where both input and output are text strings.
7In the T5 model, how is a specific task (like Question Answering) triggered?
A.By changing the model architecture
B.By using a specific task prefix in the input text
C.By using a separate encoder for each task
D.By manually adjusting the learning rate
Correct Answer: By using a specific task prefix in the input text
Explanation:T5 uses prefixes like 'question:', 'translate English to German:', or 'summarize:' added to the input to tell the model which task to perform.
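As a rough illustration of the prefix mechanism, the sketch below feeds T5 a 'translate English to German:' prefix through the Hugging Face transformers library; 't5-small' is simply one publicly available checkpoint used here as an assumption.

```python
# T5 task-prefix sketch: the task is selected by the text prefix alone,
# with no change to the model architecture.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "translate English to German: The house is wonderful."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```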
8. Which Transformer architecture does T5 use?
Correct Answer: The standard Encoder-Decoder architecture
Explanation: T5 uses the standard Encoder-Decoder Transformer architecture, unlike BERT (Encoder-only) or GPT (Decoder-only).
9. When building a Chatbot, what is the 'Context Window' challenge?
A. The model cannot process images
B. The model has a limit on the amount of previous conversation history it can remember
C. The model generates text too slowly
D. The model cannot understand multiple languages
Correct Answer: The model has a limit on the amount of previous conversation history it can remember
Explanation: Transformers have a fixed maximum sequence length (context window). If a conversation exceeds this, the model 'forgets' the earliest parts of the dialogue.
10. What is 'Hallucination' in the context of Chatbots and QA models?
A. The model failing to produce any output
B. The model generating factually incorrect information confidently
C. The model crashing due to memory overflow
D. The model copying the user's input exactly
Correct Answer: The model generating factually incorrect information confidently
Explanation: Hallucination refers to the phenomenon where a generative model produces text that is fluent and confident but factually wrong or nonsensical.
11. Which of the following is a major computational challenge faced by standard Transformer models like BERT?
A. Linear dependence on sequence length
B. Quadratic memory and time complexity relative to sequence length
C. Inability to handle numerical data
D. Requirement of labeled data only
Correct Answer: Quadratic memory and time complexity relative to sequence length
Explanation: The self-attention mechanism in standard Transformers requires O(N^2) time and memory, making it expensive to process very long sequences.
12. The Reformer model is designed to address which specific limitation of the Transformer?
A. Low accuracy on short sentences
B. Efficiency and memory usage on long sequences
C. The inability to do translation
D. The need for tokenization
Correct Answer: Efficiency and memory usage on long sequences
Explanation: The Reformer is specifically engineered to handle long context windows efficiently by reducing the memory footprint and computational complexity of the attention mechanism.
13. What technique does the Reformer use to approximate the attention mechanism efficiently?
A. Locality Sensitive Hashing (LSH)
B. Global Average Pooling
C. Recurrent connections
D. Convolutional layers
Correct Answer: Locality Sensitive Hashing (LSH)
Explanation: Reformer uses Locality Sensitive Hashing (LSH) attention to group similar vectors together, allowing it to compute attention only among relevant tokens, reducing complexity from O(N^2) to O(N log N).
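The toy NumPy sketch below illustrates the bucketing idea behind LSH attention: vectors hashed to the same bucket by random hyperplanes are the only candidates that attend to each other. It is an illustration of the principle, not the Reformer's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tokens, n_hyperplanes = 16, 8, 4

vectors = rng.normal(size=(n_tokens, d))        # token query/key vectors
planes = rng.normal(size=(n_hyperplanes, d))    # random hyperplanes for hashing

# Hash = the sign pattern of the projections, read as a bucket id.
signs = (vectors @ planes.T) > 0
bucket_ids = signs.dot(1 << np.arange(n_hyperplanes))

for bucket in np.unique(bucket_ids):
    members = np.where(bucket_ids == bucket)[0]
    print(f"bucket {bucket}: tokens {members.tolist()} attend among themselves")
```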
14. In a Reformer model, what is the purpose of Reversible Residual Layers?
A. To reverse the text direction for bidirectional context
B. To allow the model to run on CPUs only
C. To store activations only once, allowing recalculation during backpropagation to save memory
D. To increase the number of parameters without increasing size
Correct Answer: To store activations only once, allowing recalculation during backpropagation to save memory
Explanation: Reversible layers allow the activations of the previous layer to be recomputed from the outputs of the next layer, meaning the model doesn't need to store all intermediate activations in memory for backpropagation.
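The toy sketch below shows the arithmetic that makes reversible residual layers possible: the inputs can be reconstructed exactly from the outputs, so activations need not be stored for backpropagation. F and G are placeholders standing in for the attention and feed-forward blocks.

```python
import numpy as np

def F(x): return np.tanh(x)      # placeholder for the attention block
def G(x): return 0.5 * x         # placeholder for the feed-forward block

x1, x2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])

# Forward pass of a reversible block (as in RevNets / the Reformer):
y1 = x1 + F(x2)
y2 = x2 + G(y1)

# Inverse pass: recover the inputs from the outputs alone.
x2_rec = y2 - G(y1)
x1_rec = y1 - F(x2_rec)

assert np.allclose(x1, x1_rec) and np.allclose(x2, x2_rec)
```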
15. Which special token is used in BERT to separate the question from the context passage?
A. [CLS]
B. [MASK]
C. [SEP]
D. [PAD]
Correct Answer: [SEP]
Explanation: The [SEP] token is used in BERT to distinguish between sentence A (e.g., Question) and sentence B (e.g., Context/Passage).
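The short sketch below (Hugging Face transformers, with 'bert-base-uncased' assumed as an example checkpoint) shows where [CLS] and [SEP] land when a question/context pair is encoded.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encoding = tokenizer("Who wrote Hamlet?",
                     "Hamlet is a tragedy written by William Shakespeare.")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# -> ['[CLS]', 'who', 'wrote', 'hamlet', '?', '[SEP]', 'hamlet', 'is', ..., '[SEP]']
```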
16. In T5's pre-training strategy, what is 'span corruption'?
A. Deleting random words and asking the model to predict the sentiment
B. Replacing spans of text with a unique sentinel token and training the model to reconstruct the missing span
C. Corrupting the embeddings with noise
D. Shuffling the order of sentences in a paragraph
Correct Answer: Replacing spans of text with a unique sentinel token and training the model to reconstruct the missing span
Explanation: T5 is pre-trained by masking contiguous spans of text with sentinel tokens (e.g., <extra_id_0>) and asking the decoder to generate the text meant to fill those slots.
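A hand-written example of what span corruption looks like, mirroring the example from the T5 paper; this is an illustration only, not the actual pre-processing code.

```python
original = "Thank you for inviting me to your party last week."

# Encoder input: contiguous spans replaced by sentinel tokens.
corrupted_input = "Thank you <extra_id_0> me to your party <extra_id_1> week."
# Decoder target: the dropped spans, each introduced by its sentinel token.
target_output = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"

print(corrupted_input)
print(target_output)
```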
17. A Chatbot built using a Retrieval-Based model differs from a Generative model because:
A. It creates new sentences word-by-word
B. It selects the best response from a predefined database of responses
C. It uses voice recognition
D. It requires no training data
Correct Answer: It selects the best response from a predefined database of responses
Explanation: Retrieval-based chatbots pick an existing response from a database based on similarity, whereas generative models (like T5 or GPT) construct new responses token by token.
18. What is the 'Consistency' challenge in Chatbots?
A. The bot replying with the same answer to every question
B. The bot contradicting its own previous statements or persona
C. The bot using consistent grammar
D. The bot consistently answering correctly
Correct Answer: The bot contradicting its own previous statements or persona
Explanation: Consistency refers to the bot maintaining a stable persona (e.g., age, name) and not contradicting facts it stated earlier in the conversation.
19. Why is the SQuAD (Stanford Question Answering Dataset) important for QA models?
A. It provides the code for the models
B. It is a standardized benchmark dataset for evaluating reading comprehension and QA systems
C. It is a database of all possible questions in English
D. It is used to translate questions
Correct Answer: It is a standardized benchmark dataset for evaluating reading comprehension and QA systems
Explanation: SQuAD is a widely used benchmark dataset consisting of questions posed on Wikipedia articles, used to train and evaluate Extractive QA models.
20. In BERT, what represents the aggregate representation of the entire sequence, often used for classification?
A. The output embedding of the [SEP] token
B. The output embedding of the [CLS] token
C. The average of all token embeddings
D. The embedding of the last token
Correct Answer: The output embedding of the [CLS] token
Explanation: The [CLS] token is added to the start of every sequence in BERT, and its final hidden state is used as the aggregate representation for classification tasks.
21. Which of the following is a solution to the 'Blandness' problem (generic responses like 'I don't know') in generative chatbots?
A. Decreasing the model size
B. Adjusting the temperature or decoding strategy (e.g., Nucleus Sampling)
C. Removing the attention mechanism
D. Training on less data
Correct Answer: Adjusting the temperature or decoding strategy (e.g., Nucleus Sampling)
Explanation: Beam search often leads to repetitive or bland text. Techniques like Top-K or Nucleus (Top-P) sampling, and adjusting the temperature, encourage more diverse and interesting responses.
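The NumPy sketch below shows the core of nucleus (top-p) sampling on a tiny next-token distribution: keep the smallest set of tokens whose cumulative probability reaches p, renormalise, and sample from that set. Real decoders apply the same idea over the full vocabulary at every step; the token list and probabilities here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array(["i", "don't", "know", "pizza", "robots"])
probs = np.array([0.40, 0.25, 0.15, 0.12, 0.08])

def nucleus_sample(probs, p=0.9):
    order = np.argsort(probs)[::-1]                        # most probable first
    cumulative = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cumulative, p) + 1]  # smallest set with mass >= p
    renorm = probs[nucleus] / probs[nucleus].sum()         # renormalise inside the nucleus
    return rng.choice(nucleus, p=renorm)

print(tokens[nucleus_sample(probs, p=0.9)])
```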
22. How does the Reformer model handle the issue of large embedding tables for vocabulary?
A. It removes the vocabulary
B. It uses character-level embeddings only
C. It uses very small vocabulary sizes
D. This is not a specific focus of the Reformer (it focuses on attention and depth memory)
Correct Answer: This is not a specific focus of the Reformer (it focuses on attention and depth memory)
Explanation: While some models use specific embedding tricks, the Reformer's primary innovations are LSH Attention and Reversible Layers to solve the sequence length and depth memory bottlenecks.
23. What is 'Fine-Tuning' in the context of building a QA model with BERT?
A. Adjusting the hyperparameters of the model manually
B. Updating the weights of a pre-trained BERT model using a specific QA dataset
C. Cleaning the text data before input
D. Designing the neural network architecture from scratch
Correct Answer: Updating the weights of a pre-trained BERT model using a specific QA dataset
Explanation: Fine-tuning involves taking the pre-trained weights (which know language structure) and continuing the training process on the specific QA data to specialize the model.
24. In T5, what dataset was primarily used for pre-training?
Correct Answer: The C4 dataset (Colossal Clean Crawled Corpus)
Explanation: T5 was pre-trained on the C4 dataset, a massive, cleaned version of the Common Crawl web scrape.
25. What is a major advantage of the T5 'Text-to-Text' framework over BERT's approach?
A. It is faster to train
B. It allows the same model and loss function to be used for generation, translation, and classification
C. It requires no GPU
D. It uses fewer parameters
Correct Answer: It allows the same model and loss function to be used for generation, translation, and classification
Explanation: By treating everything as text generation, T5 simplifies the modeling pipeline, allowing a single model architecture and objective to handle diverse tasks without task-specific heads.
26. When training a chatbot, what is 'Teacher Forcing'?
A. A human teacher corrects the bot's answers
B. Using the ground-truth previous token as input during training instead of the model's own generated output
C. Forcing the model to stop training early
D. Using a larger model to teach a smaller model
Correct Answer: Using the ground-truth previous token as input during training instead of the model's own generated output
Explanation: Teacher forcing speeds up training by feeding the correct previous token to the decoder at each step, rather than the token the model actually predicted.
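A framework-agnostic toy sketch of the idea: at each decoder step the ground-truth prefix is fed in, regardless of what the model predicted at the previous step. The target sequence below is invented purely for illustration.

```python
target = ["<bos>", "the", "answer", "is", "paris", "<eos>"]

for step in range(1, len(target)):
    decoder_input = target[:step]     # ground-truth prefix, not the model's own output
    expected_next = target[step]      # token the model is trained to predict
    print(f"input={decoder_input} -> predict '{expected_next}'")
```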
27. Which challenge for Transformer models relates to the maximum number of tokens they can process at once?
A. Vanishing Gradient
B. Sequence Length Limit
C. Overfitting
D. Bias-variance tradeoff
Correct Answer: Sequence Length Limit
Explanation: Standard Transformers (like BERT) usually have a hard limit (e.g., 512 tokens) due to the positional encoding design and the quadratic cost of attention.
28. In the Reformer, what happens if two vectors fall into the same hash bucket during LSH?
A. They are deleted
B. They are considered for attention calculation with each other
C. They are merged into one vector
D. They are moved to a different layer
Correct Answer: They are considered for attention calculation with each other
Explanation: LSH sorts vectors into buckets based on similarity. Only vectors within the same bucket (or adjacent chunks) attend to each other, approximating the full attention mechanism.
29. What is 'Abstractive' Question Answering?
A. Highlighting the answer in the text
B. Generating an answer that may contain words not present in the context passage
C. Ignoring the context passage
D. Answering yes/no questions only
Correct Answer: Generating an answer that may contain words not present in the context passage
Explanation: Abstractive QA involves understanding the context and generating a summary or answer in natural language, rephrasing content rather than just extracting a direct span.
30. Which evaluation metric is commonly used for measuring the overlap between a chatbot's generated response and a reference response?
A. Accuracy
B. BLEU or ROUGE
C. F1-Score (for classification)
D. Mean Squared Error
Correct Answer: BLEU or ROUGE
Explanation: BLEU and ROUGE are metrics based on n-gram overlap between the generated text and reference text, commonly used in translation, summarization, and chatbot evaluation.
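A minimal sketch of n-gram overlap scoring with NLTK's sentence-level BLEU; real chatbot evaluation would typically average over a whole test set and report corpus-level BLEU or ROUGE. The sentences below are made up for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing avoids zero scores when some higher-order n-grams have no overlap.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```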
31. Does BERT process text sequentially from left-to-right?
A. Yes, strictly left-to-right
B. No, it processes the entire sequence simultaneously (bidirectionally)
C. Yes, but also right-to-left in separate layers
D. No, it processes random words first
Correct Answer: No, it processes the entire sequence simultaneously (bidirectionally)
Explanation: Unlike RNNs or GPT (left-to-right), BERT's Transformer encoder reads the entire sequence at once, allowing it to learn context from both sides of a token.
32. What is the 'Safety' challenge in open-domain chatbots?
A. Preventing the model from being deleted
B. Preventing the generation of toxic, biased, or offensive content
C. Ensuring the model saves data securely
D. Preventing users from hacking the server
Correct Answer: Preventing the generation of toxic, biased, or offensive content
Explanation: Since chatbots are trained on internet data, a major challenge is ensuring they do not output hate speech, bias, or harmful instructions.
33. How does the Reformer save memory regarding the 'Q, K, V' matrices in Attention?
A. It eliminates the Value matrix
B. It uses shared Query (Q) and Key (K) spaces
C. It compresses them using JPEG
D. It stores them on the hard drive
Correct Answer: It uses shared Query (Q) and Key (K) spaces
Explanation: The Reformer simplifies attention by assuming Q = K (shared keys and queries), reducing the number of matrices needed and facilitating the LSH scheme.
34. When using BERT for QA, what does the model output for every token in the passage?
A. A probability of being the next word
B. A probability score for being the 'Start' and 'End' of the answer
C. A sentiment score
D. A translation of the token
Correct Answer: A probability score for being the 'Start' and 'End' of the answer
Explanation: BERT outputs two logits for each token: one representing the likelihood of it being the start of the answer span, and one for being the end.
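The sketch below shows the lower-level view of the same mechanism used by the pipeline shown earlier: two logits per token, argmax-ed into a start/end span (Hugging Face transformers; the checkpoint is again just one SQuAD-fine-tuned example).

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

inputs = tokenizer("Who wrote Hamlet?",
                   "Hamlet is a tragedy written by William Shakespeare.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = int(outputs.start_logits.argmax())   # most likely start position
end = int(outputs.end_logits.argmax())       # most likely end position
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```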
35. Which model would be most appropriate for summarising a very long book into a short paragraph?
A. Standard BERT (limit 512 tokens)
B. Reformer (or Longformer)
C. A simple RNN
D. Naive Bayes
Correct Answer: Reformer (or Longformer)
Explanation: Because a book far exceeds the token limit of standard transformers, a Reformer (which handles long sequences efficiently) is the most appropriate choice.
36. In the context of T5, what does 'Transfer Learning' specifically refer to?
A. Transferring files between computers
B. Transferring the weights from a pre-trained general model to a specific downstream task
C. Transferring data from training set to test set
D. Transferring the style of one author to another
Correct Answer: Transferring the weights from a pre-trained general model to a specific downstream task
Explanation: It refers to the methodology of pre-training on a generic task (unsupervised) and then fine-tuning on a specific task (supervised) like QA.
37. What is a 'Persona-based' Chatbot?
A. A bot that asks for personal information
B. A bot conditioned on a specific profile (e.g., 'I am a doctor') to improve consistency
C. A bot that changes personality every turn
D. A bot used only for personnel management
Correct Answer: A bot conditioned on a specific profile (e.g., 'I am a doctor') to improve consistency
Explanation: To solve consistency issues, bots are often conditioned on a 'persona' profile provided in the input context.
38. Why is 'Masked Language Modeling' (MLM) harder than standard left-to-right language modeling?
A. It isn't; it is easier
B. The model must deduce the missing word based on bidirectional context rather than just previous words
C. It requires more labeled data
D. It uses images
Correct Answer: The model must deduce the missing word based on bidirectional context rather than just previous words
Explanation: MLM requires the model to understand the full context (words before and after) to fill in the blank, creating deeper contextual representations.
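A minimal fill-mask sketch (Hugging Face transformers, with 'bert-base-uncased' assumed as the example checkpoint): the model must use context on both sides of [MASK] to rank candidate fillers.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# Each prediction is a candidate token for the [MASK] slot with its probability.
for prediction in fill("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```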
39. What is the typical size of the vocabulary in models like BERT or T5?
A. 26 letters
B. Around 30,000 to 50,000 tokens (WordPieces/SentencePieces)
C. 1 million words
D. 100 words
Correct Answer: Around 30,000 to 50,000 tokens (WordPieces/SentencePieces)
Explanation: These models use sub-word tokenization algorithms (like WordPiece for BERT) resulting in vocabulary sizes usually between 30k and 50k.
40. In a Chatbot, 'Multi-turn' capability means:
A. The bot spins around
B. The bot can handle a conversation with back-and-forth exchanges while maintaining context
C. The bot can answer multiple questions at once
D. The bot can speak multiple languages
Correct Answer: The bot can handle a conversation with back-and-forth exchanges while maintaining context
Explanation: Multi-turn dialogue systems maintain state across several user-system exchanges, unlike single-turn systems (like simple FAQs).
41. What is the primary trade-off when using a Reformer model with Reversible Layers?
A. Higher memory usage for faster speed
B. Slightly higher computational cost (re-computing) for significantly lower memory usage
Correct Answer: Slightly higher computational cost (re-computing) for significantly lower memory usage
Explanation: Reversible layers require re-computing the forward pass during the backward pass. This trades a small increase in compute time for a massive reduction in memory.
42. T5 typically uses which type of positional encoding?
A. Absolute sinusoidal embeddings
B. Relative position embeddings
C. No positional encoding
D. GPS coordinates
Correct Answer: Relative position embeddings
Explanation: T5 uses relative position embeddings, where the encoding depends on the offset between the key and query, rather than their absolute position in the sequence.
43. Which of these is NOT a standard challenge for Transformers?
A. Data hunger (need large datasets)
B. Computational cost (GPU requirements)
C. Inability to handle sequential data
D. Interpretability (black-box nature)
Correct Answer: Inability to handle sequential data
Explanation: Transformers are explicitly designed to handle sequential data (text). The other options are valid challenges.
44. For a Chatbot, 'Slot Filling' refers to:
A. Filling the memory with data
B. Extracting specific parameters (e.g., date, time, location) from user input
C. The time slot when the bot is active
D. Putting coins in a machine
Correct Answer: Extracting specific parameters (e.g., date, time, location) from user input
Explanation: In task-oriented chatbots, slot filling is the process of identifying specific details required to fulfill a user's intent (e.g., extracting 'tomorrow' for a 'Date' slot).
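A toy rule-based sketch of slot filling for a booking intent; production systems usually use a sequence-labelling model rather than hand-written regexes, so the patterns and utterance below are purely illustrative assumptions.

```python
import re

utterance = "Book a table for 4 people tomorrow at 7pm in Berlin"

matches = {
    "party_size": re.search(r"for (\d+) people", utterance),
    "time": re.search(r"at (\d{1,2}(?::\d{2})?\s?(?:am|pm))", utterance, re.I),
    "city": re.search(r"in ([A-Z][a-z]+)$", utterance),
}
slots = {name: m.group(1) for name, m in matches.items() if m}
print(slots)   # e.g. {'party_size': '4', 'time': '7pm', 'city': 'Berlin'}
```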
45. The 'Chunking' strategy in Reformer helps in:
A. Breaking words into letters
B. Processing long sequences by breaking them into fixed-size chunks to apply LSH attention
C. Deleting parts of the sentence
D. Grouping users into chunks
Correct Answer: Processing long sequences by breaking them into fixed-size chunks to apply LSH attention
Explanation: To make LSH efficient, the Reformer chunks the sequence and sorts vectors within/across chunks to compute attention locally.
46. Which model introduced the 'Masked Language Model' concept?
A. T5
B. BERT
C. LSTM
D. Reformer
Correct Answer: BERT
Explanation: While the concept existed in literature (Cloze task), BERT popularized Masked Language Modeling (MLM) as the core pre-training objective for Transformers.
47. When fine-tuning T5 for QA, the target output is:
A. The index of the start and end words
B. A '1' or '0' (binary classification)
C. The raw text string of the answer
D. A vector representation
Correct Answer: The raw text string of the answer
Explanation: Since T5 is text-to-text, the target label for QA is simply the text of the correct answer.
48. Why might a Reformer be less suitable than BERT for short-sequence tasks?
A. Reformer cannot handle text
B. The overhead of LSH and reversible layers adds complexity not needed for short sequences
C. Reformer is only for images
D. Reformer has low accuracy
Correct Answer: The overhead of LSH and reversible layers adds complexity not needed for short sequences
Explanation: Reformer's optimizations (LSH, Reversibility) are designed for long sequences. For short sequences, standard Attention is efficient enough, and Reformer might introduce unnecessary overhead.
49. What is 'Zero-Shot' learning in the context of QA models?
A. The model answering questions without any specific fine-tuning on that QA dataset
B. The model failing 0 times
C. The model training with zero data
D. The model taking zero seconds to reply
Correct Answer: The model answering questions without any specific fine-tuning on that QA dataset
Explanation: Zero-shot learning refers to a pre-trained model performing a task (like QA) without having seen any explicit examples of that task during training/fine-tuning.
50. In Chatbot development, what is the role of the 'Temperature' parameter during generation?
A. It controls the heat of the GPU
B. It controls the randomness of predictions (Low = deterministic, High = creative/random)
C. It sets the mood of the bot
D. It determines the length of the sentence
Correct Answer: It controls the randomness of predictions (Low = deterministic, High = creative/random)
Explanation: Temperature scales the logits before the softmax. Lower temperature makes the distribution sharper (more confident/repetitive), while higher temperature flattens it (more diverse/random).
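A small NumPy demonstration of temperature scaling: the same logits become near-deterministic at low temperature and flatter (more random) at high temperature. The logit values are made up for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())    # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([4.0, 2.0, 1.0])
for t in (0.5, 1.0, 2.0):
    print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
```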