Unit 6 - Practice Quiz

INT344 50 Questions

1 What is the primary concept behind Transfer Learning in the context of Question Answering (QA)?

A. Translating questions from one language to another before answering
B. Using a model pre-trained on a large corpus and fine-tuning it on a QA dataset
C. Training a model from scratch on a small QA dataset
D. Using hard-coded rules to extract answers from text

2 Which of the following is considered a State-Of-The-Art (SOTA) approach for Natural Language Processing tasks like Question Answering?

A. Support Vector Machines
B. Transformer-based models
C. Naive Bayes Classifiers
D. Hidden Markov Models

3 What does the acronym BERT stand for?

A. Binary Encoded Recurrent Transformers
B. Bigram Encoding for Robust Text
C. Basic Entity Recognition Technique
D. Bidirectional Encoder Representations from Transformers

4 How does BERT typically approach the Question Answering task (specifically Extractive QA)?

A. It translates the question into a database query
B. It retrieves a whole document that might contain the answer
C. It predicts the start and end token indices of the answer span within the context passage
D. It generates new text to answer the question
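
Option C can be made concrete with a small sketch. The scores below are made up for illustration (real BERT emits start/end logits per token); the sketch shows how the highest-scoring valid (start, end) pair selects the answer span:

```python
def extract_answer_span(tokens, start_scores, end_scores):
    """Pick the (start, end) pair with the highest combined score,
    subject to start <= end, and return that token span."""
    best_pair, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, len(tokens)):
            if s + end_scores[j] > best_score:
                best_score = s + end_scores[j]
                best_pair = (i, j)
    i, j = best_pair
    return tokens[i : j + 1]

# Toy scores: the model is most confident the answer starts and ends at "Paris".
tokens = ["The", "capital", "of", "France", "is", "Paris"]
start_scores = [0.1, 0.0, 0.0, 0.2, 0.1, 5.0]
end_scores = [0.0, 0.1, 0.0, 0.3, 0.0, 4.0]
print(extract_answer_span(tokens, start_scores, end_scores))  # ['Paris']
```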

5 What pre-training objective allows BERT to understand the relationship between two sentences, which is useful for QA contexts?

A. Sequence-to-Sequence generation
B. Next Sentence Prediction (NSP)
C. Locality Sensitive Hashing
D. Masked Language Modeling (MLM)

6 T5 distinguishes itself from BERT by using which unifying framework for all NLP tasks?

A. Image-to-Text
B. Text-to-Text
C. Token-Classification
D. Regression-Analysis

7 In the T5 model, how is a specific task (like Question Answering) triggered?

A. By changing the model architecture
B. By using a specific task prefix in the input text
C. By manually adjusting the learning rate
D. By using a separate encoder for each task
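
A minimal sketch of option B: the task is selected purely by a text prefix on the input, with the same model and weights for every task. The "translate English to German:" prefix matches the T5 paper; exact QA prefix formats vary by fine-tuning setup, so treat the strings below as illustrative:

```python
def make_t5_input(task_prefix, body):
    # The task is triggered by the text prefix alone; the architecture
    # and weights stay the same across tasks.
    return f"{task_prefix} {body}"

qa = make_t5_input("question:", "Who wrote Hamlet? context: Hamlet was written by Shakespeare.")
summ = make_t5_input("summarize:", "A very long article about transformers ...")
trans = make_t5_input("translate English to German:", "The house is wonderful.")
```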

8 What architecture does T5 utilize?

A. Recurrent Neural Network
B. Encoder-only (like BERT)
C. Encoder-Decoder (standard Transformer)
D. Decoder-only (like GPT)

9 When building a Chatbot, what is the 'Context Window' challenge?

A. The model has a limit on the amount of previous conversation history it can remember
B. The model generates text too slowly
C. The model cannot understand multiple languages
D. The model cannot process images

10 What is 'Hallucination' in the context of Chatbots and QA models?

A. The model crashing due to memory overflow
B. The model copying the user's input exactly
C. The model failing to produce any output
D. The model generating factually incorrect information confidently

11 Which of the following is a major computational challenge faced by standard Transformer models like BERT?

A. Requirement of labeled data only
B. Quadratic memory and time complexity relative to sequence length
C. Inability to handle numerical data
D. Linear dependence on sequence length

12 The Reformer model is designed to address which specific limitation of the Transformer?

A. The inability to do translation
B. Low accuracy on short sentences
C. The need for tokenization
D. Efficiency and memory usage on long sequences

13 What technique does the Reformer use to approximate the attention mechanism efficiently?

A. Recurrent connections
B. Locality Sensitive Hashing (LSH)
C. Global Average Pooling
D. Convolutional layers

14 In a Reformer model, what is the purpose of Reversible Residual Layers?

A. To increase the number of parameters without increasing size
B. To reverse the text direction for bidirectional context
C. To allow the model to run on CPUs only
D. To store activations only once, allowing recalculation during backpropagation to save memory
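
Option D can be seen in miniature with scalar stand-ins for the attention and feed-forward sub-layers (real reversible layers operate on tensors, but the algebra is the same):

```python
def reversible_forward(x1, x2, F, G):
    # Forward pass of a reversible residual block:
    #   y1 = x1 + F(x2);  y2 = x2 + G(y1)
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def recover_inputs(y1, y2, F, G):
    # The inputs are recomputed exactly from the outputs during
    # backpropagation, so per-layer activations need not be stored.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

F = lambda x: 2.0 * x   # stand-in for the attention sub-layer
G = lambda x: x + 1.0   # stand-in for the feed-forward sub-layer
y1, y2 = reversible_forward(3.0, 4.0, F, G)
print(recover_inputs(y1, y2, F, G))  # (3.0, 4.0)
```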

15 Which special token is used in BERT to separate the question from the context passage?

A. [CLS]
B. [PAD]
C. [MASK]
D. [SEP]
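
A sketch of how the special tokens from this question are laid out for extractive QA (shown as a plain string; in practice the tokenizer inserts these tokens for you):

```python
def bert_qa_input(question, context):
    # BERT packs both segments into one sequence: [CLS] marks the start,
    # and [SEP] separates the question from the context (and ends it).
    return f"[CLS] {question} [SEP] {context} [SEP]"

print(bert_qa_input("Who wrote Hamlet?", "Hamlet was written by Shakespeare."))
```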

16 In T5's pre-training strategy, what is 'span corruption'?

A. Corrupting the embeddings with noise
B. Shuffling the order of sentences in a paragraph
C. Deleting random words and asking the model to predict the sentiment
D. Replacing spans of text with a unique sentinel token and training the model to reconstruct the missing span
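
Option D, sketched on a toy sentence. The `<extra_id_N>` sentinel names match T5's actual sentinel tokens; in real pre-training the spans are chosen randomly (around 15% of tokens), whereas here they are fixed for clarity:

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) token span with a unique sentinel;
    the training target is the sentinels plus the removed spans."""
    corrupted, target = [], []
    idx = 0
    for sid, (start, end) in enumerate(spans):
        corrupted.extend(tokens[idx:start])
        corrupted.append(f"<extra_id_{sid}>")
        target.append(f"<extra_id_{sid}>")
        target.extend(tokens[start:end])
        idx = end
    corrupted.extend(tokens[idx:])
    target.append(f"<extra_id_{len(spans)}>")
    return corrupted, target

tokens = "Thank you for inviting me to your party".split()
corrupted, target = span_corrupt(tokens, [(1, 2), (5, 6)])
# corrupted: Thank <extra_id_0> for inviting me <extra_id_1> your party
# target:    <extra_id_0> you <extra_id_1> to <extra_id_2>
```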

17 A Chatbot built using a Retrieval-Based model differs from a Generative model because:

A. It uses voice recognition
B. It selects the best response from a predefined database of responses
C. It requires no training data
D. It creates new sentences word-by-word

18 What is the 'Consistency' challenge in Chatbots?

A. The bot contradicting its own previous statements or persona
B. The bot using consistent grammar
C. The bot consistently answering correctly
D. The bot replying with the same answer to every question

19 Why is the SQuAD (Stanford Question Answering Dataset) important for QA models?

A. It is a standardized benchmark dataset for evaluating reading comprehension and QA systems
B. It is used to translate questions
C. It is a database of all possible questions in English
D. It provides the code for the models

20 In BERT, what represents the aggregate representation of the entire sequence, often used for classification?

A. The average of all token embeddings
B. The embedding of the last token
C. The output embedding of the [SEP] token
D. The output embedding of the [CLS] token

21 Which of the following is a solution to the 'Blandness' problem (generic responses like 'I don't know') in generative chatbots?

A. Removing the attention mechanism
B. Decreasing the model size
C. Adjusting the temperature or decoding strategy (e.g., Nucleus Sampling)
D. Training on less data

22 How does the Reformer model handle the issue of large embedding tables for vocabulary?

A. It uses character-level embeddings only
B. It removes the vocabulary
C. It uses very small vocabulary sizes
D. This is not a specific focus of the Reformer (it targets attention cost and activation memory instead)

23 What is 'Fine-Tuning' in the context of building a QA model with BERT?

A. Cleaning the text data before input
B. Updating the weights of a pre-trained BERT model using a specific QA dataset
C. Adjusting the hyperparameters of the model manually
D. Designing the neural network architecture from scratch

24 In T5, what dataset was primarily used for pre-training?

A. ImageNet
B. SQuAD only
C. C4 (Colossal Clean Crawled Corpus)
D. IMDB Reviews

25 What is a major advantage of the T5 'Text-to-Text' framework over BERT's approach?

A. It is faster to train
B. It allows the same model and loss function to be used for generation, translation, and classification
C. It requires no GPU
D. It uses fewer parameters

26 When training a chatbot, what is 'Teacher Forcing'?

A. A human teacher corrects the bot's answers
B. Using the ground-truth previous token as input during training instead of the model's own generated output
C. Forcing the model to stop training early
D. Using a larger model to teach a smaller model
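
Option B, sketched as (input, target) pairs for one training sequence. The `<s>`/`</s>` marker tokens are illustrative placeholders; names vary by tokenizer:

```python
def teacher_forcing_pairs(target_tokens, bos="<s>"):
    # At each decoding step during training, the input is the
    # ground-truth previous token, not the model's own prediction.
    decoder_inputs = [bos] + target_tokens[:-1]
    return list(zip(decoder_inputs, target_tokens))

pairs = teacher_forcing_pairs(["I", "am", "fine", "</s>"])
# [('<s>', 'I'), ('I', 'am'), ('am', 'fine'), ('fine', '</s>')]
```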

27 Which challenge for Transformer models relates to the maximum number of tokens it can process at once?

A. Vanishing Gradient
B. Overfitting
C. Sequence Length Limit
D. Bias variance tradeoff

28 In the Reformer, what happens if two vectors fall into the same hash bucket during LSH?

A. They are merged into one vector
B. They are moved to a different layer
C. They are considered for attention calculation with each other
D. They are deleted
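
A toy sketch of the bucketing behind option C, using random-hyperplane hashing (one common LSH scheme; the Reformer's actual variant differs in detail but shares the idea that similar vectors land in the same bucket and attend to each other):

```python
import random

def lsh_bucket(vec, hyperplanes):
    # Hash = the pattern of signs of dot products with random hyperplanes.
    # Vectors pointing in similar directions tend to share a bucket.
    bits = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

rng = random.Random(0)
planes = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
q = [1.0, 2.0, -1.0, 0.5]
k = [2.0, 4.0, -2.0, 1.0]   # same direction as q -> same bucket
print(lsh_bucket(q, planes) == lsh_bucket(k, planes))  # True
```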

29 What is 'Abstractive' Question Answering?

A. Highlighting the answer in the text
B. Generating an answer that may contain words not present in the context passage
C. Answering yes/no questions only
D. Ignoring the context passage

30 Which evaluation metric is commonly used for measuring the overlap between a chatbot's generated response and a reference response?

A. F1-Score (for classification)
B. Accuracy
C. Mean Squared Error
D. BLEU or ROUGE

31 Does BERT process text sequentially from left-to-right?

A. No, it processes random words first
B. Yes, strictly left-to-right
C. Yes, but also right-to-left in separate layers
D. No, it processes the entire sequence simultaneously (bidirectionally)

32 What is the 'Safety' challenge in open-domain chatbots?

A. Ensuring the model saves data securely
B. Preventing users from hacking the server
C. Preventing the model from being deleted
D. Preventing the generation of toxic, biased, or offensive content

33 How does the Reformer save memory regarding the 'Q, K, V' matrices in Attention?

A. It compresses them using JPEG
B. It eliminates the Value matrix
C. It uses shared Query (Q) and Key (K) spaces
D. It stores them on the hard drive

34 When using BERT for QA, what does the model output for every token in the passage?

A. A sentiment score
B. A probability score for being the 'Start' and 'End' of the answer
C. A probability of being the next word
D. A translation of the token

35 Which model would be most appropriate for summarising a very long book into a short paragraph?

A. Reformer (or Longformer)
B. A simple RNN
C. Standard BERT (limit 512 tokens)
D. Naive Bayes

36 In the context of T5, what does 'Transfer Learning' specifically refer to?

A. Transferring data from training set to test set
B. Transferring files between computers
C. Transferring the weights from a pre-trained general model to a specific downstream task
D. Transferring the style of one author to another

37 What is a 'Persona-based' Chatbot?

A. A bot conditioned on a specific profile (e.g., 'I am a doctor') to improve consistency
B. A bot that changes personality every turn
C. A bot that asks for personal information
D. A bot used only for personnel management

38 Why is 'Masked Language Modeling' (MLM) harder than standard left-to-right language modeling?

A. It uses images
B. It requires more labeled data
C. It isn't; it is easier
D. The model must deduce the missing word based on bidirectional context rather than just previous words

39 What is the typical size of the vocabulary in models like BERT or T5?

A. 26 letters
B. Around 30,000 to 50,000 tokens (WordPieces/SentencePieces)
C. 1 Million words
D. 100 words

40 In a Chatbot, 'Multi-turn' capability means:

A. The bot can speak multiple languages
B. The bot can answer multiple questions at once
C. The bot spins around
D. The bot can handle a conversation with back-and-forth exchanges while maintaining context

41 What is the primary trade-off when using a Reformer model with Reversible Layers?

A. Higher memory usage for faster speed
B. It can only process numbers
C. Lower accuracy for higher memory
D. Slightly higher computational cost (re-computing) for significantly lower memory usage

42 T5 typically uses which type of positional encoding?

A. No positional encoding
B. GPS coordinates
C. Relative position embeddings
D. Absolute sinusoidal embeddings

43 Which of these is NOT a standard challenge for Transformers?

A. Computational cost (GPU requirements)
B. Data hunger (need large datasets)
C. Interpretability (Black box nature)
D. Inability to handle sequential data

44 For a Chatbot, 'Slot Filling' refers to:

A. Putting coins in a machine
B. Extracting specific parameters (e.g., date, time, location) from user input
C. Filling the memory with data
D. The time slot when the bot is active

45 The 'Chunking' strategy in Reformer helps in:

A. Processing long sequences by breaking them into fixed-size chunks to apply LSH attention
B. Breaking words into letters
C. Deleting parts of the sentence
D. Grouping users into chunks

46 Which model introduced the 'Masked Language Model' concept?

A. Reformer
B. T5
C. BERT
D. LSTM

47 When fine-tuning T5 for QA, the target output is:

A. A '1' or '0' (Binary classification)
B. A vector representation
C. The raw text string of the answer
D. The index of the start and end words

48 Why might a Reformer be less suitable than BERT for short-sequence tasks?

A. Reformer is only for images
B. Reformer cannot handle text
C. Reformer has low accuracy
D. The overhead of LSH and reversible layers adds complexity not needed for short sequences

49 What is 'Zero-Shot' learning in the context of QA models?

A. The model taking zero seconds to reply
B. The model answering questions without any specific fine-tuning on that QA dataset
C. The model failing 0 times
D. The model training with zero data

50 In Chatbot development, what is the role of the 'Temperature' parameter during generation?

A. It determines the length of the sentence
B. It controls the heat of the GPU
C. It sets the mood of the bot
D. It controls the randomness of predictions (Low = deterministic, High = creative/random)
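
Option D in code: temperature rescales the logits before the softmax, so a low value concentrates probability on the top token and a high value flattens the distribution toward uniform:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Logits are divided by the temperature before the softmax:
    # low T sharpens the distribution, high T flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)   # near-deterministic
hot = softmax_with_temperature(logits, 10.0)   # near-uniform
```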