Unit 6 - Practice Quiz

INT344 50 Questions

1 What is the primary concept behind Transfer Learning in the context of Question Answering (QA)?

A. Using a model pre-trained on a large corpus and fine-tuning it on a QA dataset
B. Translating questions from one language to another before answering
C. Training a model from scratch on a small QA dataset
D. Using hard-coded rules to extract answers from text

2 Which of the following is considered a state-of-the-art (SOTA) approach for Natural Language Processing tasks like Question Answering?

A. Hidden Markov Models
B. Support Vector Machines
C. Naive Bayes Classifiers
D. Transformer-based models

3 In the context of BERT, what does the acronym stand for?

A. Bidirectional Encoder Representations from Transformers
B. Basic Entity Recognition Technique
C. Bigram Encoding for Robust Text
D. Binary Encoded Recurrent Transformers

4 How does BERT typically approach the Question Answering task (specifically Extractive QA)?

A. It predicts the start and end token indices of the answer span within the context passage
B. It generates new text to answer the question
C. It translates the question into a database query
D. It retrieves a whole document that might contain the answer

5 What pre-training objective allows BERT to understand the relationship between two sentences, which is useful for QA contexts?

A. Masked Language Modeling (MLM)
B. Next Sentence Prediction (NSP)
C. Locality Sensitive Hashing
D. Sequence-to-Sequence generation

6 T5 distinguishes itself from BERT by using which unifying framework for all NLP tasks?

A. Token-Classification
B. Regression-Analysis
C. Image-to-Text
D. Text-to-Text

7 In the T5 model, how is a specific task (like Question Answering) triggered?

A. By manually adjusting the learning rate
B. By changing the model architecture
C. By using a specific task prefix in the input text
D. By using a separate encoder for each task
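
A minimal sketch of how a task prefix can be attached to a T5-style input. The function name is hypothetical, and the exact "question: ... context: ..." wording is an assumption modelled on the SQuAD-style convention; the point is only that the task is signalled inside the input text itself.

```python
def format_t5_qa_input(question: str, context: str) -> str:
    """Build a text-to-text input string that starts with a task prefix."""
    # The prefix wording is an assumption of this sketch, not a fixed API.
    return f"question: {question.strip()} context: {context.strip()}"


example_input = format_t5_qa_input(
    "Where is the Eiffel Tower?",
    "The Eiffel Tower is in Paris.",
)
print(example_input)
```

The same model can be pointed at a different task by changing only the prefix text, without touching the architecture.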

8 What architecture does T5 utilize?

A. Encoder-Decoder (standard Transformer)
B. Decoder-only (like GPT)
C. Recurrent Neural Network
D. Encoder-only (like BERT)

9 When building a Chatbot, what is the 'Context Window' challenge?

A. The model cannot process images
B. The model generates text too slowly
C. The model has a limit on the amount of previous conversation history it can remember
D. The model cannot understand multiple languages

10 What is 'Hallucination' in the context of Chatbots and QA models?

A. The model copying the user's input exactly
B. The model generating factually incorrect information confidently
C. The model crashing due to memory overflow
D. The model failing to produce any output

11 Which of the following is a major computational challenge faced by standard Transformer models like BERT?

A. Requirement of labeled data only
B. Quadratic memory and time complexity relative to sequence length
C. Linear dependence on sequence length
D. Inability to handle numerical data
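
A small arithmetic sketch of why full self-attention becomes expensive on long inputs: every token is scored against every other token, so one attention head holds seq_len x seq_len scores. The function name is hypothetical.

```python
def attention_score_entries(seq_len: int) -> int:
    """Number of entries in one head's full self-attention score matrix."""
    return seq_len * seq_len


for n in (512, 1024, 2048):
    print(n, attention_score_entries(n))
# 512 262144
# 1024 1048576
# 2048 4194304
```

Doubling the sequence length multiplies the number of scores by four.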

12 The Reformer model is designed to address which specific limitation of the Transformer?

A. The need for tokenization
B. Efficiency and memory usage on long sequences
C. The inability to do translation
D. Low accuracy on short sentences

13 What technique does the Reformer use to approximate the attention mechanism efficiently?

A. Global Average Pooling
B. Recurrent connections
C. Locality Sensitive Hashing (LSH)
D. Convolutional layers

14 In a Reformer model, what is the purpose of Reversible Residual Layers?

A. To reverse the text direction for bidirectional context
B. To increase the number of parameters without increasing size
C. To let each layer's activations be recomputed from the following layer's outputs during backpropagation, so they need not all be stored in memory
D. To allow the model to run on CPUs only
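
A toy sketch of a reversible residual block, assuming the usual two-stream form y1 = x1 + F(x2), y2 = x2 + G(y1). Here f and g are hypothetical stand-ins for the attention and feed-forward sublayers; the sketch only shows that the inputs can be recovered exactly from the outputs.

```python
def f(x):
    # Hypothetical stand-in for the attention sublayer.
    return [2.0 * v + 1.0 for v in x]


def g(x):
    # Hypothetical stand-in for the feed-forward sublayer.
    return [0.5 * v - 3.0 for v in x]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def sub(a, b):
    return [p - q for p, q in zip(a, b)]


def forward(x1, x2):
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def inverse(y1, y2):
    # Undo the two additions in reverse order to recover the inputs.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


x1, x2 = [1.0, -2.0], [0.5, 4.0]
y1, y2 = forward(x1, x2)
print(inverse(y1, y2))  # ([1.0, -2.0], [0.5, 4.0])
```

Recovering the inputs costs an extra evaluation of f and g during the backward pass, which is the compute paid for the memory saved.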

15 Which special token is used in BERT to separate the question from the context passage?

A. [SEP]
B. [PAD]
C. [CLS]
D. [MASK]
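
A minimal sketch of how a question and a passage are commonly packed into one BERT input sequence, with segment IDs marking which part each token belongs to. The function name is hypothetical and whitespace splitting stands in for a real WordPiece tokenizer.

```python
def pack_qa_input(question_tokens, context_tokens):
    """Join question and context into one sequence with special tokens."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question and the first [SEP];
    # segment 1 covers the context and the final [SEP].
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = pack_qa_input(
    "where is the tower ?".split(),
    "the tower is in paris .".split(),
)
print(tokens)
print(segment_ids)
```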

16 In T5's pre-training strategy, what is 'span corruption'?

A. Deleting random words and asking the model to predict the sentiment
B. Shuffling the order of sentences in a paragraph
C. Corrupting the embeddings with noise
D. Replacing spans of text with a unique sentinel token and training the model to reconstruct the missing span
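
A minimal sketch of span corruption on a whitespace-tokenized sentence. The spans are chosen by hand here (in pre-training they are sampled randomly), the function name is hypothetical, and the `<extra_id_N>` sentinel naming follows the Hugging Face T5 tokenizer convention.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel; end is exclusive.

    `spans` must be sorted and non-overlapping.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return " ".join(inputs), " ".join(targets)


sentence = "Thank you for inviting me to your party last week".split()
corrupted, target = span_corrupt(sentence, [(2, 4), (8, 9)])
print(corrupted)  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(target)     # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```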

17 A Chatbot built using a Retrieval-Based model differs from a Generative model because:

A. It requires no training data
B. It uses voice recognition
C. It selects the best response from a predefined database of responses
D. It creates new sentences word-by-word

18 What is the 'Consistency' challenge in Chatbots?

A. The bot consistently answering correctly
B. The bot contradicting its own previous statements or persona
C. The bot replying with the same answer to every question
D. The bot using consistent grammar

19 Why is the SQuAD (Stanford Question Answering Dataset) important for QA models?

A. It is used to translate questions
B. It provides the code for the models
C. It is a database of all possible questions in English
D. It is a standardized benchmark dataset for evaluating reading comprehension and QA systems

20 In BERT, which output serves as the aggregate representation of the entire sequence, often used for classification?

A. The embedding of the last token
B. The output embedding of the [SEP] token
C. The output embedding of the [CLS] token
D. The average of all token embeddings

21 Which of the following is a solution to the 'Blandness' problem (generic responses like 'I don't know') in generative chatbots?

A. Removing the attention mechanism
B. Adjusting the temperature or decoding strategy (e.g., Nucleus Sampling)
C. Training on less data
D. Decreasing the model size
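
A minimal sketch of nucleus (top-p) filtering over a toy next-token distribution. The function name and the probabilities are made up for illustration; the idea is to keep the smallest set of most likely tokens whose cumulative probability reaches top_p, renormalise, and sample from that set.

```python
import random


def nucleus_filter(probs, top_p=0.9):
    """Return {token_index: renormalised_prob} for the top-p nucleus."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {idx: p / total for idx, p in kept}


probs = [0.5, 0.3, 0.15, 0.05]  # made-up distribution over four tokens
nucleus = nucleus_filter(probs, top_p=0.9)
rng = random.Random(0)
choice = rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print(sorted(nucleus), choice)
```

The low-probability tail is cut off, but the model can still pick something other than the single most likely token.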

22 How does the Reformer model handle the issue of large embedding tables for vocabulary?

A. It removes the vocabulary
B. It uses character-level embeddings only
C. This is not a specific focus of the Reformer (it focuses on attention cost and on activation memory across layers)
D. It uses very small vocabulary sizes

23 What is 'Fine-Tuning' in the context of building a QA model with BERT?

A. Adjusting the hyperparameters of the model manually
B. Updating the weights of a pre-trained BERT model using a specific QA dataset
C. Designing the neural network architecture from scratch
D. Cleaning the text data before input

24 In T5, what dataset was primarily used for pre-training?

A. IMDB Reviews
B. C4 (Colossal Clean Crawled Corpus)
C. ImageNet
D. SQuAD only

25 What is a major advantage of the T5 'Text-to-Text' framework over BERT's approach?

A. It is faster to train
B. It uses fewer parameters
C. It requires no GPU
D. It allows the same model and loss function to be used for generation, translation, and classification

26 When training a chatbot, what is 'Teacher Forcing'?

A. Forcing the model to stop training early
B. Using the ground-truth previous token as input during training instead of the model's own generated output
C. A human teacher corrects the bot's answers
D. Using a larger model to teach a smaller model

27 Which challenge for Transformer models relates to the maximum number of tokens they can process at once?

A. Sequence Length Limit
B. Overfitting
C. Bias variance tradeoff
D. Vanishing Gradient

28 In the Reformer, what happens if two vectors fall into the same hash bucket during LSH?

A. They are merged into one vector
B. They are considered for attention calculation with each other
C. They are moved to a different layer
D. They are deleted
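
A simplified sketch of hashing vectors into buckets so that attention only has to be considered within a bucket. This uses random-hyperplane (sign) hashing as a stand-in for the Reformer's angular LSH scheme, which is a swapped-in simplification; all names are hypothetical.

```python
import random


def make_hyperplanes(dim, n_planes, seed=0):
    """Draw random hyperplane normals used as hash functions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def bucket_id(vec, planes):
    """Encode which side of each hyperplane the vector falls on."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def group_by_bucket(vectors, planes):
    """Map bucket id -> indices of vectors hashed into that bucket."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(bucket_id(vec, planes), []).append(idx)
    return buckets


planes = make_hyperplanes(dim=3, n_planes=4)
vectors = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
print(group_by_bucket(vectors, planes))
```

Vectors 0 and 1 point in the same direction and always share a bucket, while vector 2 points the opposite way and lands elsewhere.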

29 What is 'Abstractive' Question Answering?

A. Generating an answer that may contain words not present in the context passage
B. Highlighting the answer in the text
C. Answering yes/no questions only
D. Ignoring the context passage

30 Which evaluation metric is commonly used for measuring the overlap between a chatbot's generated response and a reference response?

A. F1-Score (for classification)
B. Mean Squared Error
C. BLEU or ROUGE
D. Accuracy

31 Does BERT process text sequentially from left-to-right?

A. No, it processes the entire sequence simultaneously (bidirectionally)
B. Yes, but also right-to-left in separate layers
C. Yes, strictly left-to-right
D. No, it processes random words first

32 What is the 'Safety' challenge in open-domain chatbots?

A. Preventing users from hacking the server
B. Ensuring the model saves data securely
C. Preventing the model from being deleted
D. Preventing the generation of toxic, biased, or offensive content

33 How does the Reformer save memory regarding the 'Q, K, V' matrices in Attention?

A. It compresses them using JPEG
B. It eliminates the Value matrix
C. It uses shared Query (Q) and Key (K) spaces
D. It stores them on the hard drive

34 When using BERT for QA, what does the model output for every token in the passage?

A. A sentiment score
B. A translation of the token
C. A probability score for being the 'Start' and 'End' of the answer
D. A probability of being the next word
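
A minimal sketch of turning per-token start and end scores into an answer span for extractive QA. The scores below are made up, the function name is hypothetical, and a real system would take the scores from the model's output layer.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Pick (start, end) maximising start + end score, with end >= start."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


passage = ["The", "tower", "is", "in", "Paris", "."]
start_scores = [0.1, 0.2, 0.0, 0.3, 2.5, 0.1]  # made-up values
end_scores = [0.0, 0.1, 0.0, 0.2, 3.0, 0.4]    # made-up values
s, e = best_span(start_scores, end_scores)
answer = " ".join(passage[s:e + 1])
print(s, e, answer)  # 4 4 Paris
```

Searching over pairs rather than taking the two maxima independently guarantees that the end never comes before the start.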

35 Which model would be most appropriate for summarising a very long book into a short paragraph?

A. Standard BERT (limit 512 tokens)
B. Naive Bayes
C. Reformer (or Longformer)
D. A simple RNN

36 In the context of T5, what does 'Transfer Learning' specifically refer to?

A. Transferring files between computers
B. Transferring the weights from a pre-trained general model to a specific downstream task
C. Transferring data from training set to test set
D. Transferring the style of one author to another

37 What is a 'Persona-based' Chatbot?

A. A bot used only for personnel management
B. A bot conditioned on a specific profile (e.g., 'I am a doctor') to improve consistency
C. A bot that changes personality every turn
D. A bot that asks for personal information

38 Why is 'Masked Language Modeling' (MLM) harder than standard left-to-right language modeling?

A. It uses images
B. It isn't; it is easier
C. The model must deduce the missing word based on bidirectional context rather than just previous words
D. It requires more labeled data

39 What is the typical size of the vocabulary in models like BERT or T5?

A. 26 letters
B. 100 words
C. Around 30,000 to 50,000 tokens (WordPieces/SentencePieces)
D. 1 Million words

40 In a Chatbot, 'Multi-turn' capability means:

A. The bot can speak multiple languages
B. The bot spins around
C. The bot can answer multiple questions at once
D. The bot can handle a conversation with back-and-forth exchanges while maintaining context

41 What is the primary trade-off when using a Reformer model with Reversible Layers?

A. Slightly higher computational cost (re-computing) for significantly lower memory usage
B. Higher memory usage for faster speed
C. Lower accuracy for higher memory
D. It can only process numbers

42 T5 typically uses which type of positional encoding?

A. Absolute sinusoidal embeddings
B. No positional encoding
C. GPS coordinates
D. Relative position embeddings

43 Which of these is NOT a standard challenge for Transformers?

A. Data hunger (need large datasets)
B. Inability to handle sequential data
C. Computational cost (GPU requirements)
D. Interpretability (Black box nature)

44 For a Chatbot, 'Slot Filling' refers to:

A. Filling the memory with data
B. The time slot when the bot is active
C. Putting coins in a machine
D. Extracting specific parameters (e.g., date, time, location) from user input

45 The 'Chunking' strategy in Reformer helps in:

A. Processing long sequences by breaking them into fixed-size chunks to apply LSH attention
B. Breaking words into letters
C. Deleting parts of the sentence
D. Grouping users into chunks

46 Which model introduced the 'Masked Language Model' concept?

A. LSTM
B. Reformer
C. T5
D. BERT

47 When fine-tuning T5 for QA, the target output is:

A. The raw text string of the answer
B. A vector representation
C. A '1' or '0' (Binary classification)
D. The index of the start and end words

48 Why might a Reformer be less suitable than BERT for short-sequence tasks?

A. Reformer is only for images
B. Reformer has low accuracy
C. The overhead of LSH and reversible layers adds complexity not needed for short sequences
D. Reformer cannot handle text

49 What is 'Zero-Shot' learning in the context of QA models?

A. The model training with zero data
B. The model failing 0 times
C. The model taking zero seconds to reply
D. The model answering questions without any specific fine-tuning on that QA dataset

50 In Chatbot development, what is the role of the 'Temperature' parameter during generation?

A. It sets the mood of the bot
B. It controls the heat of the GPU
C. It controls the randomness of predictions (low = more deterministic, high = more random/creative)
D. It determines the length of the sentence
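
A minimal sketch of temperature scaling applied to next-token logits before sampling. The logits are made up and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [v / total for v in exps]


logits = [2.0, 1.0, 0.1]  # made-up values
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

With the lower temperature, most of the probability mass sits on the top-scoring token; with the higher one, the distribution is flatter and sampling is more varied.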