Unit 5 - Practice Quiz

INT344 50 Questions

1 What is the primary characteristic of a Sequence Model compared to a standard Feedforward Neural Network?

A. It takes the order of inputs into account and can handle variable-length inputs
B. It processes inputs of fixed length only
C. It assumes all inputs are independent of each other
D. It uses only Convolutional layers

2 Which of the following data types is best suited for a Sequence Model?

A. Static image classification
B. Iris flower categorization
C. Sentiment analysis of movie reviews
D. Tabular housing price data

3 In a Recurrent Neural Network (RNN), what is the function of the 'hidden state'?

A. To act as a memory that captures information about previous time steps
B. To visualize the attention weights
C. To reset the network weights after every epoch
D. To store the final output class

4 What is the phenomenon called when gradients become extremely small during Backpropagation Through Time in an RNN, preventing weights from updating?

A. Exploding Gradient
B. Vanishing Gradient
C. Overfitting
D. Gradient Clipping

5 Which algorithm is typically used to train Recurrent Neural Networks?

A. K-Means Clustering
B. Backpropagation Through Time (BPTT)
C. Random Forest
D. Standard Backpropagation

6 Which activation function is most commonly used for the hidden state in a simple RNN to help regulate values?

A. Softmax
B. ReLU
C. Linear
D. Tanh

7 What is the primary architectural solution designed to solve the Vanishing Gradient problem in standard RNNs?

A. Perceptron
B. Long Short-Term Memory (LSTM)
C. Autoencoder
D. Convolutional Neural Network (CNN)

8 In an LSTM unit, which gate is responsible for deciding what information to discard from the cell state?

A. Output Gate
B. Input Gate
C. Update Gate
D. Forget Gate

9 What represents the 'long-term memory' component in an LSTM architecture?

A. Cell State (C_t)
B. Input Gate
C. Hidden State (h_t)
D. Output Gate

10 In an LSTM, what is the range of values output by the sigmoid activation function used in gates?

A. 0 to 100
B. -1 to 1
C. 0 to 1
D. -infinity to +infinity
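For reference, the sigmoid's output range can be checked directly; a minimal NumPy sketch (the input values are arbitrary illustrations):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Even fairly extreme inputs stay strictly between 0 and 1,
# which is what lets LSTM gates act as soft on/off switches
vals = sigmoid(np.array([-10.0, 0.0, 10.0]))
print(vals)
```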

11 Which task involves assigning a grammatical category (like Noun, Verb, Adjective) to every word in a sentence?

A. Sentiment Analysis
B. Part-of-Speech (POS) Tagging
C. Named Entity Recognition
D. Machine Translation

12 Named Entity Recognition (NER) is primarily concerned with identifying:

A. Real-world objects like people, organizations, and locations
B. Sentiment of the text
C. The translation of the text
D. Grammatical errors

13 What type of Sequence problem is POS Tagging?

A. Many-to-One
B. Many-to-Many (Synced)
C. One-to-One
D. One-to-Many

14 In the context of NER, what does the 'BIO' or 'IOB' tagging scheme stand for?

A. Backward-Inward-Onward
B. Beginning-Inside-Outside
C. Basic-Input-Operation
D. Binary-Input-Output

15 What is the core architecture used in Neural Machine Translation (NMT) before the introduction of Attention?

A. Random Forest
B. Encoder-Decoder (Seq2Seq)
C. Encoder-Only
D. Decoder-Only

16 In a traditional Seq2Seq model, what is the role of the Encoder?

A. To visualize the data
B. To calculate the loss function
C. To compress the input sequence into a fixed-length context vector
D. To generate the output sequence

17 What is the 'Context Vector' in a traditional RNN-based Encoder-Decoder model?

A. The first hidden state of the encoder
B. The weights of the output layer
C. The average of all input vectors
D. The last hidden state of the encoder

18 Which of the following is a major bottleneck of the traditional Seq2Seq model?

A. Performance degrades significantly for long sentences due to the fixed-length context vector
B. It can only translate into English
C. It cannot handle numeric data
D. It requires too much RAM

19 What is 'Teacher Forcing' in the context of training sequence models?

A. Using the actual ground truth output from the previous time step as input for the current step during training
B. Using the model's predicted output as input for the next step during training
C. Forcing the model to stop training early
D. Manually setting the weights of the network

20 Which search strategy explores multiple possible output sequences simultaneously to find the most likely translation?

A. Greedy Search
B. Beam Search
C. Linear Search
D. Binary Search

21 The Attention Mechanism was primarily introduced to solve which problem?

A. The information bottleneck of the fixed-length context vector in NMT
B. Overfitting in CNNs
C. Slow training of Linear Regression
D. The inability of RNNs to process images

22 How does the Attention Mechanism calculate the context vector for each time step in the decoder?

A. By averaging all input words
B. By randomly selecting an encoder state
C. By taking the last state of the encoder only
D. By computing a weighted sum of all encoder hidden states
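The weighted-sum computation mentioned in option D can be sketched in NumPy (the dimensions, seed, and weight values are all invented for illustration):

```python
import numpy as np

# Toy setup: 3 encoder hidden states, each of dimension 4
rng = np.random.default_rng(0)
encoder_states = rng.random((3, 4))

# Attention weights for one decoder time step (already softmax-normalized)
attn_weights = np.array([0.7, 0.2, 0.1])

# Context vector: weighted sum over ALL encoder hidden states
context_vector = attn_weights @ encoder_states
print(context_vector.shape)  # (4,)
```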

23 In Attention, what do the 'alignment scores' (or attention weights) represent?

A. The magnitude of the gradient
B. The error rate of the model
C. The number of hidden layers
D. How relevant a specific input word is to the word currently being generated

24 What mathematical function is typically applied to alignment scores to convert them into probabilities that sum to 1?

A. ReLU
B. Sigmoid
C. Tanh
D. Softmax
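A quick NumPy sketch of converting raw alignment scores into probabilities that sum to 1 (the score values are invented for illustration):

```python
import numpy as np

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

alignment_scores = np.array([2.0, 1.0, 0.1])
weights = softmax(alignment_scores)
print(weights.sum())  # 1.0
```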

25 In the context of RNNs, what is 'Backpropagation Through Time' (BPTT)?

A. Unfolding the RNN across time steps and applying backpropagation
B. Training the network in reverse order
C. A method to predict future stock prices
D. Using future data to predict past data

26 Which of the following is NOT a gate in a standard LSTM?

A. Input Gate
B. Output Gate
C. Forget Gate
D. Attention Gate

27 What is the shape of the input data for a basic RNN layer in Keras/TensorFlow?

A. (Batch Size, Timesteps, Features)
B. (Batch Size, Features)
C. (Timesteps, Batch Size)
D. (Features, Labels)
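Why the time axis must be explicit in that input shape can be seen in a bare-bones NumPy sketch of a simple RNN forward pass (all dimensions and parameter values are invented for the example, not a real Keras layer):

```python
import numpy as np

# Illustrative dimensions: a batch of 2 sequences, 5 time steps, 3 features each
batch_size, timesteps, features, hidden = 2, 5, 3, 4
rng = np.random.default_rng(0)
x = rng.random((batch_size, timesteps, features))  # (Batch Size, Timesteps, Features)

# Randomly initialized simple-RNN parameters (illustration only)
W_x = rng.random((features, hidden))
W_h = rng.random((hidden, hidden))
h = np.zeros((batch_size, hidden))

# One tanh-regulated hidden-state update per time step
for t in range(timesteps):
    h = np.tanh(x[:, t, :] @ W_x + h @ W_h)

print(h.shape)  # (2, 4)
```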

28 Why are Bidirectional RNNs (BiRNNs) useful?

A. They use fewer parameters
B. They allow the network to have context from both the past and the future
C. They eliminate the need for backpropagation
D. They train faster than standard RNNs

29 In a Many-to-One sequence model (e.g., Sentiment Analysis), where is the output typically taken?

A. At the first time step
B. Randomly sampled
C. At the last time step
D. At every time step

30 What does the 'candidate cell state' in an LSTM do?

A. It decides what to forget
B. It clears the memory
C. It outputs the final prediction
D. It proposes new values that could be added to the state

31 Which issue leads to the 'Exploding Gradient' problem?

A. Weights initialized to zero
B. Learning rate being too low
C. Gradients > 1 accumulating multiplicatively
D. Gradients < 1 accumulating multiplicatively

32 A solution to the Exploding Gradient problem is:

A. Removing the hidden layer
B. Using ReLU
C. Gradient Clipping
D. Increasing the learning rate
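Gradient clipping by norm can be sketched in a few lines of NumPy (the gradient values and threshold are invented for illustration):

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    # Rescale the gradient if its L2 norm exceeds max_norm,
    # preserving its direction while bounding its magnitude
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])       # norm = 50
clipped = clip_by_norm(g, 5.0)
print(np.linalg.norm(clipped))   # 5.0
```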

33 In an Attention model, the vector c_t is often referred to as:

A. The Context Vector
B. The Forget Vector
C. The Noise Vector
D. The Bias Vector

34 Sequence-to-Sequence models are most commonly associated with:

A. Image Segmentation
B. Cluster Analysis
C. Linear Regression
D. Text Summarization

35 What is 'Global Attention'?

A. Attention applied without weights
B. Attention that considers only a window of hidden states
C. Attention applied to a single word
D. Attention that considers all hidden states of the encoder

36 Which of the following describes 'Greedy Decoding'?

A. Choosing the word with the highest probability at each step immediately
B. Waiting until the end to choose words
C. Considering all possible future sequences
D. Choosing a random word based on distribution
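The per-step argmax behavior described in option A can be sketched with toy probability distributions (the vocabulary and probabilities are invented for illustration):

```python
import numpy as np

# Toy per-step probability distributions over a 3-word vocabulary
step_probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
])

# Greedy decoding: commit to the highest-probability word at each step,
# never revisiting earlier choices (unlike beam search)
greedy_output = [int(np.argmax(p)) for p in step_probs]
print(greedy_output)  # [0, 1, 2]
```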

37 In POS tagging, if a word is ambiguous (e.g., 'book' can be a noun or verb), how does an RNN resolve it?

A. It always picks the most common usage
B. It cannot resolve ambiguity
C. It uses the context provided by surrounding words stored in the hidden state
D. It flips a coin

38 What is the typical loss function for a multi-class classification problem like POS Tagging or NMT?

A. Mean Squared Error (MSE)
B. Hinge Loss
C. Categorical Cross-Entropy
D. Absolute Error

39 What does GRU stand for?

A. Gradient Rectified Unit
B. Gated Recurrent Unit
C. Global Recurrent Update
D. General Regression Unit

40 The 'Input Gate' in an LSTM is usually controlled by which activation function?

A. Tanh
B. ReLU
C. Sigmoid
D. Linear

41 What visual tool is often used to interpret what an Attention model has learned?

A. Pie Chart
B. Scatter Plot
C. Attention Heatmap
D. Histogram

42 Which limitation of RNNs prevents parallelization during training?

A. Complex loss functions
B. Sequential dependency of the hidden state
C. Large memory footprint
D. Use of sigmoid functions

43 In a sequence model, 'padding' is used to:

A. Remove stopwords
B. Add noise to the data
C. Make all sequences in a batch the same length
D. Increase the learning rate
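Padding can be sketched in plain Python, similar in spirit to Keras's `pad_sequences` utility (the function below and its token-ID batch are invented for illustration):

```python
def pad_sequences(seqs, maxlen, pad_value=0):
    # Pad variable-length token-ID sequences to a common length,
    # truncating any sequence longer than maxlen
    padded = []
    for seq in seqs:
        seq = seq[:maxlen]
        padded.append(seq + [pad_value] * (maxlen - len(seq)))
    return padded

batch = [[5, 8, 2], [7], [3, 3, 9, 1, 4]]
print(pad_sequences(batch, maxlen=4))
```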

44 Which of these is a 'many-to-one' application of sequence models?

A. Sentiment Classification
B. Machine Translation
C. Video Captioning
D. Music Generation

45 In an NER task, identifying 'Apple' as an Organization rather than a Fruit relies on:

A. The length of the word
B. The spelling of the word
C. The contextual information in the sequence
D. The capitalization

46 Why is the traditional Encoder-Decoder model often described as having 'amnesia'?

A. It struggles to retain information from the beginning of a long sequence at the decoding stage
B. It cannot learn new words
C. It uses a forget gate
D. It forgets the weights after training

47 In the attention equation score(h_t, h_s), what are h_t and h_s?

A. Input and Output Gates
B. Learning rate and Loss
C. Decoder hidden state and Encoder hidden state
D. Weight and Bias

48 Which mechanism allows a model to focus on 'local' parts of the input sequence based on the current decoding step?

A. Max Pooling
B. Batch Normalization
C. Attention Mechanism
D. Dropout

49 In sequence labeling, what does the output layer usually consist of?

A. A clustering algorithm
B. A Softmax layer over the tag set for each time step
C. A linear regression layer
D. A single neuron

50 Which of the following best describes the 'Seq2Seq' mapping?

A. Fixed Input Size -> Fixed Output Size
B. Fixed Input Size -> Variable Output Size
C. Variable Input Size -> Fixed Output Size
D. Variable Input Size -> Variable Output Size