Unit 5 - Practice Quiz

INT344 - 50 Questions

1 What is the primary characteristic of a Sequence Model compared to a standard Feedforward Neural Network?

A. It takes the order of inputs into account and can handle variable-length inputs
B. It uses only Convolutional layers
C. It assumes all inputs are independent of each other
D. It processes inputs of fixed length only

2 Which of the following data types is best suited for a Sequence Model?

A. Tabular housing price data
B. Static image classification
C. Sentiment analysis of movie reviews
D. Iris flower categorization

3 In a Recurrent Neural Network (RNN), what is the function of the 'hidden state'?

A. To reset the network weights after every epoch
B. To act as a memory that captures information about previous time steps
C. To visualize the attention weights
D. To store the final output class

4 What is the phenomenon called when gradients become extremely small during backpropagation through time in an RNN, preventing the weights from updating effectively?

A. Vanishing Gradient
B. Gradient Clipping
C. Exploding Gradient
D. Overfitting

5 Which algorithm is typically used to train Recurrent Neural Networks?

A. Random Forest
B. Backpropagation Through Time (BPTT)
C. Standard Backpropagation
D. K-Means Clustering

6 Which activation function is most commonly used for the hidden state in a simple RNN to keep its values in a bounded range?

A. Softmax
B. ReLU
C. Linear
D. Tanh
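
For reference, the hidden-state recurrence that questions 3 to 6 refer to can be sketched with scalars. This is a minimal illustration, not a trained model: the weights `w_x`, `w_h`, and `b` are hypothetical values chosen by hand, and a real RNN would use weight matrices learned with BPTT.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar simple RNN over a sequence and return every hidden state.

    w_x, w_h and b are hand-picked illustrative values, not learned weights.
    """
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, 0.0, 5.0])
print(states)
```

The second and third inputs are zero, yet the corresponding states are not, because each state still carries information from the first input.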

7 What is the primary architectural solution designed to solve the Vanishing Gradient problem in standard RNNs?

A. Perceptron
B. Autoencoder
C. Convolutional Neural Network (CNN)
D. Long Short-Term Memory (LSTM)

8 In an LSTM unit, which gate is responsible for deciding what information to discard from the cell state?

A. Output Gate
B. Update Gate
C. Input Gate
D. Forget Gate

9 What represents the 'long-term memory' component in an LSTM architecture?

A. Cell State (C_t)
B. Input Gate
C. Output Gate
D. Hidden State (h_t)

10 In an LSTM, what is the range of values output by the sigmoid activation function used in gates?

A. -infinity to +infinity
B. -1 to 1
C. 0 to 1
D. 0 to 100
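
A quick way to check the behaviour of the gate activation is to evaluate it at a few points. The sketch below uses only the standard library; the sample inputs are arbitrary.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

gate_values = [sigmoid(x) for x in (-10.0, -1.0, 0.0, 1.0, 10.0)]
print(gate_values)
```

A gate value near 0 blocks almost all of the signal it multiplies, and a value near 1 lets almost all of it through.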

11 Which task involves assigning a grammatical category (like Noun, Verb, Adjective) to every word in a sentence?

A. Part-of-Speech (POS) Tagging
B. Named Entity Recognition
C. Sentiment Analysis
D. Machine Translation

12 Named Entity Recognition (NER) is primarily concerned with identifying:

A. The translation of the text
B. Sentiment of the text
C. Real-world objects like people, organizations, and locations
D. Grammatical errors

13 What type of sequence problem is POS tagging?

A. Many-to-One
B. One-to-Many
C. Many-to-Many (Synced)
D. One-to-One

14 In the context of NER, what does the 'BIO' or 'IOB' tagging scheme stand for?

A. Binary-Input-Output
B. Beginning-Inside-Outside
C. Basic-Input-Operation
D. Backward-Inward-Onward
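
As an illustration of how this tagging scheme labels a sentence, the sketch below converts entity spans into per-token tags. The helper `to_bio`, the example sentence, and the labels `PER` and `ORG` are assumptions made for the example.

```python
def to_bio(tokens, entities):
    """Convert (start, end_exclusive, label) token spans into per-token tags."""
    tags = ["O"] * len(tokens)  # tokens outside any entity
    for start, end, label in entities:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the entity
    return tags

tokens = ["Tim", "Cook", "leads", "Apple", "Inc", "."]
entities = [(0, 2, "PER"), (3, 5, "ORG")]
tags = to_bio(tokens, entities)
print(list(zip(tokens, tags)))
```

Every token receives exactly one tag, which is what makes NER a sequence-labeling problem.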

15 What was the core architecture used in Neural Machine Translation (NMT) before the introduction of Attention?

A. Decoder-Only
B. Encoder-Only
C. Encoder-Decoder (Seq2Seq)
D. Random Forest

16 In a traditional Seq2Seq model, what is the role of the Encoder?

A. To generate the output sequence
B. To visualize the data
C. To calculate the loss function
D. To compress the input sequence into a fixed-length context vector

17 What is the 'Context Vector' in a traditional RNN-based Encoder-Decoder model?

A. The first hidden state of the encoder
B. The average of all input vectors
C. The last hidden state of the encoder
D. The weights of the output layer

18 Which of the following is a major bottleneck of the traditional Seq2Seq model?

A. It can only translate into English
B. It requires too much RAM
C. Performance degrades significantly for long sentences due to the fixed-length context vector
D. It cannot handle numeric data

19 What is 'Teacher Forcing' in the context of training sequence models?

A. Using the model's predicted output as input for the next step during training
B. Forcing the model to stop training early
C. Using the actual ground truth output from the previous time step as input for the current step during training
D. Manually setting the weights of the network

20 Which search strategy explores multiple possible output sequences simultaneously to find the most likely translation?

A. Linear Search
B. Beam Search
C. Greedy Search
D. Binary Search

21 The Attention Mechanism was primarily introduced to solve which problem?

A. The inability of RNNs to process images
B. Slow training of Linear Regression
C. The information bottleneck of the fixed-length context vector in NMT
D. Overfitting in CNNs

22 How does the Attention Mechanism calculate the context vector for each time step in the decoder?

A. By computing a weighted sum of all encoder hidden states
B. By taking the last state of the encoder only
C. By averaging all input words
D. By randomly selecting an encoder state

23 In Attention, what do the 'alignment scores' (or attention weights) represent?

A. The number of hidden layers
B. The magnitude of the gradient
C. The error rate of the model
D. How relevant a specific input word is to the word currently being generated

24 What mathematical function is typically applied to alignment scores to convert them into probabilities that sum to 1?

A. ReLU
B. Softmax
C. Sigmoid
D. Tanh
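
The attention computation that questions 22 to 24 refer to can be sketched in a few lines. This is a minimal illustration under one assumption: the alignment score is a plain dot product between the decoder state and each encoder state, which is only one of several scoring functions in use. The vectors are made-up toy values.

```python
import math

def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_context(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    # Assumed scoring function: dot product of decoder and encoder states.
    scores = [dot(decoder_state, h_s) for h_s in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector combines ALL encoder states, scaled by their weights.
    context = [
        sum(w * h_s[d] for w, h_s in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy values
decoder_state = [2.0, 0.0]                             # toy value
weights, context = attention_context(decoder_state, encoder_states)
print(weights, context)
```

Because the weights are recomputed at every decoder step, each output word gets its own context vector instead of sharing one fixed summary of the input.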

25 In the context of RNNs, what is 'Backpropagation Through Time' (BPTT)?

A. A method to predict future stock prices
B. Training the network in reverse order
C. Unfolding the RNN across time steps and applying backpropagation
D. Using future data to predict past data

26 Which of the following is NOT a gate in a standard LSTM?

A. Input Gate
B. Output Gate
C. Attention Gate
D. Forget Gate

27 What is the shape of the input data for a basic RNN layer in Keras/TensorFlow?

A. (Timesteps, Batch Size)
B. (Batch Size, Features)
C. (Batch Size, Timesteps, Features)
D. (Features, Labels)

28 Why are Bidirectional RNNs (BiRNNs) useful?

A. They train faster than standard RNNs
B. They eliminate the need for backpropagation
C. They use fewer parameters
D. They allow the network to have context from both the past and the future

29 In a Many-to-One sequence model (e.g., Sentiment Analysis), where is the output typically taken?

A. At every time step
B. At the first time step
C. Randomly sampled
D. At the last time step

30 What does the 'candidate cell state' in an LSTM do?

A. It outputs the final prediction
B. It proposes new values that could be added to the state
C. It clears the memory
D. It decides what to forget
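
How the gates and the candidate cell state combine can be sketched with scalars. In this simplified illustration the gate pre-activations are passed in directly; in a real LSTM they are computed from the current input and the previous hidden state using learned weights. The function name and the sample numbers are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_state_update(c_prev, f_pre, i_pre, o_pre, g_pre):
    """One scalar LSTM step, given the four pre-activation values."""
    f = sigmoid(f_pre)            # forget gate: how much old state to keep
    i = sigmoid(i_pre)            # input gate: how much new content to admit
    o = sigmoid(o_pre)            # output gate: how much state to expose
    c_tilde = math.tanh(g_pre)    # candidate cell state: proposed new values
    c_t = f * c_prev + i * c_tilde
    h_t = o * math.tanh(c_t)
    return c_t, h_t

# Forget gate almost closed, input gate almost open:
# the old cell state (5.0) is discarded and replaced by the candidate.
c_t, h_t = lstm_state_update(c_prev=5.0, f_pre=-20.0, i_pre=20.0, o_pre=0.0, g_pre=1.0)
print(c_t, h_t)
```

With these inputs the new cell state is almost exactly the candidate value `tanh(1.0)`, showing that the forget and input gates together decide how the long-term memory is rewritten.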

31 Which condition leads to the 'Exploding Gradient' problem?

A. Gradient factors greater than 1 multiplying together across time steps
B. Gradient factors less than 1 multiplying together across time steps
C. Weights initialized to zero
D. Learning rate being too low
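
The effect of multiplying the same factor across many time steps is easy to see with plain arithmetic. The sketch below is a simplification: it treats the per-step gradient factor as a single constant, whereas in a real RNN it depends on the weights and activations at each step. The factors 0.9 and 1.1 and the 50 steps are arbitrary choices.

```python
def repeated_product(factor, steps):
    """Multiply `factor` by itself `steps` times, as BPTT does across time."""
    result = 1.0
    for _ in range(steps):
        result *= factor
    return result

shrinking = repeated_product(0.9, 50)  # about 0.005
growing = repeated_product(1.1, 50)    # about 117
print(shrinking, growing)
```

A factor only slightly away from 1 is enough to make the gradient shrink toward zero or grow very large once the sequence is long.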

32 A solution to the Exploding Gradient problem is:

A. Using ReLU
B. Gradient Clipping
C. Removing the hidden layer
D. Increasing the learning rate
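
One common form of gradient clipping rescales the whole gradient vector when its norm exceeds a threshold. The sketch below is a standalone illustration, with the helper name `clip_by_norm` and the threshold chosen for the example; deep learning frameworks provide their own built-in versions.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale `grads` so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

Clipping changes the size of the update but keeps its direction, so the step still points the same way after rescaling.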

33 In an Attention model, the vector c_t is often referred to as:

A. The Bias Vector
B. The Context Vector
C. The Noise Vector
D. The Forget Vector

34 Sequence-to-Sequence models are most commonly associated with:

A. Cluster Analysis
B. Linear Regression
C. Image Segmentation
D. Text Summarization

35 What is 'Global Attention'?

A. Attention applied to a single word
B. Attention that considers only a window of hidden states
C. Attention that considers all hidden states of the encoder
D. Attention applied without weights

36 Which of the following describes 'Greedy Decoding'?

A. Considering all possible future sequences
B. Waiting until the end to choose words
C. Choosing the word with the highest probability at each step immediately
D. Choosing a random word based on distribution
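
The difference between greedy decoding and the beam search of question 20 can be shown on a toy example. Everything here is hypothetical: `TOY_MODEL` is a hand-written lookup table standing in for a decoder's next-token probabilities, and the tokens are placeholders.

```python
# Hand-written next-token probabilities, keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}

def greedy_decode(model, steps):
    """Commit to the single most probable token at every step."""
    seq, prob = (), 1.0
    for _ in range(steps):
        token, p = max(model[seq].items(), key=lambda kv: kv[1])
        seq, prob = seq + (token,), prob * p
    return seq, prob

def beam_search(model, steps, beam_width):
    """Keep the `beam_width` most probable partial sequences at every step."""
    beams = [((), 1.0)]
    for _ in range(steps):
        candidates = [
            (seq + (token,), prob * p)
            for seq, prob in beams
            for token, p in model[seq].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

greedy_seq, greedy_prob = greedy_decode(TOY_MODEL, steps=2)
beam_seq, beam_prob = beam_search(TOY_MODEL, steps=2, beam_width=2)
print(greedy_seq, greedy_prob)
print(beam_seq, beam_prob)
```

In this table the locally best first token leads to a less probable full sequence, so keeping two hypotheses alive finds a better result than committing immediately.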

37 In POS tagging, if a word is ambiguous (e.g., 'book' can be a noun or verb), how does an RNN resolve it?

A. It flips a coin
B. It always picks the most common usage
C. It cannot resolve ambiguity
D. It uses the context provided by surrounding words stored in the hidden state

38 What is the typical loss function for a multi-class classification problem like POS Tagging or NMT?

A. Absolute Error
B. Hinge Loss
C. Categorical Cross-Entropy
D. Mean Squared Error (MSE)

39 What does GRU stand for?

A. Global Recurrent Update
B. Gated Recurrent Unit
C. General Regression Unit
D. Gradient Rectified Unit

40 The 'Input Gate' in an LSTM is usually controlled by which activation function?

A. Sigmoid
B. Tanh
C. ReLU
D. Linear

41 What visual tool is often used to interpret what an Attention model has learned?

A. Histogram
B. Scatter Plot
C. Pie Chart
D. Attention Heatmap

42 Which limitation of RNNs prevents parallelization during training?

A. Large memory footprint
B. Sequential dependency of the hidden state
C. Use of sigmoid functions
D. Complex loss functions

43 In a sequence model, 'padding' is used to:

A. Add noise to the data
B. Remove stopwords
C. Increase the learning rate
D. Make all sequences in a batch the same length
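
A minimal sketch of padding, assuming token IDs are integers and that 0 is reserved as the padding value. It pads at the end of each sequence; some libraries pad at the start by default, so the side is a choice, not a rule.

```python
def pad_sequences(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]

batch = pad_sequences([[4, 7, 1], [9], [3, 2]])
print(batch)
```

In practice the padded positions are usually masked so that they do not contribute to the loss or to the hidden state.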

44 Which of these is a 'many-to-one' application of sequence models?

A. Machine Translation
B. Video Captioning
C. Sentiment Classification
D. Music Generation

45 In an NER task, identifying 'Apple' as an Organization rather than a Fruit relies on:

A. The contextual information in the sequence
B. The capitalization
C. The length of the word
D. The spelling of the word

46 Why is the traditional Encoder-Decoder model often described as having 'amnesia'?

A. It uses a forget gate
B. It cannot learn new words
C. It struggles to retain information from the beginning of a long sequence at the decoding stage
D. It forgets the weights after training

47 In the attention equation score(h_t, h_s), what are h_t and h_s?

A. Input and Output Gates
B. Weight and Bias
C. Learning rate and Loss
D. Decoder hidden state and Encoder hidden state

48 Which mechanism allows a model to focus on 'local' parts of the input sequence based on the current decoding step?

A. Max Pooling
B. Dropout
C. Batch Normalization
D. Attention Mechanism

49 In sequence labeling, what does the output layer usually consist of?

A. A Softmax layer over the tag set for each time step
B. A linear regression layer
C. A single neuron
D. A clustering algorithm

50 Which of the following best describes the 'Seq2Seq' mapping?

A. Variable Input Size -> Variable Output Size
B. Fixed Input Size -> Fixed Output Size
C. Variable Input Size -> Fixed Output Size
D. Fixed Input Size -> Variable Output Size