Unit 3 - Practice Quiz

INT344

1 What is the primary goal of calculating Minimum Edit Distance?

A. To find the longest common subsequence between two strings
B. To quantify the dissimilarity between two strings by counting operations
C. To calculate the probability of a word appearing in a sentence
D. To generate word embeddings for a neural network

2 Which of the following operations is NOT typically used in the Levenshtein distance algorithm?

A. Insertion
B. Deletion
C. Substitution
D. Transposition

3 In the context of dynamic programming for edit distance, if source[i] equals target[j], what is the cost of substitution?

A. 0
B. 1
C. 2
D. Infinity

4 What is the Minimum Edit Distance between the strings 'cat' and 'cut'?

A. 0
B. 1
C. 2
D. 3

5 How does an autocorrect system typically identify candidate words for a misspelled word?

A. By selecting random words from the dictionary
B. By finding words within a certain edit distance threshold
C. By using the longest word in the dictionary
D. By choosing words that start with the same letter only

6 Which algorithm is commonly used to efficiently calculate the Minimum Edit Distance?

A. Depth-First Search
B. Dynamic Programming
C. Gradient Descent
D. K-Means Clustering
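
Review note for questions 1-6: the Levenshtein distance is filled in with a dynamic-programming table whose cell D[i][j] holds the cheapest way to turn the first i characters of the source into the first j characters of the target. A minimal Python sketch, assuming unit costs for insertion, deletion, and substitution (the function name is illustrative):

    def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
        m, n = len(source), len(target)
        # D[i][j] = minimum cost of converting source[:i] into target[:j]
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            D[i][0] = i * del_cost              # delete every source character
        for j in range(1, n + 1):
            D[0][j] = j * ins_cost              # insert every target character
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                # substitution costs nothing when the characters already match
                sub = 0 if source[i - 1] == target[j - 1] else sub_cost
                D[i][j] = min(D[i - 1][j] + del_cost,
                              D[i][j - 1] + ins_cost,
                              D[i - 1][j - 1] + sub)
        return D[m][n]

    print(min_edit_distance("cat", "cut"))      # 1: substitute 'a' with 'u'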

7 In Part of Speech (POS) tagging, what is the 'Hidden' component in a Hidden Markov Model?

A. The words in the sentence
B. The Part of Speech tags
C. The punctuation marks
D. The transition probabilities

8 What does the Markov Assumption state in the context of Markov Chains?

A. The future state depends on all past states
B. The future state depends only on the current state
C. The future state is independent of the current state
D. The future state depends on the hidden emissions

9 What are 'Transition Probabilities' in an HMM?

A. The probability of generating a specific word given a tag
B. The probability of moving from one POS tag to another
C. The probability of a sentence starting with a specific word
D. The probability of a word being misspelled

10 What are 'Emission Probabilities' in an HMM used for POS tagging?

A. P(tag|previous_tag)
B. P(word|tag)
C. P(tag|word)
D. P(word|previous_word)

11 Which algorithm is used to find the most likely sequence of hidden states (POS tags) given a sequence of observations?

A. The Viterbi Algorithm
B. The Forward Algorithm
C. The Backward Algorithm
D. The Edit Distance Algorithm
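
Review note for questions 7-11: in an HMM tagger the words are the observations, the tags are the hidden states, and Viterbi picks the highest-probability tag sequence using transition probabilities P(tag | previous tag) and emission probabilities P(word | tag). A compact sketch over a made-up two-tag model (all tags, words, and numbers below are invented for illustration):

    import math

    tags = ["NOUN", "VERB"]
    start_p = {"NOUN": 0.6, "VERB": 0.4}                    # initial tag probabilities
    trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},          # P(tag_t | tag_t-1)
               "VERB": {"NOUN": 0.6, "VERB": 0.4}}
    emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1},           # P(word | tag)
              "VERB": {"dogs": 0.1, "bark": 0.6}}

    def viterbi(words):
        # best[t][tag] = (log-probability of the best path ending in tag, backpointer)
        best = [{t: (math.log(start_p[t] * emit_p[t].get(words[0], 1e-6)), None)
                 for t in tags}]
        for w in words[1:]:
            step = {}
            for t in tags:
                prev, score = max(
                    ((p, best[-1][p][0] + math.log(trans_p[p][t] * emit_p[t].get(w, 1e-6)))
                     for p in tags),
                    key=lambda pair: pair[1])
                step[t] = (score, prev)
            best.append(step)
        # backtrace from the most probable final tag
        tag = max(best[-1], key=lambda t: best[-1][t][0])
        path = [tag]
        for step in reversed(best[1:]):
            tag = step[tag][1]
            path.append(tag)
        return list(reversed(path))

    print(viterbi(["dogs", "bark"]))            # ['NOUN', 'VERB'] with these toy numbers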

12 A text corpus is:

A. A dictionary of word definitions
B. A large, structured set of texts used for statistical analysis
C. A software used for autocorrection
D. A set of rules for grammar

13 In an N-gram language model, what is 'N'?

A. The number of hidden states
B. The number of words in the sentence
C. The number of words in the sequence considered for probability
D. The dimension of the word embedding

14 Which assumption simplifies the calculation of N-gram probabilities?

A. The probability of a word depends only on the previous N-1 words
B. Words are independent of each other
C. All words have equal probability
D. The probability depends on the entire sentence history

15 How is the probability of a bigram P(w2 | w1) calculated from a corpus?

A. Count(w1, w2) / Count(w2)
B. Count(w1, w2) / Count(w1)
C. Count(w1) / Count(w2)
D. Count(w1) * Count(w2)
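
Review note for question 15: the maximum-likelihood bigram estimate divides the bigram count by the count of the first word. A tiny sketch over an invented corpus:

    from collections import Counter

    corpus = "i like tea i like coffee i drink tea".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    # P(w2 | w1) = Count(w1, w2) / Count(w1)
    print(bigrams[("i", "like")] / unigrams["i"])   # 2 / 3 ≈ 0.67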

16 What is the main problem with N-gram models when N is very large?

A. The model becomes too simple
B. Data sparsity (many sequences have zero counts)
C. The vocabulary size decreases
D. The context window becomes too small

17 What technique is used to handle N-grams that have zero probability in the training data?

A. Pruning
B. Smoothing (e.g., Laplace Smoothing)
C. Filtering
D. Tagging

18 In Laplace (Add-1) smoothing, what is added to the denominator?

A. 1
B. The vocabulary size (V)
C. The number of sentences
D. The total word count
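
Review note for questions 17-18: add-1 (Laplace) smoothing adds 1 to every numerator count and the vocabulary size V to the denominator, so unseen N-grams keep a small non-zero probability. A sketch using the same invented corpus as above:

    from collections import Counter

    corpus = "i like tea i like coffee i drink tea".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    V = len(unigrams)                               # vocabulary size, 5 here

    # P(w2 | w1) = (Count(w1, w2) + 1) / (Count(w1) + V)
    def smoothed_bigram(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(smoothed_bigram("i", "like"))             # (2 + 1) / (3 + 5) = 0.375
    print(smoothed_bigram("i", "coffee"))           # unseen bigram still gets 1 / 8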

19 What does an autocomplete system try to maximize?

A. P(next_word | previous_words)
B. P(previous_words | next_word)
C. The edit distance between words
D. The length of the sentence

20 How many previous words does a Trigram model use to predict the next word?

A. 0
B. 1
C. 2
D. 3

21 One-hot encoding of words results in vectors that are:

A. Dense and low-dimensional
B. Sparse and high-dimensional
C. Dense and high-dimensional
D. Sparse and low-dimensional

22 What is a major limitation of one-hot encoding for words?

A. It is difficult to compute
B. It does not capture semantic similarity between words
C. It cannot represent rare words
D. It requires a neural network
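
Review note for questions 21-22: a one-hot vector is as long as the vocabulary and contains a single 1, so almost every entry is zero and any two different words have a dot product of 0. A tiny illustration with an invented vocabulary:

    vocab = ["cat", "dog", "frog", "car"]

    def one_hot(word):
        # |V|-dimensional, exactly one 1: sparse and high-dimensional
        return [1 if w == word else 0 for w in vocab]

    print(one_hot("cat"))   # [1, 0, 0, 0]
    print(one_hot("dog"))   # [0, 1, 0, 0] -- orthogonal to 'cat', so no similarity is captured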

23 Word embeddings typically represent words as:

A. Integers
B. Sparse binary vectors
C. Dense vectors of real numbers
D. Strings

24 Which metric is commonly used to measure the similarity between two word embedding vectors?

A. Edit Distance
B. Cosine Similarity
C. Jaccard Index
D. Perplexity
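
Review note for question 24: cosine similarity compares the angle between two embedding vectors and ignores their magnitudes. A minimal sketch with invented vectors:

    import math

    def cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    frog = [0.9, 0.1, 0.3]
    toad = [0.8, 0.2, 0.35]
    print(round(cosine_similarity(frog, toad), 3))  # close to 1.0 for semantically similar words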

25 The Word2Vec model 'Skip-gram' architecture tries to predict:

A. The target word given the context words
B. The context words given the target word
C. The next sentence
D. The POS tag of the word

26 The Word2Vec model 'CBOW' (Continuous Bag of Words) architecture tries to predict:

A. The target word given the context words
B. The context words given the target word
C. The document topic
D. The part of speech

27 What famous algebraic property is often cited to demonstrate the semantic capability of word embeddings?

A. King - Man + Woman = Queen
B. Apple + Orange = Fruit
C. Paris - France = Germany
D. Fast + Slow = Speed
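
Review note for question 27: the analogy is literally vector arithmetic followed by a nearest-neighbour lookup. A sketch with made-up 2-dimensional embeddings (real embeddings have hundreds of dimensions):

    import math

    emb = {"king":  [0.9, 0.8],
           "man":   [0.7, 0.2],
           "woman": [0.3, 0.3],
           "queen": [0.5, 0.9],
           "apple": [0.1, 0.1]}

    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

    target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
    best = max((w for w in emb if w not in {"king", "man", "woman"}),
               key=lambda w: cosine(emb[w], target))
    print(best)                                     # 'queen' with these toy vectors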

28 In the 'Noisy Channel Model' for spelling correction, P(x|w) represents:

A. The probability of the word w appearing in the corpus
B. The probability that the user meant w but typed x
C. The probability of typing x given the intended word w
D. The probability of x being a valid word
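
Review note for question 28 (revisited in question 39): the noisy channel model scores each candidate correction w by the channel model P(x | w) times the language-model prior P(w), and picks the arg max. A toy sketch with invented numbers for a typo x:

    # P(x | w): probability of producing the typo given the intended word (channel model)
    # P(w): prior probability of the word itself (language model)
    candidates = {"cat":  {"prior": 0.020, "channel": 0.10},
                  "cast": {"prior": 0.010, "channel": 0.08}}

    best = max(candidates, key=lambda w: candidates[w]["channel"] * candidates[w]["prior"])
    print(best)   # 'cat': 0.10 * 0.020 = 0.0020 beats 0.08 * 0.010 = 0.0008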

29 When building an HMM for POS tagging, the sum of probabilities of all outgoing transitions from a single state must equal:

A. 0
B. 1
C. The number of states
D. The number of observations

30 What is 'Perplexity' in the context of Language Models?

A. A measure of how well a probability model predicts a sample
B. The time taken to train the model
C. The number of parameters in the model
D. The size of the vocabulary
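
Review note for question 30: perplexity is the exponential of the average negative log-probability the model assigns to the test words, so a better model gives a lower value. A short sketch with invented per-word probabilities:

    import math

    log_probs = [math.log(0.2), math.log(0.1), math.log(0.05)]   # P(w_i | history), illustrative
    perplexity = math.exp(-sum(log_probs) / len(log_probs))
    print(round(perplexity, 2))                                  # 10.0 -- lower is better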

31 Why do we use Log Probabilities instead of raw probabilities in N-gram calculations?

A. To make numbers larger
B. To avoid arithmetic underflow
C. To increase perplexity
D. To handle negative numbers
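
Review note for question 31: the product of many small probabilities underflows to 0.0 in floating point, while the sum of their logs stays perfectly representable:

    import math

    probs = [1e-5] * 70                        # seventy small per-word probabilities (illustrative)
    print(math.prod(probs))                    # 0.0 -- the raw product underflows
    print(sum(math.log(p) for p in probs))     # about -805.9 -- numerically stable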

32 Which of the following describes the 'Start' token (<s>) in N-gram models?

A. It indicates the end of a sentence
B. It gives context for the first word in the sentence
C. It represents an unknown word
D. It is used for punctuation

33 What represents the 'Observations' in a POS HMM?

A. The sequence of tags
B. The sequence of words in the text
C. The transition matrix
D. The initial state probabilities

34 In Minimum Edit Distance, the 'backtrace' step is used to:

A. Calculate the cost
B. Determine the actual sequence of operations (alignment)
C. Initialize the matrix
D. Sum the rows

35 Which token is typically used to replace words not found in the training vocabulary?

A. <START>
B. <END>
C. <UNK>
D. <NULL>

36 A 'Unigram' model assumes that:

A. Words depend on the previous word
B. Words are independent of context
C. Words depend on the previous two words
D. Words depend on the grammar

37 The dimensionality of a Word2Vec embedding vector is typically chosen by:

A. The size of the vocabulary
B. The length of the sentence
C. The system designer (hyperparameter)
D. The number of unique characters

38 In the equation P(tag|word) ∝ P(word|tag) * P(tag), what is P(tag)?

A. Likelihood
B. Prior probability
C. Posterior probability
D. Emission probability

39 If we want to build a spell checker, which probability do we want to maximize according to Bayes' theorem?

A. P(typo | correction)
B. P(correction | typo)
C. P(typo)
D. P(correction)

40 Which type of language model suffers most from the 'curse of dimensionality'?

A. Unigram model
B. High-order N-gram model (e.g., 5-gram)
C. Word2Vec
D. Bag of Words

41 What is the primary input to the neural network when training a Word2Vec model?

A. Audio signals
B. Image pixels
C. One-hot encoded vectors of words
D. Parse trees

42 The term 'corpus' in NLP refers to:

A. A computer algorithm
B. A specific neural network layer
C. A body of text data
D. A type of spelling error

43 In edit distance, if we assign a higher cost to substitution than insertion/deletion, it implies:

A. Typing a wrong letter is considered worse than missing a letter
B. The algorithm will fail
C. The distance will always be zero
D. Insertion is impossible

44 What is the result of using a sliding window in N-gram generation?

A. It removes stop words
B. It creates a sequence of overlapping word chunks
C. It converts text to uppercase
D. It calculates the edit distance
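
Review note for question 44: a sliding window of width N over the token list produces the overlapping chunks used to count N-grams:

    def ngrams(tokens, n):
        # slide a window of width n across the tokens, one step at a time
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    print(ngrams("the cat sat on the mat".split(), 2))
    # [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]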

45 Which of these words likely has a vector closest to 'frog' in a well-trained embedding space?

A. Galaxy
B. Toad
C. Steel
D. Philosophy

46 In an HMM, what connects hidden states to each other?

A. Emission probabilities
B. Transition probabilities
C. The Viterbi path
D. Observation vectors

47 What is 'Stupid Backoff' in the context of Language Models?

A. A way to delete wrong words
B. A smoothing method that uses lower-order N-grams if higher-order ones are missing
C. A method to stop the algorithm
D. A type of neural network

48 Which application primarily utilizes Probabilistic Language Models?

A. Image Compression
B. Speech Recognition
C. Database Management
D. Network Routing

49 In the context of Word Embeddings, what does 'Polysemy' refer to?

A. Words with multiple meanings
B. Words with similar spellings
C. Words in different languages
D. Words that rhyme

50 If P(A|B) is the probability of tag A following tag B, this is an example of:

A. Emission probability
B. Transition probability
C. Observation probability
D. Edit probability