Unit 3 - Practice Quiz

INT344 - 50 Questions

1 What is the primary goal of calculating Minimum Edit Distance?

A. To find the longest common subsequence between two strings
B. To calculate the probability of a word appearing in a sentence
C. To generate word embeddings for a neural network
D. To quantify the dissimilarity between two strings by counting operations

2 Which of the following operations is NOT typically used in the Levenshtein distance algorithm?

A. Substitution
B. Insertion
C. Deletion
D. Transposition

3 In the context of dynamic programming for edit distance, if source[i] equals target[j], what is the cost of substitution?

A. Infinity
B. 1
C. 2
D. 0

4 What is the Minimum Edit Distance between the strings 'cat' and 'cut', assuming unit cost for each edit operation?

A. 3
B. 0
C. 1
D. 2
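
A minimal Python sketch of the dynamic-programming (Levenshtein) algorithm behind questions 1-4 and 6, assuming unit cost for insertion, deletion, and substitution; keeping the full matrix D is also what makes the backtrace of question 34 possible. The function name and test strings are illustrative, not from the course materials.

    def edit_distance(source: str, target: str) -> int:
        # D[i][j] = minimum cost of turning source[:i] into target[:j]
        m, n = len(source), len(target)
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            D[i][0] = i                      # i deletions
        for j in range(n + 1):
            D[0][j] = j                      # j insertions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                # substitution is free when the characters match (question 3)
                sub = 0 if source[i - 1] == target[j - 1] else 1
                D[i][j] = min(D[i - 1][j] + 1,        # deletion
                              D[i][j - 1] + 1,        # insertion
                              D[i - 1][j - 1] + sub)  # substitution
        return D[m][n]

    print(edit_distance("cat", "cut"))  # 1: substitute 'a' with 'u' (question 4)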

5 How does an autocorrect system typically identify candidate words for a misspelled word?

A. By selecting random words from the dictionary
B. By using the longest word in the dictionary
C. By choosing words that start with the same letter only
D. By finding words within a certain edit distance threshold

6 Which algorithm is commonly used to efficiently calculate the Minimum Edit Distance?

A. K-Means Clustering
B. Depth-First Search
C. Gradient Descent
D. Dynamic Programming

7 In Part of Speech (POS) tagging, what is the 'Hidden' component in a Hidden Markov Model?

A. The transition probabilities
B. The punctuation marks
C. The Part of Speech tags
D. The words in the sentence

8 What does the Markov Assumption state in the context of Markov Chains?

A. The future state depends on all past states
B. The future state depends on the hidden emissions
C. The future state depends only on the current state
D. The future state is independent of the current state

9 What are 'Transition Probabilities' in an HMM?

A. The probability of a word being misspelled
B. The probability of a sentence starting with a specific word
C. The probability of moving from one POS tag to another
D. The probability of generating a specific word given a tag

10 What are 'Emission Probabilities' in an HMM used for POS tagging?

A. P(tag|previous_tag)
B. P(tag|word)
C. P(word|previous_word)
D. P(word|tag)

11 Which algorithm is used to find the most likely sequence of hidden states (POS tags) given a sequence of observations?

A. The Edit Distance Algorithm
B. The Viterbi Algorithm
C. The Backward Algorithm
D. The Forward Algorithm
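
A compact Viterbi sketch for question 11, on a two-tag toy HMM; every probability below is invented for illustration. At each step we keep, for each tag, the best-scoring path ending in that tag.

    tags = ["NN", "VB"]
    start = {"NN": 0.6, "VB": 0.4}                   # initial tag probabilities
    trans = {"NN": {"NN": 0.3, "VB": 0.7},           # P(next_tag | tag)
             "VB": {"NN": 0.8, "VB": 0.2}}
    emit = {"NN": {"fish": 0.6, "swim": 0.4},        # P(word | tag)
            "VB": {"fish": 0.3, "swim": 0.7}}

    def viterbi(words):
        # best[t] = (probability, tag path) of the best sequence ending in tag t
        best = {t: (start[t] * emit[t][words[0]], [t]) for t in tags}
        for word in words[1:]:
            new_best = {}
            for t in tags:
                # pick the previous tag maximizing prob * transition
                prev = max(tags, key=lambda s: best[s][0] * trans[s][t])
                p, path = best[prev]
                new_best[t] = (p * trans[prev][t] * emit[t][word], path + [t])
            best = new_best
        return max(best.values(), key=lambda x: x[0])[1]

    print(viterbi(["fish", "swim"]))  # ['NN', 'VB']: most likely hidden tags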

12 A text corpus is:

A. A large, structured set of texts used for statistical analysis
B. A set of rules for grammar
C. A dictionary of word definitions
D. A software used for autocorrection

13 In an N-gram language model, what is 'N'?

A. The number of words in the sentence
B. The number of hidden states
C. The dimension of the word embedding
D. The number of words in the sequence considered for probability

14 Which assumption simplifies the calculation of N-gram probabilities?

A. The probability depends on the entire sentence history
B. The probability of a word depends only on the previous N-1 words
C. All words have equal probability
D. Words are independent of each other

15 How is the probability of a bigram P(w2 | w1) calculated from a corpus?

A. Count(w1) * Count(w2)
B. Count(w1, w2) / Count(w2)
C. Count(w1, w2) / Count(w1)
D. Count(w1) / Count(w2)
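
A worked sketch of the count-based bigram estimate from question 15, using a tiny invented corpus; all names here are illustrative.

    from collections import Counter

    corpus = "I am happy because I am learning".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    # P(w2 | w1) = Count(w1, w2) / Count(w1)  -- the MLE bigram estimate
    def bigram_prob(w1, w2):
        return bigrams[(w1, w2)] / unigrams[w1]

    print(bigram_prob("I", "am"))      # 2/2 = 1.0: every "I" is followed by "am"
    print(bigram_prob("am", "happy"))  # 1/2 = 0.5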

16 What is the main problem with N-gram models when N is very large?

A. The vocabulary size decreases
B. The model becomes too simple
C. Data sparsity (many sequences have zero counts)
D. The context window becomes too small

17 What technique is used to handle N-grams that have zero probability in the training data?

A. Tagging
B. Pruning
C. Smoothing (e.g., Laplace Smoothing)
D. Filtering

18 In Laplace (Add-1) smoothing, what is added to the denominator?

A. The number of sentences
B. The total word count
C. 1
D. The vocabulary size (V)
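
A sketch of Add-1 (Laplace) smoothing for questions 17-18, on the same invented toy corpus; note the vocabulary size V added to the denominator, which gives unseen bigrams a small non-zero probability.

    from collections import Counter

    corpus = "I am happy because I am learning".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    V = len(unigrams)  # vocabulary size: 5 distinct words here

    # Add 1 to the count in the numerator and V to the denominator.
    def smoothed_bigram_prob(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(smoothed_bigram_prob("I", "am"))            # (2+1)/(2+5) ≈ 0.43
    print(smoothed_bigram_prob("happy", "learning"))  # unseen: (0+1)/(1+5) ≈ 0.17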

19 What does an autocomplete system try to maximize?

A. The length of the sentence
B. P(previous_words | next_word)
C. The edit distance between words
D. P(next_word | previous_words)

20 How many previous words does a Trigram model condition on when predicting the next word?

A. 3
B. 2
C. 0
D. 1

21 One-hot encoding of words results in vectors that are:

A. Sparse and high-dimensional
B. Sparse and low-dimensional
C. Dense and high-dimensional
D. Dense and low-dimensional
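
A small illustration for questions 21-22: one-hot vectors have one dimension per vocabulary word, so they are sparse and high-dimensional, and carry no notion of similarity between words. The toy vocabulary is invented.

    vocab = ["cat", "dog", "fish", "run", "swim"]
    index = {word: i for i, word in enumerate(vocab)}

    def one_hot(word):
        vec = [0] * len(vocab)   # dimension equals vocabulary size
        vec[index[word]] = 1     # exactly one non-zero entry, hence "sparse"
        return vec

    print(one_hot("dog"))  # [0, 1, 0, 0, 0]
    # With a realistic vocabulary of ~100,000 words each vector has
    # 100,000 dimensions and a single 1, and any two distinct words
    # are equally dissimilar -- no semantic similarity is captured.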

22 What is a major limitation of one-hot encoding for words?

A. It is difficult to compute
B. It cannot represent rare words
C. It does not capture semantic similarity between words
D. It requires a neural network

23 Word embeddings typically represent words as:

A. Sparse binary vectors
B. Dense vectors of real numbers
C. Strings
D. Integers

24 Which metric is commonly used to measure the similarity between two word embedding vectors?

A. Edit Distance
B. Perplexity
C. Cosine Similarity
D. Jaccard Index
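
A sketch of cosine similarity (question 24) on hand-picked toy vectors; in a real embedding space this is why 'toad' lands nearest to 'frog' in question 45. The vector values are invented.

    import math

    def cosine_similarity(a, b):
        # cos(theta) = (a . b) / (|a| * |b|)
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    frog = [0.9, 0.1, 0.3]
    toad = [0.8, 0.2, 0.4]
    steel = [0.1, 0.9, 0.0]

    print(cosine_similarity(frog, toad))   # ~0.98: nearly the same direction
    print(cosine_similarity(frog, steel))  # ~0.21: dissimilar words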

25 The Word2Vec model 'Skip-gram' architecture tries to predict:

A. The next sentence
B. The target word given the context words
C. The POS tag of the word
D. The context words given the target word

26 The Word2Vec model 'CBOW' (Continuous Bag of Words) architecture tries to predict:

A. The context words given the target word
B. The target word given the context words
C. The part of speech
D. The document topic

27 What famous algebraic property is often cited to demonstrate the semantic capability of word embeddings?

A. King - Man + Woman = Queen
B. Fast + Slow = Speed
C. Apple + Orange = Fruit
D. Paris - France = Germany
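
A toy illustration of the analogy arithmetic in question 27, with hand-crafted 2-dimensional vectors (dimension 0 loosely "royalty", dimension 1 loosely "gender"); real embeddings are learned and typically have hundreds of dimensions.

    vectors = {
        "king":  [0.9, 0.9],
        "queen": [0.9, 0.1],
        "man":   [0.1, 0.9],
        "woman": [0.1, 0.1],
    }

    result = [k - m + w for k, m, w in
              zip(vectors["king"], vectors["man"], vectors["woman"])]
    print(result)  # [0.9, 0.1] -- matches the "queen" vector in this toy space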

28 In the 'Noisy Channel Model' for spelling correction, P(x|w) represents:

A. The probability of typing x given the intended word w
B. The probability of x being a valid word
C. The probability of the word w appearing in the corpus
D. The probability that the user meant w but typed x

29 When building an HMM for POS tagging, the sum of probabilities of all outgoing transitions from a single state must equal:

A. The number of states
B. 1
C. The number of observations
D. 0

30 What is 'Perplexity' in the context of Language Models?

A. A measure of how well a probability model predicts a sample
B. The size of the vocabulary
C. The number of parameters in the model
D. The time taken to train the model
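
A sketch of the perplexity computation behind question 30, on invented per-word probabilities: the inverse probability of the test sample, normalized by its length. Lower perplexity means the model predicts the sample better.

    import math

    word_probs = [0.2, 0.1, 0.25, 0.05]   # P(w_i | history) for each test word
    N = len(word_probs)

    log_prob = sum(math.log(p) for p in word_probs)
    perplexity = math.exp(-log_prob / N)
    print(perplexity)  # ~7.95; a uniform guess over V words would score V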

31 Why do we use Log Probabilities instead of raw probabilities in N-gram calculations?

A. To make numbers larger
B. To increase perplexity
C. To handle negative numbers
D. To avoid arithmetic underflow
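
A quick demonstration of the underflow problem from question 31: multiplying many small probabilities collapses to 0.0 in floating point, while summing their logs stays perfectly representable. The probabilities are invented.

    import math

    probs = [0.01] * 200   # 200 words, each with probability 0.01

    product = 1.0
    for p in probs:
        product *= p
    print(product)         # 0.0 -- arithmetic underflow (10^-400 < smallest float)

    log_sum = sum(math.log(p) for p in probs)
    print(log_sum)         # ~ -921.0, no underflow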

32 Which of the following describes the 'Start' token (<s>) in N-gram models?

A. It represents an unknown word
B. It is used for punctuation
C. It indicates the end of a sentence
D. It gives context for the first word in the sentence

33 What represents the 'Observations' in a POS HMM?

A. The sequence of words in the text
B. The sequence of tags
C. The initial state probabilities
D. The transition matrix

34 In Minimum Edit Distance, the 'backtrace' step is used to:

A. Sum the rows
B. Calculate the cost
C. Determine the actual sequence of operations (alignment)
D. Initialize the matrix

35 Which token is typically used to replace words not found in the training vocabulary?

A. <END>
B. <START>
C. <UNK>
D. <NULL>

36 A 'Unigram' model assumes that:

A. Words depend on the grammar
B. Words depend on the previous word
C. Words are independent of context
D. Words depend on the previous two words

37 The dimensionality of a Word2Vec embedding vector is typically chosen by:

A. The length of the sentence
B. The size of the vocabulary
C. The system designer (hyperparameter)
D. The number of unique characters

38 In the equation P(tag|word) ∝ P(word|tag) * P(tag), what is P(tag)?

A. Posterior probability
B. Prior probability
C. Likelihood
D. Emission probability

39 If we want to build a spell checker, which probability do we want to maximize according to Bayes' theorem?

A. P(correction)
B. P(typo | correction)
C. P(correction | typo)
D. P(typo)
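
A sketch of the noisy-channel argmax behind questions 28 and 39: choose the correction w that maximizes P(typo | w) * P(w), since the denominator P(typo) is the same for every candidate and can be dropped. The typo, candidates, and probabilities below are invented.

    candidates = {
        # word: (P("thew" | word) -- error model, P(word) -- language model)
        "the":   (0.000007, 0.02),
        "thaw":  (0.00001,  0.0000083),
        "threw": (0.000008, 0.000004),
    }

    def best_correction(candidates):
        # argmax over w of P(typo | w) * P(w)
        return max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])

    print(best_correction(candidates))  # "the": the high prior P(w) wins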

40 Which type of language model suffers most from the 'curse of dimensionality'?

A. Bag of Words
B. Word2Vec
C. High-order N-gram model (e.g., 5-gram)
D. Unigram model

41 What is the primary input when training a Word2Vec neural network?

A. Image pixels
B. One-hot encoded vectors of words
C. Parse trees
D. Audio signals

42 The term 'corpus' in NLP refers to:

A. A specific neural network layer
B. A computer algorithm
C. A type of spelling error
D. A body of text data

43 In edit distance, if we assign a higher cost to substitution than insertion/deletion, it implies:

A. Typing a wrong letter is considered worse than missing a letter
B. The distance will always be zero
C. Insertion is impossible
D. The algorithm will fail

44 What is the result of using a sliding window in N-gram generation?

A. It converts text to uppercase
B. It removes stop words
C. It creates a sequence of overlapping word chunks
D. It calculates the edit distance
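
A sketch of sliding-window N-gram generation (questions 13, 20, and 44): the window advances one word at a time, producing overlapping chunks. The example sentence is illustrative.

    def ngrams(words, n):
        # slide a window of size n over the word list, one word at a time
        return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    sentence = "the cat sat on the mat".split()
    print(ngrams(sentence, 2))  # bigrams: ('the','cat'), ('cat','sat'), ...
    print(ngrams(sentence, 3))  # trigrams: ('the','cat','sat'), ...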

45 Which of these words likely has a vector closest to 'frog' in a well-trained embedding space?

A. Philosophy
B. Galaxy
C. Toad
D. Steel

46 In an HMM, what connects hidden states to each other?

A. The Viterbi path
B. Emission probabilities
C. Observation vectors
D. Transition probabilities

47 What is 'Stupid Backoff' in the context of Language Models?

A. A way to delete wrong words
B. A method to stop the algorithm
C. A type of neural network
D. A smoothing method that uses lower-order N-grams if higher-order ones are missing
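
A sketch of Stupid Backoff (question 47) on the same invented toy corpus: when the higher-order N-gram is unseen, fall back to the lower-order estimate scaled by a constant. The constant 0.4 follows Brants et al. (2007); because there is no renormalization, these are scores rather than true probabilities.

    from collections import Counter

    corpus = "I am happy because I am learning".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    total = len(corpus)

    def stupid_backoff(w1, w2, alpha=0.4):
        if bigrams[(w1, w2)] > 0:
            return bigrams[(w1, w2)] / unigrams[w1]   # seen: use the bigram
        return alpha * unigrams[w2] / total           # unseen: scaled unigram

    print(stupid_backoff("I", "am"))     # seen bigram: 2/2 = 1.0
    print(stupid_backoff("happy", "I"))  # unseen: 0.4 * 2/7 ≈ 0.11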

48 Which application primarily utilizes Probabilistic Language Models?

A. Speech Recognition
B. Image Compression
C. Network Routing
D. Database Management

49 In the context of Word Embeddings, what does 'Polysemy' refer to?

A. Words with similar spellings
B. Words in different languages
C. Words with multiple meanings
D. Words that rhyme

50 If P(A|B) is the probability of tag A following tag B, this is an example of:

A. Edit probability
B. Emission probability
C. Transition probability
D. Observation probability