Unit 3 - Practice Quiz

INT344 50 Questions

1 What is the primary goal of calculating Minimum Edit Distance?

A. To quantify the dissimilarity between two strings by counting operations
B. To generate word embeddings for a neural network
C. To calculate the probability of a word appearing in a sentence
D. To find the longest common subsequence between two strings

2 Which of the following operations is NOT typically used in the Levenshtein distance algorithm?

A. Substitution
B. Insertion
C. Transposition
D. Deletion

3 In the context of dynamic programming for edit distance, if source[i] equals target[j], what is the cost of substitution?

A. Infinity
B. 1
C. 0
D. 2

4 What is the Minimum Edit Distance between the strings 'cat' and 'cut'?

A. 2
B. 1
C. 3
D. 0
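
Answers to distance questions like question 4 can be checked with a short sketch that fills a table of prefix-to-prefix costs. It assumes a cost of 1 for insertion and deletion and a configurable substitution cost; the function name `min_edit_distance` and the parameter `sub_cost` are illustrative and not part of the quiz.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Edit distance with unit insert/delete costs (illustrative sketch)."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] holds the cheapest way to turn source[:i] into target[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]


print(min_edit_distance("cat", "cut"))
```

Passing `sub_cost=2` reproduces the variant in which a substitution costs as much as a deletion followed by an insertion.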

5 How does an autocorrect system typically identify candidate words for a misspelled word?

A. By finding words within a certain edit distance threshold
B. By using the longest word in the dictionary
C. By selecting random words from the dictionary
D. By choosing words that start with the same letter only

6 Which algorithm is commonly used to efficiently calculate the Minimum Edit Distance?

A. Gradient Descent
B. Depth-First Search
C. K-Means Clustering
D. Dynamic Programming

7 In Part of Speech (POS) tagging, what is the 'Hidden' component in a Hidden Markov Model?

A. The words in the sentence
B. The Part of Speech tags
C. The transition probabilities
D. The punctuation marks

8 What does the Markov Assumption state in the context of Markov Chains?

A. The future state depends only on the current state
B. The future state is independent of the current state
C. The future state depends on all past states
D. The future state depends on the hidden emissions

9 What are 'Transition Probabilities' in an HMM?

A. The probability of a sentence starting with a specific word
B. The probability of generating a specific word given a tag
C. The probability of moving from one POS tag to another
D. The probability of a word being misspelled

10 What are 'Emission Probabilities' in an HMM used for POS tagging?

A. P(word|previous_word)
B. P(word|tag)
C. P(tag|previous_tag)
D. P(tag|word)

11 Which algorithm is used to find the most likely sequence of hidden states (POS tags) given a sequence of observations?

A. The Forward Algorithm
B. The Viterbi Algorithm
C. The Edit Distance Algorithm
D. The Backward Algorithm
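
The decoding problem in question 11 can be sketched as a max-product recursion with backpointers. Everything below is a toy setup under stated assumptions: the two tags, all probability values, and the function name `most_likely_tags` are invented for illustration.

```python
def most_likely_tags(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` (illustrative sketch)."""
    # best[t][tag] = (probability of the best path ending in `tag` at step t, previous tag)
    best = [{tag: (start_p[tag] * emit_p[tag].get(words[0], 0.0), None) for tag in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            # Keep only the best predecessor for each tag at this step.
            column[tag] = max(
                (best[-1][prev][0] * trans_p[prev][tag] * emit_p[tag].get(word, 0.0), prev)
                for prev in tags
            )
        best.append(column)
    # Backtrace from the best final tag through the stored predecessors.
    path = [max(tags, key=lambda tag: best[-1][tag][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


# Hypothetical toy model: every row of probabilities sums to 1.
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.6, "cats": 0.3, "bark": 0.1},
          "VERB": {"bark": 0.7, "run": 0.3}}

print(most_likely_tags(["dogs", "bark"], tags, start_p, trans_p, emit_p))
```

A practical tagger would work with log probabilities and handle unknown words; both are left out here to keep the recursion visible.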

12 A text corpus is:

A. A set of rules for grammar
B. A software used for autocorrection
C. A large, structured set of texts used for statistical analysis
D. A dictionary of word definitions

13 In an N-gram language model, what is 'N'?

A. The number of words in the sentence
B. The dimension of the word embedding
C. The number of hidden states
D. The number of words in the sequence considered for probability

14 Which assumption simplifies the calculation of N-gram probabilities?

A. The probability depends on the entire sentence history
B. Words are independent of each other
C. All words have equal probability
D. The probability of a word depends only on the previous N-1 words

15 How is the probability of a bigram P(w2 | w1) calculated from a corpus?

A. Count(w1) / Count(w2)
B. Count(w1) * Count(w2)
C. Count(w1, w2) / Count(w1)
D. Count(w1, w2) / Count(w2)
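
Bigram estimates such as the one in question 15 can be checked against a small corpus. The sketch below uses maximum-likelihood counts; the three-sentence corpus and the helper name `bigram_prob` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy corpus with sentence boundary tokens.
corpus = [
    ["<s>", "i", "like", "tea", "</s>"],
    ["<s>", "i", "like", "coffee", "</s>"],
    ["<s>", "you", "like", "tea", "</s>"],
]

unigram_counts = Counter(word for sent in corpus for word in sent)
bigram_counts = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))


def bigram_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from the toy corpus."""
    if unigram_counts[w1] == 0:
        return 0.0  # w1 never seen, so no estimate is available
    return bigram_counts[(w1, w2)] / unigram_counts[w1]


print(bigram_prob("like", "tea"))
print(bigram_prob("like", "milk"))
```

The second call returns zero because the pair never occurs in the corpus, which is the sparsity problem that smoothing addresses.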

16 What is the main problem with N-gram models when N is very large?

A. The model becomes too simple
B. Data sparsity (many sequences have zero counts)
C. The context window becomes too small
D. The vocabulary size decreases

17 What technique is used to handle N-grams that have zero probability in the training data?

A. Filtering
B. Tagging
C. Smoothing (e.g., Laplace Smoothing)
D. Pruning

18 In Laplace (Add-1) smoothing, what is added to the denominator?

A. 1
B. The total word count
C. The number of sentences
D. The vocabulary size (V)
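
Add-1 smoothing as described in question 18 can be sketched in a few lines. The token sequence and the function name `laplace_bigram_prob` are assumptions made for this example only.

```python
from collections import Counter

tokens = "the cat sat on the mat".split()  # hypothetical toy data
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
vocab_size = len(unigram_counts)


def laplace_bigram_prob(w1, w2):
    """Add-1 estimate of P(w2 | w1): one extra count per vocabulary word."""
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)


print(laplace_bigram_prob("the", "cat"))  # seen bigram
print(laplace_bigram_prob("the", "sat"))  # unseen bigram, still non-zero
```

Because every possible next word receives one extra count, the smoothed probabilities for a given `w1` still sum to 1 over the vocabulary.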

19 What does an autocomplete system try to maximize?

A. P(next_word | previous_words)
B. P(previous_words | next_word)
C. The edit distance between words
D. The length of the sentence

20 How many previous words does a trigram model use to predict the next word?

A. 1
B. 3
C. 2
D. 0

21 One-hot encoding of words results in vectors that are:

A. Sparse and low-dimensional
B. Dense and high-dimensional
C. Sparse and high-dimensional
D. Dense and low-dimensional

22 What is a major limitation of one-hot encoding for words?

A. It cannot represent rare words
B. It requires a neural network
C. It is difficult to compute
D. It does not capture semantic similarity between words

23 Word embeddings typically represent words as:

A. Sparse binary vectors
B. Integers
C. Strings
D. Dense vectors of real numbers

24 Which metric is commonly used to measure the similarity between two word embedding vectors?

A. Jaccard Index
B. Cosine Similarity
C. Perplexity
D. Edit Distance
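
One way to compare two embedding vectors is by the angle between them. The sketch below computes cosine similarity with the standard library only; the example vectors are arbitrary and do not come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal
```

The measure ignores vector length, so two vectors pointing the same way score close to 1 regardless of magnitude.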

25 The Word2Vec model 'Skip-gram' architecture tries to predict:

A. The next sentence
B. The target word given the context words
C. The context words given the target word
D. The POS tag of the word

26 The Word2Vec model 'CBOW' (Continuous Bag of Words) architecture tries to predict:

A. The context words given the target word
B. The document topic
C. The target word given the context words
D. The part of speech

27 What famous algebraic property is often cited to demonstrate the semantic capability of word embeddings?

A. Apple + Orange = Fruit
B. Paris - France = Germany
C. Fast + Slow = Speed
D. King - Man + Woman = Queen

28 In the 'Noisy Channel Model' for spelling correction, P(x|w) represents:

A. The probability of the word w appearing in the corpus
B. The probability of x being a valid word
C. The probability of typing x given the intended word w
D. The probability that the user meant w, given that x was typed

29 When building an HMM for POS tagging, the sum of probabilities of all outgoing transitions from a single state must equal:

A. The number of observations
B. The number of states
C. 1
D. 0

30 What is 'Perplexity' in the context of Language Models?

A. The time taken to train the model
B. A measure of how well a probability model predicts a sample
C. The size of the vocabulary
D. The number of parameters in the model
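
Perplexity can be computed from the probabilities a model assigns to each token of a test sample. The sketch below assumes those per-token probabilities are already available; the function name `perplexity` and its input format are illustrative.

```python
import math


def perplexity(token_probs):
    """exp of the negative mean log-probability over the test tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # math.log raises ValueError on a zero probability, which is one
    # reason unsmoothed models cannot be evaluated this way.
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A model that assigns probability 0.25 to every token has a perplexity of 4 (up to floating-point rounding), as if it were choosing uniformly among four words at each step. Lower values indicate better prediction.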

31 Why do we use Log Probabilities instead of raw probabilities in N-gram calculations?

A. To increase perplexity
B. To make numbers larger
C. To avoid arithmetic underflow
D. To handle negative numbers
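
The numerical issue behind question 31 is easy to reproduce. In this sketch, the value 1e-5 and the sequence length of 100 are arbitrary choices for illustration.

```python
import math

probs = [1e-5] * 100  # hypothetical per-word probabilities

# Multiplying raw probabilities: the true value (1e-500) is far below
# the smallest positive 64-bit float, so the result collapses to 0.0.
product = 1.0
for p in probs:
    product *= p

# Summing log probabilities keeps the same information in a safe range.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

Since the logarithm is monotonic, comparing summed log probabilities ranks candidate sentences the same way as comparing the raw products would.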

32 Which of the following describes the 'Start' token (<s>) in N-gram models?

A. It represents an unknown word
B. It is used for punctuation
C. It gives context for the first word in the sentence
D. It indicates the end of a sentence

33 What represents the 'Observations' in a POS HMM?

A. The sequence of tags
B. The transition matrix
C. The sequence of words in the text
D. The initial state probabilities

34 In Minimum Edit Distance, the 'backtrace' step is used to:

A. Initialize the matrix
B. Sum the rows
C. Calculate the cost
D. Determine the actual sequence of operations (alignment)

35 Which token is typically used to replace words not found in the training vocabulary?

A. <UNK>
B. <END>
C. <START>
D. <NULL>

36 A 'Unigram' model assumes that:

A. Words depend on the previous two words
B. Words depend on the grammar
C. Words are independent of context
D. Words depend on the previous word

37 The dimensionality of a Word2Vec embedding vector is typically chosen by:

A. The number of unique characters
B. The length of the sentence
C. The system designer (hyperparameter)
D. The size of the vocabulary

38 In the equation P(tag|word) ∝ P(word|tag) * P(tag), what is P(tag)?

A. Prior probability
B. Posterior probability
C. Emission probability
D. Likelihood

39 If we want to build a spell checker, which probability do we want to maximize according to Bayes' theorem?

A. P(typo)
B. P(correction)
C. P(correction | typo)
D. P(typo | correction)

40 Which type of language model suffers most from the 'curse of dimensionality'?

A. Bag of Words
B. Word2Vec
C. Unigram model
D. High-order N-gram model (e.g., 5-gram)

41 What is the primary input to a neural network training a Word2Vec model?

A. Audio signals
B. Parse trees
C. One-hot encoded vectors of words
D. Image pixels

42 The term 'corpus' in NLP refers to:

A. A specific neural network layer
B. A type of spelling error
C. A body of text data
D. A computer algorithm

43 In edit distance, if we assign a higher cost to substitution than insertion/deletion, it implies:

A. Insertion is impossible
B. The distance will always be zero
C. Typing a wrong letter is considered worse than missing a letter
D. The algorithm will fail

44 What is the result of using a sliding window in N-gram generation?

A. It calculates the edit distance
B. It creates a sequence of overlapping word chunks
C. It removes stop words
D. It converts text to uppercase
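
A sliding window over a token list can be sketched as follows. The function name `ngrams` is illustrative; tokenization is assumed to have been done already.

```python
def ngrams(tokens, n):
    """Slide a window of width n over tokens, one position at a time."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams("the cat sat on the mat".split(), 2))
print(ngrams("the cat sat on the mat".split(), 3))
```

When the input has fewer than `n` tokens the result is an empty list, which is why N-gram models usually pad sentences with start and end tokens.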

45 Which of these words likely has a vector closest to 'frog' in a well-trained embedding space?

A. Philosophy
B. Toad
C. Galaxy
D. Steel

46 In an HMM, what connects hidden states to each other?

A. Transition probabilities
B. The Viterbi path
C. Emission probabilities
D. Observation vectors

47 What is 'Stupid Backoff' in the context of Language Models?

A. A way to delete wrong words
B. A type of neural network
C. A smoothing method that uses lower-order N-grams if higher-order ones are missing
D. A method to stop the algorithm
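
Stupid Backoff can be sketched as a short recursive scoring function. The toy token sequence, the helper name `stupid_backoff_score`, and the count table layout are assumptions for this example; the factor 0.4 is the commonly cited default. The scores are not normalized probabilities.

```python
from collections import Counter

tokens = "the cat sat on the mat".split()  # hypothetical toy data

# Count every n-gram of order 1 to 3, keyed by tuple.
counts = Counter(
    tuple(tokens[i:i + n])
    for n in (1, 2, 3)
    for i in range(len(tokens) - n + 1)
)


def stupid_backoff_score(word, context, alpha=0.4):
    """Score `word` after `context` (a tuple of preceding words)."""
    if not context:
        return counts[(word,)] / len(tokens)  # unigram relative frequency
    ngram = context + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and apply the penalty.
    return alpha * stupid_backoff_score(word, context[1:], alpha)


print(stupid_backoff_score("mat", ("on", "the")))  # trigram was seen
print(stupid_backoff_score("sat", ("on", "the")))  # backs off twice
```

The second call falls through the trigram and bigram levels and ends at the unigram frequency, multiplied by the penalty once per backoff step.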

48 Which application primarily utilizes Probabilistic Language Models?

A. Image Compression
B. Network Routing
C. Speech Recognition
D. Database Management

49 In the context of Word Embeddings, what does 'Polysemy' refer to?

A. Words that rhyme
B. Words with multiple meanings
C. Words with similar spellings
D. Words in different languages

50 If P(A|B) is the probability of tag A following tag B, this is an example of:

A. Emission probability
B. Edit probability
C. Observation probability
D. Transition probability