1. What is the primary goal of calculating Minimum Edit Distance?
A.To find the longest common subsequence between two strings
B.To quantify the dissimilarity between two strings by counting operations
C.To calculate the probability of a word appearing in a sentence
D.To generate word embeddings for a neural network
Correct Answer: To quantify the dissimilarity between two strings by counting operations
Explanation:Minimum Edit Distance measures the minimum number of editing operations (insertion, deletion, substitution) needed to transform one string into another.
2. Which of the following operations is NOT typically used in the Levenshtein distance algorithm?
Correct Answer: Transposition
Explanation:Standard Levenshtein distance allows only insertion, deletion, and substitution; transposing adjacent characters is added only in the Damerau-Levenshtein variant.
3. In the context of dynamic programming for edit distance, if source[i] equals target[j], what is the cost of substitution?
A.0
B.1
C.2
D.Infinity
Correct Answer: 0
Explanation:If the characters are identical, no substitution is needed, so the cost added is 0.
4. What is the Minimum Edit Distance between the strings 'cat' and 'cut'?
A.0
B.1
C.2
D.3
Correct Answer: 1
Explanation:One operation is required: substitute 'a' with 'u'.
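The dynamic-programming idea behind questions 1-4 fits in a few lines. Below is a minimal Python sketch assuming unit costs for insertion, deletion, and substitution (and cost 0 when characters already match); the function name `min_edit_distance` is an illustrative choice, not from any particular library.

```python
def min_edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (unit costs assumed)."""
    m, n = len(source), len(target)
    # D[i][j] = edit distance between source[:i] and target[:j]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i          # delete all i characters of the source
    for j in range(n + 1):
        D[0][j] = j          # insert all j characters of the target
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,               # deletion
                          D[i][j - 1] + 1,               # insertion
                          D[i - 1][j - 1] + sub_cost)    # substitution or match
    return D[m][n]

print(min_edit_distance("cat", "cut"))  # 1: substitute 'a' with 'u'
```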
5. How does an autocorrect system typically identify candidate words for a misspelled word?
A.By selecting random words from the dictionary
B.By finding words within a certain edit distance threshold
C.By using the longest word in the dictionary
D.By choosing words that start with the same letter only
Correct Answer: By finding words within a certain edit distance threshold
Explanation:Autocorrect generates candidates by looking for words in the dictionary that are 1 or 2 edit distances away from the misspelled input.
6. Which algorithm is commonly used to efficiently calculate the Minimum Edit Distance?
A.Depth-First Search
B.Dynamic Programming
C.Gradient Descent
D.K-Means Clustering
Correct Answer: Dynamic Programming
Explanation:Dynamic programming is used to fill a matrix of costs to find the optimal path of operations.
7. In Part of Speech (POS) tagging, what is the 'Hidden' component in a Hidden Markov Model?
A.The words in the sentence
B.The Part of Speech tags
C.The punctuation marks
D.The transition probabilities
Correct Answer: The Part of Speech tags
Explanation:In an HMM for POS tagging, the tags (states) are hidden, and the words (observations) are visible.
8. What does the Markov Assumption state in the context of Markov Chains?
A.The future state depends on all past states
B.The future state depends only on the current state
C.The future state is independent of the current state
D.The future state depends on the hidden emissions
Correct Answer: The future state depends only on the current state
Explanation:The Markov Assumption (specifically for a first-order chain) assumes that the probability of the next state depends only on the current state, not the entire history.
9. What are 'Transition Probabilities' in an HMM?
A.The probability of generating a specific word given a tag
B.The probability of moving from one POS tag to another
C.The probability of a sentence starting with a specific word
D.The probability of a word being misspelled
Correct Answer: The probability of moving from one POS tag to another
Explanation:Transition probabilities define the likelihood of a sequence of states, such as a Noun following a Determiner.
10. What are 'Emission Probabilities' in an HMM used for POS tagging?
A.P(tag|previous_tag)
B.P(word|tag)
C.P(tag|word)
D.P(word|previous_word)
Correct Answer: P(word|tag)
Explanation:Emission probability is the likelihood of observing a specific word given the current hidden state (POS tag).
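To make the distinction between transition and emission probabilities in questions 9 and 10 concrete, here is a tiny hand-made sketch; the tags, words, and numbers are invented for illustration, not estimated from a real corpus.

```python
# Transition probabilities: P(next_tag | current_tag)
transition = {
    "DT": {"NN": 0.8, "JJ": 0.2},   # a determiner is usually followed by a noun
    "JJ": {"NN": 0.9, "JJ": 0.1},
    "NN": {"VB": 0.6, "NN": 0.4},
}

# Emission probabilities: P(word | tag)
emission = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.4, "park": 0.3, "walk": 0.3},
    "VB": {"runs": 0.5, "walk": 0.5},
}

# Each row is a probability distribution, so it sums to 1 (see question 29).
assert abs(sum(transition["DT"].values()) - 1.0) < 1e-9
print(emission["NN"]["dog"])  # P(word='dog' | tag='NN') = 0.4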
11. Which algorithm is used to find the most likely sequence of hidden states (POS tags) given a sequence of observations?
A.The Viterbi Algorithm
B.The Forward Algorithm
C.The Backward Algorithm
D.The Edit Distance Algorithm
Correct Answer: The Viterbi Algorithm
Explanation:The Viterbi algorithm is a dynamic programming algorithm used for decoding: finding the most probable sequence of hidden states.
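Below is a minimal sketch of the Viterbi decoder for a toy HMM, to show how the dynamic-programming table and backtrace fit together. The tag set, probabilities, and sentence are made up for illustration; a real tagger would estimate them from a labelled corpus.

```python
import math

states = ["NN", "VB"]
start_p = {"NN": 0.6, "VB": 0.4}                      # initial state probabilities
trans_p = {"NN": {"NN": 0.3, "VB": 0.7},              # P(next_tag | tag)
           "VB": {"NN": 0.8, "VB": 0.2}}
emit_p  = {"NN": {"dogs": 0.5, "bark": 0.1, "loudly": 0.4},   # P(word | tag)
           "VB": {"dogs": 0.1, "bark": 0.7, "loudly": 0.2}}

def viterbi(obs):
    """Return the most probable tag sequence for the observed words."""
    # best[t][s] = log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1])
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Backtrace from the best final state to recover the full path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi(["dogs", "bark", "loudly"]))  # ['NN', 'VB', 'NN'] for this toy model
```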
12. A text corpus is:
A.A dictionary of word definitions
B.A large, structured set of texts used for statistical analysis
C.A software used for autocorrection
D.A set of rules for grammar
Correct Answer: A large, structured set of texts used for statistical analysis
Explanation:A corpus is a large body of text used to train language models, calculate probabilities, and analyze linguistic patterns.
13. In an N-gram language model, what is 'N'?
A.The number of hidden states
B.The number of words in the sentence
C.The number of words in the sequence considered for probability
D.The dimension of the word embedding
Correct Answer: The number of words in the sequence considered for probability
Explanation:N represents the size of the window of words. A bigram has N=2, looking at the current word and one previous word.
14. Which assumption simplifies the calculation of N-gram probabilities?
A.The probability of a word depends only on the previous N-1 words
B.Words are independent of each other
C.All words have equal probability
D.The probability depends on the entire sentence history
Correct Answer: The probability of a word depends only on the previous N-1 words
Explanation:This is the Markov assumption applied to N-gram models to make computation feasible.
15. How is the probability of a bigram P(w2 | w1) calculated from a corpus?
A.Count(w1, w2) / Count(w2)
B.Count(w1, w2) / Count(w1)
C.Count(w1) / Count(w2)
D.Count(w1) * Count(w2)
Correct Answer: Count(w1, w2) / Count(w1)
Explanation:The probability of w2 following w1 is the count of the pair (w1, w2) divided by the total count of the history word w1.
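The estimate Count(w1, w2) / Count(w1) from question 15 is easy to reproduce on a toy corpus. A minimal sketch (the corpus and word choices are invented for illustration):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat slept".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))   # sliding (w1, w2) pairs

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = Count(w1, w2) / Count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("the", "cat"))  # 2/3: 'the' occurs 3 times, 'the cat' occurs twice
```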
16. What is the main problem with N-gram models when N is very large?
A.The model becomes too simple
B.Data sparsity (many sequences have zero counts)
C.The vocabulary size decreases
D.The context window becomes too small
Correct Answer: Data sparsity (many sequences have zero counts)
Explanation:As N increases, the number of possible combinations grows exponentially, and most specific sequences will not appear in the training corpus.
17. What technique is used to handle N-grams that have zero probability in the training data?
Correct Answer: Smoothing
Explanation:Smoothing techniques like Add-1 (Laplace) assign a small non-zero probability to unseen N-grams to prevent zero-probability errors.
18. In Laplace (Add-1) smoothing, what is added to the denominator?
A.1
B.The vocabulary size (V)
C.The number of sentences
D.The total word count
Correct Answer: The vocabulary size (V)
Explanation:To normalize the probability after adding 1 to the numerator of every word type, the vocabulary size V must be added to the denominator.
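Question 18's point about adding the vocabulary size V to the denominator can be shown directly on the same kind of toy counts. A minimal sketch (corpus and names invented for illustration):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat slept".split()
vocab = set(corpus)
V = len(vocab)                 # 6 word types in this toy corpus

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def laplace_bigram_prob(w1, w2):
    """Add-1 smoothed estimate: (Count(w1, w2) + 1) / (Count(w1) + V)."""
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(laplace_bigram_prob("the", "cat"))   # seen bigram: (2 + 1) / (3 + 6)
print(laplace_bigram_prob("cat", "mat"))   # unseen bigram still gets a small non-zero probability
```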
19. What does an autocomplete system try to maximize?
A.P(next_word | previous_words)
B.P(previous_words | next_word)
C.The edit distance between words
D.The length of the sentence
Correct Answer: P(next_word | previous_words)
Explanation:Autocomplete predicts the most likely next word given the context of previous words.
20. How many previous words does a trigram model look at to predict the next word?
A.0
B.1
C.2
D.3
Correct Answer: 2
Explanation:A trigram involves 3 words total (the target and 2 history words), so it looks at the previous 2 words.
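Questions 19 and 20 together describe autocomplete as choosing the word that maximizes P(next_word | previous words), with a trigram model conditioning on the previous two words. A minimal sketch (toy corpus with start/end tokens, invented for illustration):

```python
from collections import Counter, defaultdict

corpus = "<s> <s> i like green eggs </s> <s> <s> i like ham </s>".split()

# Count trigrams: the next word conditioned on the previous two words
trigram_counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    trigram_counts[(w1, w2)][w3] += 1

def predict_next(w1, w2):
    """Return the word maximizing P(next_word | w1, w2) under the trigram counts."""
    candidates = trigram_counts[(w1, w2)]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("<s>", "i"))   # 'like' — the only continuation seen after '<s> i'
```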
21. One-hot encoding of words results in vectors that are:
A.Dense and low-dimensional
B.Sparse and high-dimensional
C.Dense and high-dimensional
D.Sparse and low-dimensional
Correct Answer: Sparse and high-dimensional
Explanation:One-hot vectors have a size equal to the vocabulary (high-dimensional) and contain mostly zeros (sparse).
22. What is a major limitation of one-hot encoding for words?
A.It is difficult to compute
B.It does not capture semantic similarity between words
C.It cannot represent rare words
D.It requires a neural network
Correct Answer: It does not capture semantic similarity between words
Explanation:In one-hot encoding, all word vectors are orthogonal, so the distance between 'car' and 'bus' is the same as 'car' and 'apple'.
23. Word embeddings typically represent words as:
A.Integers
B.Sparse binary vectors
C.Dense vectors of real numbers
D.Strings
Correct Answer: Dense vectors of real numbers
Explanation:Embeddings are dense vectors (e.g., length 300) containing real numbers that capture semantic meaning.
24. Which metric is commonly used to measure the similarity between two word embedding vectors?
A.Edit Distance
B.Cosine Similarity
C.Jaccard Index
D.Perplexity
Correct Answer: Cosine Similarity
Explanation:Cosine similarity measures the cosine of the angle between two vectors, indicating how close they are in the semantic space.
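Cosine similarity from question 24 is just the dot product of two vectors divided by the product of their lengths. A minimal sketch with made-up 3-dimensional vectors (real embeddings would have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); ranges from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

car = [0.8, 0.2, 0.1]    # toy vectors, invented for illustration
bus = [0.7, 0.3, 0.1]
apple = [0.1, 0.1, 0.9]

print(cosine_similarity(car, bus))    # close to 1: similar direction
print(cosine_similarity(car, apple))  # much smaller: different direction
```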
25. The Word2Vec model 'Skip-gram' architecture tries to predict:
A.The target word given the context words
B.The context words given the target word
C.The next sentence
D.The POS tag of the word
Correct Answer: The context words given the target word
Explanation:Skip-gram takes a central target word and tries to predict the surrounding context words.
26. The Word2Vec model 'CBOW' (Continuous Bag of Words) architecture tries to predict:
A.The target word given the context words
B.The context words given the target word
C.The document topic
D.The part of speech
Correct Answer: The target word given the context words
Explanation:CBOW takes the surrounding context words and sums their vectors to predict the missing center word.
27. What famous algebraic property is often cited to demonstrate the semantic capability of word embeddings?
A.King - Man + Woman = Queen
B.Apple + Orange = Fruit
C.Paris - France = Germany
D.Fast + Slow = Speed
Correct Answer: King - Man + Woman = Queen
Explanation:This vector operation demonstrates that embeddings capture gender and royalty relationships.
28. In the 'Noisy Channel Model' for spelling correction, P(x|w) represents:
A.The probability of the word w appearing in the corpus
B.The probability that the user meant w but typed x
C.The probability of typing x given the intended word w
D.The probability of x being a valid word
Correct Answer: The probability of typing x given the intended word w
Explanation:P(x|w) is the error model probability: the likelihood of generating the typo 'x' when the intended word was 'w'.
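The noisy channel correction described in question 28 picks the candidate w that maximizes P(x|w) * P(w): an error model times a language-model prior. A minimal sketch with invented candidate words and probabilities (a real system would estimate both from data):

```python
typo = "helo"

# P(w): prior probability of each candidate word (e.g. from corpus counts)
prior = {"hello": 0.002, "help": 0.003, "halo": 0.0004}

# P(x | w): error-model probability of typing 'helo' given the intended word
error_model = {"hello": 0.05, "help": 0.01, "halo": 0.02}

# Choose the correction maximizing P(x|w) * P(w), which is proportional to P(w|x)
best = max(prior, key=lambda w: error_model[w] * prior[w])
print(best)  # 'hello': 0.05 * 0.002 = 1e-4 beats the other candidates
```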
29. When building an HMM for POS tagging, the sum of probabilities of all outgoing transitions from a single state must equal:
A.0
B.1
C.The number of states
D.The number of observations
Correct Answer: 1
Explanation: Probabilities are normalized; the sum of probabilities of transitioning to any possible next state must be 1.
30. What is 'Perplexity' in the context of Language Models?
A.A measure of how well a probability model predicts a sample
B.The time taken to train the model
C.The number of parameters in the model
D.The size of the vocabulary
Correct Answer: A measure of how well a probability model predicts a sample
Explanation:Perplexity is an intrinsic evaluation metric. Lower perplexity indicates a better model.
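Perplexity from question 30 can be computed as the exponential of the average negative log probability the model assigns to a test sample. A minimal sketch with made-up per-word probabilities:

```python
import math

# Per-word probabilities a model assigns to a test sequence (invented numbers)
word_probs = [0.2, 0.1, 0.05, 0.3]

# Perplexity = exp( -(1/N) * sum(log P(w_i)) )
N = len(word_probs)
perplexity = math.exp(-sum(math.log(p) for p in word_probs) / N)
print(perplexity)  # lower perplexity means the model predicts the sample better
```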
31. Why do we use Log Probabilities instead of raw probabilities in N-gram calculations?
A.To make numbers larger
B.To avoid arithmetic underflow
C.To increase perplexity
D.To handle negative numbers
Correct Answer: To avoid arithmetic underflow
Explanation:Multiplying many small probabilities results in extremely small numbers that computers cannot represent accurately (underflow). Adding logs avoids this.
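The underflow issue from question 31 is easy to demonstrate: multiplying many small probabilities eventually rounds to 0.0 in floating point, while summing their logs stays well-behaved. A minimal sketch:

```python
import math

probs = [1e-5] * 100          # 100 words, each with probability 0.00001

product = 1.0
for p in probs:
    product *= p
print(product)                 # 0.0 — the true value 1e-500 underflows a float

log_sum = sum(math.log(p) for p in probs)
print(log_sum)                 # about -1151.3, perfectly representable
```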
32. Which of the following describes the 'Start' token (<s>) in N-gram models?
A.It indicates the end of a sentence
B.It gives context for the first word in the sentence
C.It represents an unknown word
D.It is used for punctuation
Correct Answer: It gives context for the first word in the sentence
Explanation:The start token allows the model to calculate the probability of a word being at the beginning of a sentence.
33. What represents the 'Observations' in a POS HMM?
A.The sequence of tags
B.The sequence of words in the text
C.The transition matrix
D.The initial state probabilities
Correct Answer: The sequence of words in the text
Explanation:In an HMM for POS tagging, we observe the words and try to infer the hidden tags.
34. In Minimum Edit Distance, the 'backtrace' step is used to:
A.Calculate the cost
B.Determine the actual sequence of operations (alignment)
C.Initialize the matrix
D.Sum the rows
Correct Answer: Determine the actual sequence of operations (alignment)
Explanation:While the forward pass calculates the cost, backtracing from the bottom-right of the matrix reveals which operations led to that cost.
35. Which token is typically used to replace words not found in the training vocabulary?
A.<START>
B.<END>
C.<UNK>
D.<NULL>
Correct Answer: <UNK>
Explanation:<UNK> (Unknown) is the standard token used to represent out-of-vocabulary words.
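The <UNK> handling from question 35 is typically a preprocessing step: any token outside the known vocabulary is replaced before counting or scoring. A minimal sketch (vocabulary and sentence invented for illustration):

```python
vocab = {"the", "cat", "sat", "on", "mat"}

def replace_oov(tokens, vocab, unk="<UNK>"):
    """Map every out-of-vocabulary token to the <UNK> token."""
    return [t if t in vocab else unk for t in tokens]

print(replace_oov("the cat sat on the chaise".split(), vocab))
# ['the', 'cat', 'sat', 'on', 'the', '<UNK>']
```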
36. A 'Unigram' model assumes that:
A.Words depend on the previous word
B.Words are independent of context
C.Words depend on the previous two words
D.Words depend on the grammar
Correct Answer: Words are independent of context
Explanation:A unigram model calculates probability based solely on word frequency, ignoring surrounding context.
37. The dimensionality of a Word2Vec embedding vector is typically chosen by:
A.The size of the vocabulary
B.The length of the sentence
C.The system designer (hyperparameter)
D.The number of unique characters
Correct Answer: The system designer (hyperparameter)
Explanation:The dimension (e.g., 100, 300) is a design choice, unlike one-hot vectors which are fixed by vocabulary size.
38. In the equation P(tag|word) ∝ P(word|tag) * P(tag), what is P(tag)?
A.Likelihood
B.Prior probability
C.Posterior probability
D.Emission probability
Correct Answer: Prior probability
Explanation:P(tag) represents the prior probability of the tag occurring, typically approximated by the transition probability from the previous tag.
39. If we want to build a spell checker, which probability do we want to maximize according to Bayes' theorem?
A.P(typo | correction)
B.P(correction | typo)
C.P(typo)
D.P(correction)
Correct Answer: P(correction | typo)
Explanation:We observe the typo and want to find the most probable correction given that typo.
40. Which type of language model suffers most from the 'curse of dimensionality'?
A.Unigram model
B.High-order N-gram model (e.g., 5-gram)
C.Word2Vec
D.Bag of Words
Correct Answer: High-order N-gram model (e.g., 5-gram)
Explanation:As N increases, the number of parameters grows exponentially, requiring exponentially more data to train effectively.
41. What is the primary input to the neural network when training a Word2Vec model?
A.Audio signals
B.Image pixels
C.One-hot encoded vectors of words
D.Parse trees
Correct Answer: One-hot encoded vectors of words
Explanation:The input layer usually receives one-hot vectors, which are then multiplied by the weight matrix to get the embedding.
42. The term 'corpus' in NLP refers to:
A.A computer algorithm
B.A specific neural network layer
C.A body of text data
D.A type of spelling error
Correct Answer: A body of text data
Explanation:Corpus (plural: corpora) is the raw text data used to train models.
43. In edit distance, if we assign a higher cost to substitution than insertion/deletion, it implies:
A.Typing a wrong letter is considered worse than missing a letter
B.The algorithm will fail
C.The distance will always be zero
D.Insertion is impossible
Correct Answer: Typing a wrong letter is considered worse than missing a letter
Explanation:Costs reflect the penalty of an error; a higher cost makes that specific operation less likely to appear in the optimal path (minimum distance).
44. What is the result of using a sliding window in N-gram generation?
A.It removes stop words
B.It creates a sequence of overlapping word chunks
C.It converts text to uppercase
D.It calculates the edit distance
Correct Answer: It creates a sequence of overlapping word chunks
Explanation:A sliding window moves over the text one token at a time, extracting overlapping chunks such as (previous word, current word) pairs used for counting N-grams. See the sketch below.
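A minimal sketch of the sliding window from question 44: a fixed-size window slides one token at a time to produce overlapping word chunks.

```python
def ngrams(tokens, n):
    """Slide a window of size n over the tokens, producing overlapping n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "i like green eggs and ham".split()
print(ngrams(tokens, 2))
# [('i', 'like'), ('like', 'green'), ('green', 'eggs'), ('eggs', 'and'), ('and', 'ham')]
```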
45. Which of these words likely has a vector closest to 'frog' in a well-trained embedding space?
A.Galaxy
B.Toad
C.Steel
D.Philosophy
Correct Answer: Toad
Explanation:Embeddings capture semantic meaning; 'frog' and 'toad' share biological and contextual similarities.
46. In an HMM, what connects hidden states to each other?
A.Emission probabilities
B.Transition probabilities
C.The Viterbi path
D.Observation vectors
Correct Answer: Transition probabilities
Explanation:Transition probabilities define the likelihood of moving from one hidden state to the next hidden state.
47. What is 'Stupid Backoff' in the context of Language Models?
A.A way to delete wrong words
B.A smoothing method that uses lower-order N-grams if higher-order ones are missing
C.A method to stop the algorithm
D.A type of neural network
Correct Answer: A smoothing method that uses lower-order N-grams if higher-order ones are missing
Explanation:If a trigram count is zero, the model 'backs off' to use the bigram probability (often multiplied by a constant like 0.4).
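Stupid Backoff from question 47 falls back to the lower-order estimate, scaled by a constant (commonly 0.4), whenever the higher-order count is zero. A minimal sketch built on toy counts (corpus invented for illustration; the scores are relative, not true probabilities):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat slept on the sofa".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def stupid_backoff(w1, w2, w3, alpha=0.4):
    """Score w3 after (w1, w2); back off to bigram, then unigram, scaling by alpha."""
    if trigrams[(w1, w2, w3)] > 0:
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
    if bigrams[(w2, w3)] > 0:
        return alpha * bigrams[(w2, w3)] / unigrams[w2]
    return alpha * alpha * unigrams[w3] / total

print(stupid_backoff("the", "cat", "sat"))   # trigram seen: 1/2
print(stupid_backoff("on", "the", "cat"))    # unseen trigram: 0.4 * P(cat | the) = 0.2
```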
48. Which application primarily utilizes Probabilistic Language Models?
A.Image Compression
B.Speech Recognition
C.Database Management
D.Network Routing
Correct Answer: Speech Recognition
Explanation:Language models help distinguish between acoustically similar words (e.g., 'recognize speech' vs 'wreck a nice beach') based on probability.
49. In the context of Word Embeddings, what does 'Polysemy' refer to?
A.Words with multiple meanings
B.Words with similar spellings
C.Words in different languages
D.Words that rhyme
Correct Answer: Words with multiple meanings
Explanation:Polysemy (e.g., 'bank' as a river side vs financial institution) is a challenge for static embeddings like Word2Vec, which assign one vector per word.
50. If P(A|B) is the probability of tag A following tag B, this is an example of:
A.Emission probability
B.Transition probability
C.Observation probability
D.Edit probability
Correct Answer: Transition probability
Explanation:It describes the transition between two hidden states (tags).