1. Which decade is generally considered the starting point of Natural Language Processing, marked by Alan Turing's publication on machine intelligence?
origin of NLP
Easy
A.1930s
B.1990s
C.1970s
D.1950s
Correct Answer: 1950s
Explanation:
The history of NLP is generally traced to the 1950s. In 1950, Alan Turing published the article 'Computing Machinery and Intelligence', which proposed what is now called the Turing Test.
2. Which term refers to the set of structural rules governing the composition of clauses, phrases, and words in any given natural language?
language and grammar
Easy
A.Grammar
B.Pragmatics
C.Semantics
D.Phonology
Correct Answer: Grammar
Explanation:
Grammar represents the rules for constructing valid sentences and structures in a language.
3. Which term refers to the study of the internal structure of words and how they are formed?
morphology
Easy
A.Phonetics
B.Morphology
C.Syntax
D.Semantics
Correct Answer: Morphology
Explanation:
Morphology is the sub-discipline of linguistics that studies word structure, including stems, root words, prefixes, and suffixes.
4. Which level of linguistic analysis focuses specifically on the order of words and how they combine to form sentences?
syntax
Easy
A.Morphology
B.Syntax
C.Pragmatics
D.Phonology
Correct Answer: Syntax
Explanation:
Syntax is concerned with sentence structure, specifically how words are ordered and grouped together.
5. What is the primary focus of semantics in Natural Language Processing?
semantics
Easy
A.The structural rules of sentences
B.The historical origin of languages
C.The literal meaning of words and sentences
D.The pronunciation of words
Correct Answer: The literal meaning of words and sentences
Explanation:
Semantics is the study of meaning in language, focusing on how words, phrases, and sentences convey literal meaning.
6. Which of the following describes 'ambiguity' in Natural Language Processing?
challenges of NLP
Easy
A.A word that is misspelled
B.A sentence or word having multiple possible meanings
C.A sentence written in multiple languages
D.A sentence that lacks punctuation
Correct Answer: A sentence or word having multiple possible meanings
Explanation:
Ambiguity occurs when a phrase or word can be interpreted in more than one way, which is a major challenge in NLP.
7. Which of the following is a common application of NLP?
applications of NLP
Easy
A.Database management
B.Image classification
C.Network routing
D.Machine Translation
Correct Answer: Machine Translation
Explanation:
Machine Translation, like translating text from English to French, is a classic and widely used application of NLP.
8. What is the process of breaking down a continuous stream of text into smaller units like words or sentences?
tokenization
Easy
A.Normalization
B.Stemming
C.Tokenization
D.Lemmatization
Correct Answer: Tokenization
Explanation:
Tokenization is the foundational step in text processing where text is divided into smaller pieces called tokens.
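A minimal regex-based sketch of both levels of tokenization, assuming that a sentence ends at `.`, `!` or `?` followed by whitespace and that a word token is a run of word characters. Production tokenizers treat abbreviations, URLs and numbers far more carefully; the function names here are illustrative only.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split after sentence-final punctuation that is followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize_words(sentence: str) -> list[str]:
    # A token is either a run of word characters or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "NLP is fun. Tokenization comes first!"
for sentence in split_sentences(text):
    print(tokenize_words(sentence))
```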
9. What is the primary purpose of stemming in text processing?
stemming
Easy
A.To translate a word into another language
B.To find synonyms of a word
C.To chop off affixes to reduce a word to its root form
D.To check the spelling of a word
Correct Answer: To chop off affixes to reduce a word to its root form
Explanation:
Stemming is a crude heuristic process that chops off the ends of words to reduce them to a common base or root form.
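The sketch below illustrates the "chop off the ends" idea with a hypothetical, hand-picked suffix list. It is not the Porter algorithm or any library's stemmer, only a toy showing why a stem such as `studi` need not be a dictionary word.

```python
# Hypothetical suffix list, checked longest-first; NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "ly", "s")
MIN_STEM = 3  # refuse to leave a stem shorter than this many characters

def crude_stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]  # chop only the first matching suffix
    return word

print([crude_stem(w) for w in ["jumping", "jumped", "jumps", "studies", "sing"]])
# ['jump', 'jump', 'jump', 'studi', 'sing']
```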
10. How does lemmatization primarily differ from stemming?
lemmatization
Easy
A.Lemmatization uses a vocabulary and morphological analysis to return a valid dictionary word.
B.Lemmatization is faster than stemming.
C.Lemmatization only works on numbers.
D.Lemmatization removes all vowels from a word.
Correct Answer: Lemmatization uses a vocabulary and morphological analysis to return a valid dictionary word.
Explanation:
Unlike stemming, which just chops off characters, lemmatization considers the context and converts the word to its meaningful base form (lemma) found in the dictionary.
11. What are stop-words in the context of NLP preprocessing?
stop-word removal
Easy
A.Words that carry the most important meaning in a text
B.Highly frequent words that carry very little semantic meaning, like 'the' and 'is'
C.Words that indicate the end of a sentence
D.Words that are not found in the dictionary
Correct Answer: Highly frequent words that carry very little semantic meaning, like 'the' and 'is'
Explanation:
Stop-words are common words that are often removed during preprocessing because they do not add significant meaning to the text.
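A minimal sketch of stop-word filtering, assuming a tiny hand-written stop list; libraries such as NLTK and spaCy ship much longer lists, and the right list depends on the task.

```python
# A tiny illustrative stop list (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}

def remove_stop_words(tokens):
    # Compare case-insensitively so "The" is filtered as well as "the".
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("The cat is on the mat".split()))  # ['cat', 'on', 'mat']
```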
12. Why is punctuation handling an important step in text preprocessing?
punctuation handling
Easy
A.To ensure grammatical correctness for the end user
B.To add missing commas to sentences
C.To translate text accurately
D.To prevent punctuation marks from being incorrectly attached to words as part of the token
Correct Answer: To prevent punctuation marks from being incorrectly attached to words as part of the token
Explanation:
Removing or handling punctuation ensures that words like 'hello!' and 'hello' are treated as the same token rather than two distinct vocabulary items.
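One simple policy, assuming only ASCII punctuation matters, is to delete every character in Python's `string.punctuation` from each token:

```python
import string

# Translation table that maps every ASCII punctuation character to None (deletes it).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(token: str) -> str:
    return token.translate(_PUNCT_TABLE)

print(strip_punctuation("hello!") == strip_punctuation("hello"))  # True
```

This policy is deliberately blunt: it also turns `don't` into `dont` and `state-of-the-art` into `stateoftheart`, which is why many pipelines split punctuation into separate tokens instead of deleting it.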
13. What does OOV stand for in Natural Language Processing?
handling out-of-vocabulary words
Easy
A.Object-Oriented-Verb
B.Out-Of-Vocabulary
C.Output-Observation-Value
D.Over-Optimized-Vector
Correct Answer: Out-Of-Vocabulary
Explanation:
OOV stands for Out-Of-Vocabulary, referring to words that appear in the testing data or real-world input but were not present in the model's training dictionary.
14. Which of the following is a classic example of text normalization?
normalization
Easy
A.Converting all text to lowercase
B.Generating a summary of a document
C.Extracting named entities
D.Translating text to English
Correct Answer: Converting all text to lowercase
Explanation:
Text normalization transforms text into a standard format, such as converting all uppercase characters to lowercase, to reduce variation.
15. What major linguistic feature does the basic Bag-of-Words (BoW) model ignore?
Bag-of-Words
Easy
A.Word occurrence
B.Word order and context
C.Word frequency
D.Vocabulary size
Correct Answer: Word order and context
Explanation:
The Bag-of-Words model represents text as a multiset (bag) of its words, keeping track of frequency but completely ignoring grammar and word order.
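A minimal sketch of a count-based BoW vector over a fixed vocabulary (the three-word vocabulary is illustrative). Two sentences with opposite meanings receive identical vectors because only counts are kept, and the vector length always equals the vocabulary size:

```python
from collections import Counter

def bow_vector(tokens, vocabulary):
    # One count per vocabulary entry; words outside the vocabulary are dropped.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["bites", "dog", "man"]
print(bow_vector("dog bites man".split(), vocab))  # [1, 1, 1]
print(bow_vector("man bites dog".split(), vocab))  # [1, 1, 1]
```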
16. In the context of n-grams, what is a 'bigram'?
n-grams
Easy
A.A sequence of two consecutive words
B.A word with two syllables
C.A document with two paragraphs
D.A sentence with two clauses
Correct Answer: A sequence of two consecutive words
Explanation:
An n-gram is a contiguous sequence of n items from a text. When n=2, it is called a bigram, meaning two consecutive words.
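A short sketch of n-gram extraction with a sliding window; the function name `ngrams` is illustrative. A sequence of $N$ tokens yields $N - n + 1$ n-grams (and none when $N < n$):

```python
def ngrams(tokens, n):
    # Slide a window of width n over the tokens; each window becomes a tuple.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I love natural language processing".split()
print(ngrams(tokens, 2))
# [('I', 'love'), ('love', 'natural'), ('natural', 'language'), ('language', 'processing')]
```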
17. What does 'TF' stand for in the TF-IDF representation?
TF-IDF
Easy
A.Token Format
B.Text Feature
C.Total Frequency
D.Term Frequency
Correct Answer: Term Frequency
Explanation:
TF stands for Term Frequency, which measures how frequently a term occurs in a specific document.
18. What is the main purpose of the Inverse Document Frequency (IDF) component in TF-IDF?
TF-IDF
Easy
A.To increase the weight of stop-words
B.To count the total number of documents
C.To reduce the weight of words that appear in many documents across the corpus
D.To penalize words that are rare in the corpus
Correct Answer: To reduce the weight of words that appear in many documents across the corpus
Explanation:
IDF diminishes the weight of terms that appear in many documents of the collection (like 'the' and 'is') and increases the weight of terms that appear in only a few documents, helping to highlight distinctive terms.
19. What is a 'phoneme' in linguistics?
linguistic essentials
Easy
A.The smallest unit of sound that distinguishes one word from another
B.A dictionary of words
C.The rules of sentence construction
D.The smallest structural unit that carries meaning
Correct Answer: The smallest unit of sound that distinguishes one word from another
Explanation:
A phoneme is a basic unit of a language's phonology, representing the smallest unit of sound (like /p/ or /b/) that can differentiate words.
20. Which famous 1954 experiment provided a highly publicized demonstration of machine translation, translating Russian sentences into English?
origin of NLP
Easy
A.The ELIZA project
B.The Turing Experiment
C.The Georgetown-IBM experiment
D.The SHRDLU system
Correct Answer: The Georgetown-IBM experiment
Explanation:
The Georgetown-IBM experiment in 1954 was a foundational event in NLP history, demonstrating the automatic translation of more than sixty Russian sentences into English.
21. Early NLP systems like ELIZA (1966) simulated conversation using pattern matching. What was the primary limitation of this approach regarding natural language understanding?
origin of NLP
Medium
A.It could only process mathematical equations rather than human language.
B.It possessed no actual understanding of syntactic structure or semantic meaning.
C.It required massive computational power to run the neural networks.
D.It required the user to input text in binary code.
Correct Answer: It possessed no actual understanding of syntactic structure or semantic meaning.
Explanation:
ELIZA operated on simple pattern matching and substitution methodologies without parsing the grammatical structure or grasping the semantic meaning of the text.
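The rules below are a hypothetical sketch in the spirit of ELIZA's keyword patterns, not Weizenbaum's original script. They show how a matched fragment is echoed back into a template without any parse or meaning representation:

```python
import re

# Hypothetical (pattern, response template) rules in the spirit of ELIZA.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
]

def respond(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # The captured text is echoed back verbatim; nothing is parsed or understood.
            return template.format(match.group(1))
    return "Please tell me more."  # default reply when no rule matches

print(respond("I am sad."))  # Why do you say you are sad?
```

Given `I am the telescope`, the same rule would reply `Why do you say you are the telescope?` just as readily, which is exactly the absence of understanding the explanation describes.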
22. If an NLP system uses Context-Free Grammar (CFG) to parse sentences, which of the following phenomena is most challenging for it to handle efficiently?
language and grammar
Medium
A.Generating a standard syntax tree for a declarative sentence.
B.Context-sensitive dependencies, such as cross-serial dependencies.
C.Recognizing terminal symbols (words) in a lexicon.
D.Subject-verb agreement within a simple sentence.
Correct Answer: Context-sensitive dependencies, such as cross-serial dependencies.
Explanation:
Context-Free Grammars have limited generative capacity: they cannot capture unbounded cross-serial dependencies (such as those in Swiss German subordinate clauses) or other context-sensitive patterns, which call for more expressive, mildly context-sensitive formalisms.
23. Consider the sentence: 'Can you pass the salt?' While syntactically a question, it functions as a request. Which linguistic level of analysis is required to understand this intended meaning?
linguistic essentials
Medium
A.Phonology
B.Syntax
C.Morphology
D.Pragmatics
Correct Answer: Pragmatics
Explanation:
Pragmatics deals with how context influences meaning, allowing us to interpret a literal question as a polite request.
24. Analyze the words 'running' and 'runner'. Which of the following statements correctly identifies the morphological processes applied to the root word 'run'?
morphology
Medium
A.Both are examples of inflectional morphology.
B.Both are examples of derivational morphology.
C.'Running' uses inflectional morphology, while 'runner' uses derivational morphology.
D.'Running' uses derivational morphology, while 'runner' uses inflectional morphology.
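Correct Answer: 'Running' uses inflectional morphology, while 'runner' uses derivational morphology.
Explanation:
Inflectional morphology adapts a word to its grammatical role without changing its core meaning (run -> running), while derivational morphology creates a new word, often with a different part of speech or core meaning (run -> runner, verb -> noun).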
25. The sentence 'I saw the man with the telescope' exhibits structural ambiguity. Which NLP task is directly responsible for resolving how 'with the telescope' attaches to the rest of the sentence?
syntax
Medium
A.Part-of-Speech tagging
B.Morphological segmentation
C.Dependency parsing
D.Named Entity Recognition
Correct Answer: Dependency parsing
Explanation:
Dependency parsing (or syntactic parsing) determines the grammatical structure of a sentence, including prepositional phrase attachment, thereby resolving structural ambiguities.
26. When building a word sense disambiguation model, you encounter the word 'bank' in the contexts of 'river bank' and 'bank account'. This is an example of dealing with which semantic concept?
semantics
Medium
A.Polysemy/Homonymy
B.Hyponymy
C.Antonymy
D.Synonymy
Correct Answer: Polysemy/Homonymy
Explanation:
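Homonymy (unrelated meanings that share one spelling or pronunciation, as with the river and financial senses of 'bank') and polysemy (related senses of a single word) both require choosing the intended meaning from context, which is the goal of word sense disambiguation.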
27. In the sentence 'John told his father that he was promoted', the pronoun 'he' is ambiguous. Resolving whether 'he' refers to John or his father is an example of which NLP challenge?
challenges of NLP
Medium
A.Syntactic ambiguity
B.Coreference resolution
C.Lexical ambiguity
D.Speech segmentation
Correct Answer: Coreference resolution
Explanation:
Coreference resolution is the task of finding all expressions in a text that refer to the same entity, such as determining the antecedent of a pronoun.
28. If you are designing an NLP pipeline for a conversational AI (chatbot) meant to book flights, which sequence of applications is most logical for processing user input?
applications of NLP
Medium
A.Sentiment Analysis -> Optical Character Recognition -> POS Tagging
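Correct Answer: Intent Recognition -> Named Entity Recognition (NER) -> Dialog Management
Explanation:
A task-oriented chatbot first needs to understand what the user wants (Intent Recognition), extract relevant details like dates and locations (NER), and decide how to respond (Dialog Management).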
29. Using Byte Pair Encoding (BPE) for tokenization helps mitigate which specific problem compared to standard word-level tokenization?
tokenization
Medium
A.It effectively manages out-of-vocabulary (OOV) and rare words by breaking them into known subwords.
B.It removes the need for stop-word removal.
C.It automatically corrects spelling mistakes in the source text.
D.It prevents the creation of syntactic ambiguity.
Correct Answer: It effectively manages out-of-vocabulary (OOV) and rare words by breaking them into known subwords.
Explanation:
BPE is a subword tokenization algorithm that creates tokens for frequent character sequences, allowing the model to construct rare or unknown words from smaller, known subword pieces.
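A minimal sketch of learning BPE merges on a toy word-frequency table (the corpus and merge count are made up for illustration). Each step merges the most frequent adjacent symbol pair; the learned merges are then replayed, in order, on a word never seen in training:

```python
from collections import Counter

END = "</w>"  # end-of-word marker so word-final pieces stay distinguishable

def merge_pair(symbols, pair):
    # Replace every adjacent occurrence of `pair` with the concatenated symbol.
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)

def learn_bpe(word_freqs, num_merges):
    # Each word starts as a sequence of single characters plus the end marker.
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # ties go to the first pair seen
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges

def apply_bpe(word, merges):
    # Replay the learned merges in order on a (possibly unseen) word.
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, num_merges=3)
print(merges)                       # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(apply_bpe("lowest", merges))  # ['l', 'o', 'w', 'est</w>']
```

The unseen word `lowest` is built from known pieces rather than collapsing to an unknown-word token. Real implementations learn thousands of merges and update pair counts incrementally instead of recounting every step.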
30. An algorithm reduces both 'university' and 'universal' to the stem 'univers'. What type of stemming error has occurred, and why is it problematic for downstream tasks?
stemming
Medium
A.Over-stemming; words with different meanings are conflated into the same token.
B.Normalization error; the algorithm failed to lowercase the tokens.
C.Lemmatization failure; the algorithm failed to find the dictionary root.
D.Under-stemming; the words will not be matched despite having the same meaning.
Correct Answer: Over-stemming; words with different meanings are conflated into the same token.
Explanation:
Over-stemming occurs when words that have different meanings are reduced to the same stem, which can confuse downstream models relying on semantic differences.
31. Unlike a standard stemmer, a lemmatizer typically requires which piece of additional information to accurately reduce the word 'saw' to its root form?
lemmatization
Medium
A.The sentiment score of the surrounding sentence.
B.The document frequency of the word.
C.The Part-of-Speech (POS) tag of the word in its context.
D.The character-level embeddings of the word.
Correct Answer: The Part-of-Speech (POS) tag of the word in its context.
Explanation:
Lemmatization uses vocabulary and morphological analysis, meaning it needs to know whether 'saw' is a noun (a tool) or a verb (past tense of see) to return the correct lemma.
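A minimal sketch of dictionary-based lemmatization in which the lookup key is the (word, POS) pair. The tiny `LEXICON` is hypothetical and stands in for a lexical resource such as WordNet:

```python
# Hypothetical lexicon keyed by (surface form, coarse POS tag).
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("mice", "NOUN"): "mouse",
    ("running", "VERB"): "run",
    ("better", "ADJ"): "good",
}

def lemmatize(word: str, pos: str) -> str:
    # Fall back to the lowercased word when the (word, POS) pair is unknown.
    return LEXICON.get((word.lower(), pos), word.lower())

print(lemmatize("saw", "VERB"), lemmatize("saw", "NOUN"))  # see saw
```

Passing the wrong tag (for example `VERB` for the tool in "He used a saw") silently returns `see`, the failure mode examined in question 49.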
32. In which of the following NLP tasks is aggressive stop-word removal most likely to degrade model performance?
stop-word removal
Medium
A.Topic modeling (e.g., LDA).
B.Document clustering based on topic.
C.Spam email detection using Bag-of-Words.
D.Machine Translation (e.g., translating English to French).
Correct Answer: Machine Translation (e.g., translating English to French).
Explanation:
Machine translation relies heavily on syntax, grammar, and function words to generate fluent and accurate sentences. Removing stop words would destroy the grammatical structure.
33. When designing a regular-expression-based tokenizer, dealing with punctuation can be tricky. Which of the following examples best demonstrates why simply splitting text on all punctuation characters is a poor strategy?
punctuation handling
Medium
Explanation:
Simply splitting on all punctuation destroys meaningful linguistic constructs such as possessives and contractions ('s) and hyphenated compound words (state-of-the-art), losing semantic context.
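A sketch contrasting a naive split with a slightly more careful regex that keeps apostrophes and hyphens when they occur inside a word. Both patterns are illustrative assumptions, not a standard tokenizer:

```python
import re

def naive_tokenize(text: str) -> list[str]:
    # Treat every non-word character (including ' and -) as a boundary.
    return re.findall(r"\w+", text)

def keep_internal_marks(text: str) -> list[str]:
    # Allow ' and - inside a word; any other punctuation becomes its own token.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", text)

sentence = "John's state-of-the-art model."
print(naive_tokenize(sentence))       # ['John', 's', 'state', 'of', 'the', 'art', 'model']
print(keep_internal_marks(sentence))  # ["John's", 'state-of-the-art', 'model', '.']
```

Keeping `John's` whole is a design choice; some conventions, such as Penn Treebank tokenization, instead split it into `John` and `'s`.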
34. A model trained on a specific corpus encounters a new word during testing. If the model uses a fixed word-level vocabulary with a single <UNK> token, how does it process the new word?
handling out-of-vocabulary words
Medium
A.It dynamically expands its embedding matrix by adding a new randomized vector.
B.It maps the word to the <UNK> token, treating all unknown words identically.
C.It ignores the word entirely to prevent matrix dimension errors.
D.It uses character-level CNNs to infer the word's exact meaning.
Correct Answer: It maps the word to the <UNK> token, treating all unknown words identically.
Explanation:
In standard word-level models with a fixed vocabulary, any out-of-vocabulary word is mapped to a universal <UNK> (unknown) token, meaning the model loses any specific information about that word.
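A minimal sketch of a fixed vocabulary with a reserved `<UNK>` id, assuming the vocabulary is built from whitespace-split training text:

```python
UNK = "<UNK>"

def build_vocab(training_tokens):
    # Index 0 is reserved for the unknown-word token; ids follow first appearance.
    vocab = {UNK: 0}
    for tok in training_tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    # Every unseen word collapses onto the same <UNK> id.
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

vocab = build_vocab("the cat sat on the mat".split())
print(encode("the dog sat on the sofa".split(), vocab))  # [1, 0, 3, 4, 1, 0]
```

Both `dog` and `sofa` become id 0, so the model can no longer tell them apart.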
35. Which of the following scenarios demonstrates a potential negative consequence of applying universal case folding (lowercasing all text) during text normalization?
normalization
Medium
A.'apple' (fruit) and 'Apple' (company) are mapped to the same token, losing distinction.
B.'RUN' and 'run' are mapped to the same token, aiding general search.
C.The size of the vocabulary is drastically increased, causing memory issues.
D.Punctuation is accidentally removed during the lowercasing process.
Correct Answer: 'apple' (fruit) and 'Apple' (company) are mapped to the same token, losing distinction.
Explanation:
Case folding can destroy the distinction between proper nouns (Apple the company) and common nouns (apple the fruit), which can be detrimental for tasks like Named Entity Recognition.
36. Given a vocabulary of size $V = 10{,}000$, you represent a 50-word sentence using a standard Bag-of-Words (BoW) vector. What is the dimension of the resulting vector, and what is its primary characteristic?
Bag-of-Words
Medium
A.Dimension is 10,000; the vector is dense.
B.Dimension is 50; the vector is dense.
C.Dimension is 10,000; the vector is highly sparse.
D.Dimension is 50; the vector is highly sparse.
Correct Answer: Dimension is 10,000; the vector is highly sparse.
Explanation:
A BoW vector has a length equal to the vocabulary size ($V = 10{,}000$). Since the sentence contains only 50 words, at most 50 of the 10,000 entries can be non-zero, making the vector highly sparse.
37. Consider a sentence containing exactly $N$ words. If we extract standard contiguous trigrams ($n = 3$) from this sentence, how many trigram tokens will be generated? (Assume $N \geq 3$.)
n-grams
Medium
A.
B.
C.
D.
Correct Answer: $N - 2$
Explanation:
For a sequence of length $N$, the number of n-grams produced is $N - n + 1$. For trigrams ($n = 3$), this evaluates to $N - 2$.
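38. In the TF-IDF formulation, the Inverse Document Frequency (IDF) of a term $t$ is often calculated as $\text{IDF}(t) = \log \frac{N}{\text{df}(t)}$, where $\text{df}(t)$ is the number of documents containing $t$. If a specific stop-word appears in every single document in a corpus of size $N$, what will its IDF value be?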
TF-IDF
Medium
A.
B.$1$
C.
D.$0$
Correct Answer: $0$
Explanation:
If the term appears in all documents, $\text{df}(t) = N$. The formula becomes $\log \frac{N}{N} = \log 1 = 0$. This mathematically nullifies the importance of uninformative words.
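A quick numerical check of the zero-IDF case, using the natural logarithm and a made-up three-document corpus in which each document is represented as a set of tokens:

```python
import math

def idf(term, documents):
    # df = number of documents that contain the term (assumes df > 0).
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df)

docs = [{"the", "cat", "sat"}, {"the", "dog", "ran"}, {"the", "cat", "ran"}]
print(idf("the", docs))  # 0.0
print(idf("dog", docs))  # log(3), about 1.0986
```

Many libraries use smoothed variants (for example adding 1 inside or outside the logarithm), so a term that occurs everywhere may receive a small positive weight rather than exactly zero; this unsmoothed version also fails with a division by zero for a term that appears in no document.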
39. Suppose document A has 100 words, and the word 'AI' appears 5 times. Document B has 500 words, and 'AI' appears 15 times. Using basic Term Frequency (count divided by document length), which document gives higher TF to the word 'AI'?
TF-IDF
Medium
A.Document B, because its TF is $15$, compared to Document A's TF of $5$.
B.They have the same TF because IDF balances the counts.
C.Document B, because its TF is $0.03$, compared to Document A's TF of $0.05$.
D.Document A, because its TF is $0.05$, compared to Document B's TF of $0.03$.
Correct Answer: Document A, because its TF is $0.05$, compared to Document B's TF of $0.03$.
Explanation:
Normalized Term Frequency is calculated as (Term Count / Total Words). For Doc A: $5 / 100 = 0.05$. For Doc B: $15 / 500 = 0.03$. Thus, Doc A has a higher TF.
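The same arithmetic in code, with the two documents simulated as token lists of the stated lengths (the filler token `other` is an assumption):

```python
from collections import Counter

def term_frequency(term, tokens):
    # Raw count of the term divided by the document length in tokens.
    return Counter(tokens)[term] / len(tokens)

doc_a = ["AI"] * 5 + ["other"] * 95    # 100 tokens, 'AI' appears 5 times
doc_b = ["AI"] * 15 + ["other"] * 485  # 500 tokens, 'AI' appears 15 times
print(term_frequency("AI", doc_a), term_frequency("AI", doc_b))  # 0.05 0.03
```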
40. When training an n-gram language model, sparsity becomes a major issue for higher values of $n$. Which technique is commonly applied to handle $n$-grams that appear in the test set but were never seen in the training set?
n-grams
Medium
A.Term Frequency-Inverse Document Frequency (TF-IDF) weighting
B.Stop-word removal
C.Laplace (Add-1) Smoothing or Backoff
D.Stemming and Lemmatization
Correct Answer: Laplace (Add-1) Smoothing or Backoff
Explanation:
Smoothing techniques (like Add-1) or Backoff (reverting to a lower-order n-gram) are used to assign non-zero probabilities to unseen n-grams, resolving the zero-frequency problem.
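A minimal sketch of add-1 (Laplace) smoothing for a bigram model, assuming sentences are already tokenized and are padded with `<s>` and `</s>` boundary markers (marker names are an assumption). Backoff is not shown:

```python
from collections import Counter

def train_bigram_add1(sentences):
    context_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        context_counts.update(tokens[:-1])             # tokens that precede another token
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent (prev, word) pairs
    # Possible next words: every training word plus the end-of-sentence marker.
    vocab = {w for sent in sentences for w in sent} | {"</s>"}
    V = len(vocab)

    def prob(prev, word):
        # Add 1 to every bigram count and V to the denominator to renormalize.
        return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + V)

    return prob, vocab

prob, vocab = train_bigram_add1([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(prob("the", "cat"))  # seen bigram: (1 + 1) / (2 + 5)
print(prob("cat", "dog"))  # unseen bigram: (0 + 1) / (1 + 5), no longer zero
```

Adding $V$ to the denominator keeps each conditional distribution summing to 1 over the vocabulary.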
41. Which of the following best characterizes the primary historical consequence of the 1966 ALPAC (Automatic Language Processing Advisory Committee) report on the early trajectory of Natural Language Processing?
origin of NLP
Hard
A.It catalyzed the immediate development of the first conversational agents, such as ELIZA, to bypass the complexities of syntactic translation.
B.It accelerated the shift from rule-based systems to deep learning architectures by highlighting the computational limits of the era.
C.It caused a drastic reduction in funding for machine translation research by concluding that fully automatic high-quality translation was unattainable in the near future.
D.It established the mathematical foundation for the Chomsky hierarchy, permanently linking computer science formalisms to linguistic theory.
Correct Answer: It caused a drastic reduction in funding for machine translation research by concluding that fully automatic high-quality translation was unattainable in the near future.
Explanation:
The 1966 ALPAC report concluded that machine translation was more expensive, less accurate, and slower than human translation, with no immediate prospect of useful fully automatic translation. This led to sharp cuts in US funding for machine translation research, a downturn often cited as an early precursor of the 'AI winter'.
42. Natural languages exhibit phenomena such as center embedding (e.g., "The mouse the cat the dog bit chased ran away"). In the context of the Chomsky hierarchy, what does the theoretical existence of arbitrary-depth center embedding imply about human language?
language and grammar
Hard
A.It demonstrates that natural language morphologies are inherently regular, as they rely on linear concatenation.
B.It implies that natural languages are strictly finite-state and can be parsed using simple deterministic automata.
C.It proves that natural languages are uncomputable and cannot be parsed by Turing machines.
D.It demonstrates that natural languages cannot be completely modeled by Regular Grammars and require at least Context-Free Grammars.
Correct Answer: It demonstrates that natural languages cannot be completely modeled by Regular Grammars and require at least Context-Free Grammars.
Explanation:
Center embedding creates structures of the form $a^n b^n$, which require a pushdown automaton to keep track of nested dependencies. Because finite-state automata lack memory to track arbitrary depths of nesting, Regular Grammars are insufficient.
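Abstracting the center-embedded nouns as `a` and their verbs as `b` gives the pattern $a^n b^n$. The recognizer below is a sketch that uses an explicit stack, the unbounded memory a pushdown automaton has and a finite-state automaton lacks:

```python
def accepts_anbn(s: str) -> bool:
    stack = []
    i = 0
    # Push one marker per leading 'a' (one per still-open noun phrase).
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    # Pop one marker per 'b'; a 'b' with nothing left to match means rejection.
    while i < len(s) and s[i] == "b":
        if not stack:
            return False
        stack.pop()
        i += 1
    # Accept only if the whole string was consumed and every 'a' was matched.
    return i == len(s) and not stack

print(accepts_anbn("aaabbb"), accepts_anbn("aaabb"))  # True False
```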
43. Consider the morphological transformation from the root word "compute" to the word "computational". Which sequence of morphological processes accurately characterizes this transition?
morphology
Hard
A.An agglutinative stacking of inflectional morphemes representing tense, case, and gender simultaneously.
B.A combination of compounding and cliticization, merging independent morphemes to form a complex nominal modifier.
C.A series of purely inflectional affixes that adapt the word to different syntactic roles without altering its core semantic meaning.
D.Multiple derivational affixes applied sequentially, changing both the semantic meaning and the syntactic category (verb -> noun -> adjective).
Correct Answer: Multiple derivational affixes applied sequentially, changing both the semantic meaning and the syntactic category (verb -> noun -> adjective).
Explanation:
The root 'compute' (verb) takes the derivational suffix '-ation' to become 'computation' (noun), which then takes another derivational suffix '-al' to become 'computational' (adjective). Derivational morphology alters the syntactic category and core meaning.
44. The sentence "I saw the man with the telescope" exhibits severe structural ambiguity. In a Probabilistic Context-Free Grammar (PCFG), how is the correct parse tree mathematically resolved during syntactic inference?
syntax
Hard
A.By computing the parse tree $\hat{T} = \arg\max_{T} P(T, S)$ that maximizes the joint probability of the structural derivations $T$ and the lexical observations $S$, often using the CYK algorithm.
B.By calculating the longest common subsequence of non-terminal symbols mapped to the training corpus.
C.By selecting the parse tree that minimizes the cross-entropy loss of the terminal leaves.
D.By applying deterministic shift-reduce parsing and prioritizing shift operations over reduce operations unconditionally.
Correct Answer: By computing the parse tree $\hat{T} = \arg\max_{T} P(T, S)$ that maximizes the joint probability of the structural derivations $T$ and the lexical observations $S$, often using the CYK algorithm.
Explanation:
In a PCFG, the probability of a parse tree is the product of the probabilities of the rules used to build it, and ambiguity is resolved by selecting the highest-probability tree. Dynamic programming, such as the probabilistic (Viterbi) variant of the Cocke-Younger-Kasami (CYK) algorithm, finds this tree without explicitly enumerating every possible parse.
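A compact sketch of probabilistic (Viterbi) CYK on the telescope sentence. The grammar is a hypothetical toy in Chomsky normal form and its probabilities are made up, so the winning parse reflects those numbers rather than any trained model:

```python
# Hypothetical toy PCFG in Chomsky normal form (probabilities are made up).
BINARY_RULES = [  # (lhs, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "NP", "PP", 0.3),
    ("NP", "Det", "N", 0.5),
    ("PP", "P", "NP", 1.0),
]
LEXICAL_RULES = {  # word -> [(lhs, probability)]
    "I": [("NP", 0.2)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "man": [("N", 0.5)],
    "telescope": [("N", 0.5)],
    "with": [("P", 1.0)],
}

def viterbi_cky(words):
    n = len(words)
    # chart[i][j] maps a nonterminal to (best probability, backpointer) for words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for lhs, p in LEXICAL_RULES.get(word, []):
            chart[i][i + 1][lhs] = (p, word)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for lhs, left, right, p in BINARY_RULES:
                    if left in chart[i][k] and right in chart[k][j]:
                        prob = p * chart[i][k][left][0] * chart[k][j][right][0]
                        # Keep only the most probable way to build lhs over this span.
                        if prob > chart[i][j].get(lhs, (0.0, None))[0]:
                            chart[i][j][lhs] = (prob, (k, left, right))
    return chart

def build_tree(chart, i, j, label):
    _, back = chart[i][j][label]
    if isinstance(back, str):  # lexical entry: backpointer is the word itself
        return (label, back)
    k, left, right = back
    return (label, build_tree(chart, i, k, left), build_tree(chart, k, j, right))

words = "I saw the man with the telescope".split()
chart = viterbi_cky(words)
best_prob = chart[0][len(words)]["S"][0]
print(best_prob)
print(build_tree(chart, 0, len(words), "S"))
```

With these probabilities, the VP-attachment parse (the telescope is the instrument of seeing) scores $0.003$ against $0.00225$ for the NP-attachment parse (the man has the telescope); different rule probabilities could reverse the decision.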
45. The Principle of Compositionality asserts that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them. Which of the following linguistic phenomena breaks this principle, requiring specialized representation in vector space models?
semantics
Hard
A.Non-compositional multi-word expressions (idioms), where the overarching semantic meaning is decoupled from the literal constituent semantics.
B.Synonymy, where two structurally different expressions yield the exact same semantic representation.
C.Polysemous words, where a single word has multiple related meanings depending on its context.
D.Morphological derivations that transition words across parts-of-speech.
Correct Answer: Non-compositional multi-word expressions (idioms), where the overarching semantic meaning is decoupled from the literal constituent semantics.
Explanation:
Idioms (e.g., 'kick the bucket') violate the Principle of Compositionality because their meaning cannot be deduced by mathematically combining the meaning of their constituent words ('kick', 'the', 'bucket').
46. The Winograd Schema Challenge (e.g., "The trophy didn't fit into the brown suitcase because it was too large.") is designed to test machine intelligence. What specific NLP challenge does this schema fundamentally target?
challenges of NLP
Hard
A.Syntactic parsing of highly ambiguous prepositional phrase attachments.
B.Coreference resolution that strictly requires world knowledge and common-sense reasoning.
C.Morphological segmentation of agglutinative languages.
D.Lexical disambiguation of homophones in unconstrained text.
Correct Answer: Coreference resolution that strictly requires world knowledge and common-sense reasoning.
Explanation:
Winograd Schemas require the model to resolve pronouns (coreference resolution) where syntactic and semantic rules alone are insufficient. The model must utilize common-sense reasoning (knowing that a trophy being too large prevents it from fitting into a suitcase) to determine what 'it' refers to.
47. When applying Byte-Pair Encoding (BPE) tokenization to a highly agglutinative language like Turkish or Finnish, what is the primary mathematical and practical advantage over strict word-level tokenization in deep learning models?
tokenization
Hard
A.It strictly enforces the Chomsky normal form, ensuring all tokens can be parsed into binary trees.
B.It eliminates the need for positional embeddings, as subwords implicitly encode sequence order.
C.It mitigates the sparse vocabulary problem by decomposing rare morphological variations into statistically frequent, reusable subword units.
D.It prevents the vanishing gradient problem by maintaining constant sequence lengths across all batches.
Correct Answer: It mitigates the sparse vocabulary problem by decomposing rare morphological variations into statistically frequent, reusable subword units.
Explanation:
Agglutinative languages generate a massive number of unique word forms, leading to sparse vocabularies and out-of-vocabulary (OOV) issues in word-level models. BPE circumvents this by representing rare or complex words as combinations of highly frequent subwords.
48. Which of the following scenarios is a canonical example of "over-stemming" (a false positive reduction) resulting from the heuristic rules of algorithms like the Porter Stemmer?
stemming
Hard
A.Failing to map "ran" and "run" to the same linguistic root.
B.Mapping the words "computation" and "computing" to the single stem "comput".
C.Mapping the plural "matrices" to the singular stem "matrix".
D.Mapping the words "universe" and "university" to the single stem "univers".
Correct Answer: Mapping the words "universe" and "university" to the single stem "univers".
Explanation:
Over-stemming occurs when two words with distinct semantic meanings are incorrectly reduced to the same stem. 'Universe' and 'university' mean different things, but aggressive suffix stripping collapses them to 'univers', destroying semantic distinction.
49. Unlike algorithmic stemming, dictionary-based lemmatization relies heavily on accurate Part-of-Speech (POS) tagging. Given a standard WordNet Lemmatizer, what is the expected outcome if the word "saw" in the sentence "He used a saw" is incorrectly tagged as a verb prior to lemmatization?
lemmatization
Hard
A.The lemmatizer will maintain "saw" as it relies on contextual embeddings to override faulty POS tags.
B.The lemmatizer will map "saw" to "see", thereby corrupting the downstream semantic representation of the tool.
C.The lemmatizer will output "saws", applying a default pluralization rule for unknown verb forms.
D.The lemmatizer will raise a runtime exception due to a mismatch between lexical semantics and morphological syntax.
Correct Answer: The lemmatizer will map "saw" to "see", thereby corrupting the downstream semantic representation of the tool.
Explanation:
Lemmatizers look up words based on their token and provided POS tag. If 'saw' is tagged as a verb, the lemmatizer looks for the base form of the verb 'saw', which is 'see'. This results in an incorrect lemma for the noun 'saw'.
50. While traditionally beneficial for Information Retrieval, aggressive stop-word removal severely degrades performance in certain modern NLP tasks. In which of the following tasks would stripping stop-words (e.g., "the", "is", "not") be most disastrous for model inference, and why?
stop-word removal
Hard
A.Named Entity Recognition, because entities are exclusively comprised of stop-word sequences.
B.Extractive Text Summarization, because sentence scoring algorithms like TextRank strictly require stop-words to compute node degrees.
C.Topic Modeling (e.g., LDA), because topics are defined exclusively by grammatical function words.
D.Sentiment Analysis involving double negatives, because removing function words obliterates the compositionality of negation context.
Correct Answer: Sentiment Analysis involving double negatives, because removing function words obliterates the compositionality of negation context.
Explanation:
Words like 'not', 'no', and 'very' appear in many stop-word lists. Removing them can flip polarity or change intensity (e.g., 'not good' becomes 'good'), discarding exactly the context that Sentiment Analysis depends on.
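A small sketch of the failure mode. The stop-word set is an illustrative assumption, not any particular library's list.

```python
# Illustrative stop-word list that, like many standard lists,
# includes negations and intensifiers such as "not", "no", and "very".
STOP_WORDS = {"the", "a", "is", "was", "this", "not", "no", "very"}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stop-words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


print(remove_stop_words("The movie was not good"))  # ['movie', 'good']
print(remove_stop_words("The movie was good"))      # ['movie', 'good']
```

The negative and positive reviews become identical after filtering, so no downstream sentiment model can separate them.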
51. In clinical NLP, uniformly stripping all punctuation can lead to catastrophic semantic loss. Which of the following exemplifies a structural edge case where a punctuation mark functions as a critical semantic token rather than merely a syntactic boundary?
punctuation handling
Hard
A.The use of periods in identifying specific ICD-10 medical codes (e.g., J45.909).
B.The use of commas to separate elements in an appositive clause.
C.The use of semicolons to link two independent clauses in patient notes.
D.The use of hyphens in compounding standard English adjectives like "state-of-the-art".
Correct Answer: The use of periods in identifying specific ICD-10 medical codes (e.g., J45.909).
Explanation:
In clinical text, the period inside a standardized code such as ICD-10 is part of the code itself: it separates the three-character category (J45, asthma) from its finer subdivision (.909). Stripping it turns 'J45.909' into 'J45909', a token that may no longer match the dotted form that downstream code lookups and entity linkers expect.
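A sketch of code-aware punctuation stripping. The regex is a simplified, assumed ICD-10-like pattern (one capital letter, two digits, an optional period plus up to four alphanumerics), not the full code specification.

```python
import re
import string

# Simplified ICD-10-like pattern (an assumption, not the official grammar).
ICD10_LIKE = re.compile(r"\b[A-Z]\d{2}(?:\.[0-9A-Z]{1,4})?\b")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_strip(text: str) -> str:
    """Remove every ASCII punctuation character."""
    return text.translate(PUNCT_TABLE)


def code_aware_strip(text: str) -> str:
    """Remove punctuation everywhere except inside ICD-10-like codes."""
    pieces, last = [], 0
    for match in ICD10_LIKE.finditer(text):
        pieces.append(naive_strip(text[last : match.start()]))
        pieces.append(match.group())  # keep the code verbatim
        last = match.end()
    pieces.append(naive_strip(text[last:]))
    return "".join(pieces)


note = "Dx: J45.909, stable."
print(naive_strip(note))       # Dx J45909 stable
print(code_aware_strip(note))  # Dx J45.909 stable
```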
52. The FastText model fundamentally improves upon the standard Word2Vec architecture primarily in its handling of Out-Of-Vocabulary (OOV) words. What is the mathematical mechanism by which FastText accomplishes this?
handling out-of-vocabulary words
Hard
A.By replacing the softmax layer with hierarchical softmax, which forces all possible character permutations into a binary tree.
B.By dynamically querying external knowledge graphs via a differentiable lookup table to construct embeddings on the fly.
C.By representing target words as the sum of the learned vectors of their constituent character n-grams, enabling vector approximation for unseen words.
D.By mapping all OOV tokens to an average of the entire vocabulary matrix, ensuring a non-zero initialization.
Correct Answer: By representing target words as the sum of the learned vectors of their constituent character n-grams, enabling vector approximation for unseen words.
Explanation:
FastText pads each word with boundary symbols '<' and '>' and breaks it into character n-grams (for 'apple' with n = 3: '<ap', 'app', 'ppl', 'ple', 'le>'), also keeping the whole padded word as its own unit. A word's vector is the sum of the vectors of these units. For an unseen OOV word, the model can still build an approximate vector by summing the vectors of whichever of its character n-grams were learned during training.
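A simplified sketch of the subword mechanism. It assumes a single n (3) and a made-up table of 2-dimensional n-gram vectors; the real fastText library uses a range of n (3 to 6 by default), hashes n-grams into buckets, and learns the vectors during training.

```python
def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Character n-grams of the word padded with '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i : i + n] for i in range(len(padded) - n + 1)]


def oov_vector(word: str, ngram_vectors: dict, dim: int) -> list[float]:
    """Sum the vectors of the word's n-grams that exist in the table."""
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        if gram in ngram_vectors:
            vec = [a + b for a, b in zip(vec, ngram_vectors[gram])]
    return vec


# Made-up vectors standing in for trained n-gram embeddings.
ngram_vectors = {"<ap": [1.0, 0.0], "app": [0.0, 1.0], "ppl": [1.0, 1.0]}

print(char_ngrams("apple"))                           # ['<ap', 'app', 'ppl', 'ple', 'le>']
print(oov_vector("apples", ngram_vectors, dim=2))     # [2.0, 2.0]
```

'apples' never appears as a whole word in the table, yet it still gets a non-zero vector from the n-grams it shares with 'apple'.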
53. When preprocessing noisy multilingual text, what is the specific role of Unicode Normalization Form KC (NFKC) compared to NFC, and why is this critical for NLP normalization?
normalization
Hard
A.NFKC applies compatibility decomposition followed by canonical composition, merging visually or functionally identical but uniquely encoded characters (e.g., ligatures and full-width forms), thus reducing vocabulary sparsity.
B.NFKC reverses the byte-order of characters for right-to-left languages, standardizing all inputs to a left-to-right processing pipeline.
D.NFKC converts all text to lowercase and strips diacritics, which NFC preserves.
Correct Answer: NFKC applies compatibility decomposition followed by canonical composition, merging visually or functionally identical but uniquely encoded characters (e.g., ligatures and full-width forms), thus reducing vocabulary sparsity.
Explanation:
NFKC first applies compatibility decomposition (e.g., turning the 'fi' ligature into 'f' and 'i', or full-width Latin letters into standard Latin letters) and then canonical composition. NFC applies only canonical decomposition and composition, so it leaves ligatures and full-width forms unchanged. By collapsing these disparate encodings of the 'same' text, NFKC significantly reduces vocabulary sparsity.
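A quick check with Python's standard unicodedata module, using a ligature and full-width letters (written as escape sequences so the code stays readable):

```python
import unicodedata

# "\ufb01le" is the 'fi' ligature + "le"; "\uff2e\uff2c\uff30" is full-width "NLP".
tokens = ["\ufb01le", "file", "\uff2e\uff2c\uff30", "NLP"]

nfc_vocab = {unicodedata.normalize("NFC", t) for t in tokens}
nfkc_vocab = {unicodedata.normalize("NFKC", t) for t in tokens}

print(len(nfc_vocab))   # 4: NFC keeps the ligature and full-width forms distinct
print(len(nfkc_vocab))  # 2: NFKC folds them into 'file' and 'NLP'
```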
54. Given two documents represented as standard Bag-of-Words (BoW) frequency vectors $\mathbf{a}$ and $\mathbf{b}$, the raw dot product inherently scales with document length, skewing similarity metrics. Which normalization mathematically transforms these vectors such that their dot product strictly equals their angular cosine similarity?
Bag-of-Words
Hard
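A.L2 Normalization: $\hat{\mathbf{a}} = \mathbf{a} / \lVert\mathbf{a}\rVert_2$ and $\hat{\mathbf{b}} = \mathbf{b} / \lVert\mathbf{b}\rVert_2$.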
B.Min-Max scaling to the range $[0, 1]$.
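C.L1 Normalization: $\hat{\mathbf{a}} = \mathbf{a} / \lVert\mathbf{a}\rVert_1$ and $\hat{\mathbf{b}} = \mathbf{b} / \lVert\mathbf{b}\rVert_1$.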
D.Z-score standardization across the vocabulary dimensions.
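Correct Answer: L2 Normalization: $\hat{\mathbf{a}} = \mathbf{a} / \lVert\mathbf{a}\rVert_2$ and $\hat{\mathbf{b}} = \mathbf{b} / \lVert\mathbf{b}\rVert_2$.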
Explanation:
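Cosine similarity is defined as $\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert\mathbf{a}\rVert_2 \, \lVert\mathbf{b}\rVert_2}$. If you L2-normalize $\mathbf{a}$ and $\mathbf{b}$ before taking the dot product, both norms in the denominator become 1, so the dot product exactly equals the cosine similarity.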
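A small numeric check, using two made-up count vectors where the second document has the same word proportions as the first but is ten times longer:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


a = [2, 1, 0, 3]      # term counts for a short document
b = [20, 10, 0, 30]   # same proportions, 10x longer

raw = dot(a, b)  # 140: grows with document length
cos = raw / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
normalized = dot(l2_normalize(a), l2_normalize(b))
print(raw, cos, normalized)  # cos and normalized agree (both ~1.0)
```

The raw dot product rewards length, while the dot product of L2-normalized vectors matches cosine similarity and reports that the two documents point in the same direction.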
55. In an $n$-gram language model with a vocabulary size $|V|$, the transition from bigrams to trigrams drastically exacerbates data sparsity. Which advanced smoothing technique addresses this by utilizing absolute discounting combined with an interpolation weight that falls back to lower-order models based on the diversity of contexts a word appears in?
n-grams
Hard
A.Laplace (Add-One) Smoothing
B.Witten-Bell Interpolation
C.Good-Turing Discounting
D.Kneser-Ney Smoothing
Correct Answer: Kneser-Ney Smoothing
Explanation:
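Kneser-Ney smoothing builds on absolute discounting by replacing the plain lower-order distribution with a continuation probability: how many distinct contexts a word has been seen after. The lower-order (fallback) model therefore favors words that appear in many different contexts. For example, 'Francisco' may be frequent but almost always follows 'San', so it receives a low continuation probability and is rarely predicted after an unfamiliar history.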
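A sketch of only the continuation-probability term on a made-up corpus. It is not the full interpolated Kneser-Ney estimator, which also applies a discount to higher-order counts and an interpolation weight.

```python
from collections import Counter, defaultdict


def continuation_probability(sentences: list[str]) -> dict[str, float]:
    """P_cont(w) = (# distinct words seen before w) / (# distinct bigram types)."""
    bigram_types = set()
    for sentence in sentences:
        toks = sentence.split()
        bigram_types.update(zip(toks, toks[1:]))
    left_contexts = defaultdict(set)
    for prev, word in bigram_types:
        left_contexts[word].add(prev)
    total = len(bigram_types)
    return {w: len(ctx) / total for w, ctx in left_contexts.items()}


corpus = [
    "i visited san francisco",
    "san francisco is foggy",
    "we love san francisco",
    "san francisco rocks",
    "i lost my glasses",
    "her glasses broke",
    "new glasses help",
]
counts = Counter(tok for s in corpus for tok in s.split())
p_cont = continuation_probability(corpus)
print(counts["francisco"], counts["glasses"])  # 4 3
print(p_cont["francisco"], p_cont["glasses"])  # about 0.067 vs 0.2
```

'francisco' is the more frequent word, but it only ever follows 'san', so its continuation probability is a third of that of 'glasses'.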
56. In the standard TF-IDF formulation, Inverse Document Frequency is often calculated as $\text{IDF}(t) = \log \frac{N}{\text{df}(t)}$. If a term appears in every single document in a training corpus of $N$ documents, what is the mathematical consequence on its TF-IDF weight, and how is this edge case typically mitigated in libraries like scikit-learn?
TF-IDF
Hard
A.The IDF becomes $0$, zeroing out the entire TF-IDF weight; mitigated by adding a constant $1$ to the resulting IDF score: $\text{IDF}(t) = \log \frac{N}{\text{df}(t)} + 1$.
B.The IDF becomes $1$, leaving the TF unchanged; no mitigation is required as this is the intended mathematical behavior.
C.The IDF approaches negative infinity; mitigated by applying a ReLU activation function to clamp negative values.
D.The IDF becomes undefined (division by zero); mitigated by using .
Correct Answer: The IDF becomes $0$, zeroing out the entire TF-IDF weight; mitigated by adding a constant $1$ to the resulting IDF score: $\text{IDF}(t) = \log \frac{N}{\text{df}(t)} + 1$.
Explanation:
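If $\text{df}(t) = N$, then $\text{IDF}(t) = \log \frac{N}{N} = \log 1 = 0$, so the TF-IDF weight of a term that appears in every document is 0 regardless of its TF. Scikit-learn's TfidfTransformer adds 1 to the IDF, so such terms keep their TF weight instead of being ignored entirely. Its separate smooth_idf option (on by default) also adds 1 to both $N$ and $\text{df}(t)$, as if one extra document containing every term had been seen.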
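A worked sketch of the edge case. The second function mirrors the formulas documented for scikit-learn's TfidfTransformer (natural log); it is a re-implementation for illustration, not the library's code.

```python
import math


def idf_raw(n_docs: int, df: int) -> float:
    """Textbook IDF: log(N / df)."""
    return math.log(n_docs / df)


def idf_sklearn_style(n_docs: int, df: int, smooth_idf: bool = True) -> float:
    """IDF with the +1 term, following scikit-learn's documented formulas."""
    if smooth_idf:
        return math.log((1 + n_docs) / (1 + df)) + 1
    return math.log(n_docs / df) + 1


n_docs, df, tf = 4, 4, 3  # term occurs in all 4 documents, 3 times in this one
print(tf * idf_raw(n_docs, df))            # 0.0: the term is zeroed out
print(tf * idf_sklearn_style(n_docs, df))  # 3.0: the TF weight survives
```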
57. Zipf's Law states that the frequency $f$ of a word in a natural language corpus is inversely proportional to its rank $r$, mathematically expressed as $f \propto \frac{1}{r}$. In the context of NLP model training, what critical structural challenge does this empirical distribution formalize?
linguistic essentials
Hard
A.It establishes an upper bound on the maximum sequence length a Recurrent Neural Network can process without suffering from vanishing gradients.
B.It dictates that semantic relationships are strictly linear, invalidating non-linear deep learning architectures.
C.It proves that stop-words carry the highest semantic density, reversing traditional Information Retrieval heuristics.
D.It formalizes the heavy-tailed distribution of language, ensuring that any fixed-size vocabulary will inevitably encounter a high probability mass of rare and unseen Out-Of-Vocabulary (OOV) words.
Correct Answer: It formalizes the heavy-tailed distribution of language, ensuring that any fixed-size vocabulary will inevitably encounter a high probability mass of rare and unseen Out-Of-Vocabulary (OOV) words.
Explanation:
Zipf's law describes a power-law distribution: a few words (mostly stop-words) are extremely frequent, while a massive 'long tail' of words each occurs rarely. As a consequence, no matter how large a fixed vocabulary is, new test data keeps containing words never seen in training, which is the root of the OOV problem.
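A back-of-the-envelope sketch of the long tail. It assumes an idealized Zipf law with exponent exactly 1 over a hypothetical finite vocabulary of 100,000 word types, and asks what share of running tokens falls outside the top-k types.

```python
def harmonic(n: int) -> float:
    """H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))


def zipf_tail_mass(vocab_size: int, top_k: int) -> float:
    """Share of tokens outside the top_k types when p(r) is proportional
    to 1/r over vocab_size types."""
    return 1.0 - harmonic(top_k) / harmonic(vocab_size)


for k in (1_000, 10_000, 50_000):
    print(k, round(zipf_tail_mass(100_000, k), 3))
```

Under these assumptions, keeping the 10,000 most frequent types still leaves roughly 19% of tokens uncovered, which is why a fixed vocabulary keeps meeting rare and unseen words.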
58. Consider a long document where the word "excellent" is repeated 100 times. Using standard Term Frequency (TF), the weight is $100$, which can disproportionately dominate the document vector. To dampen this effect, sublinear TF scaling is often applied. Which of the following formulas represents the standard sublinear scaling for a raw term frequency $\text{tf}$?
TF-IDF
Hard
A.
B.
C.
D.
Correct Answer: $1 + \log(\text{tf})$
Explanation:
Sublinear TF scaling uses a logarithm to dampen very high term frequencies. The standard formulation is $1 + \log(\text{tf})$ for terms that occur at least once (and 0 otherwise); with base 10, a frequency of 100 becomes $1 + \log_{10} 100 = 3$.
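A one-function sketch of the formula. The base is a parameter here; scikit-learn's sublinear_tf option uses the natural log, which gives a different number for the same count.

```python
import math


def sublinear_tf(raw_tf: int, log=math.log10) -> float:
    """1 + log(tf) for tf > 0, else 0."""
    return 1.0 + log(raw_tf) if raw_tf > 0 else 0.0


print(sublinear_tf(100))            # 3.0 with base 10
print(sublinear_tf(100, math.log))  # about 5.61 with the natural log
print(sublinear_tf(0))              # 0.0
```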
59. While Named Entity Recognition (NER) is traditionally framed as a token-level sequence labeling task using architectures like BiLSTM-CRF, modern generative models occasionally reframe NER as a sequence-to-sequence generation task. What is the primary analytical trade-off (drawback) of this generative approach compared to strict token classification?
applications of NLP
Hard
A.Generative models cannot output text sequentially, breaking the auto-regressive properties required for NER.
B.Generative models are unable to identify overlapping or nested entities, which classification models handle natively.
C.Generative models require significantly more annotated data because they cannot utilize pre-trained transformer weights.
D.Generative approaches are highly prone to hallucination, potentially generating entity spans that do not perfectly align with or exist in the exact source tokens.
Correct Answer: Generative approaches are highly prone to hallucination, potentially generating entity spans that do not perfectly align with or exist in the exact source tokens.
Explanation:
Token classification (e.g., with a CRF layer) assigns a label to each input token, so every predicted entity span is guaranteed to be a piece of the source text. Generative seq2seq models instead produce output tokens freely and can hallucinate words or alter spellings, so extracted entities may not align with the original source text.
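A minimal sketch of the post-hoc check that generative NER typically needs: mapping generated entity strings back to character offsets in the source. The function name and the "generated" output are hypothetical, and the sketch uses only the first exact match.

```python
def align_entities(source: str, generated: list[str]):
    """Return (aligned spans with offsets, entities not found in the source)."""
    aligned, unaligned = [], []
    for ent in generated:
        start = source.find(ent)  # first exact occurrence only
        if start == -1:
            unaligned.append(ent)
        else:
            aligned.append((ent, start, start + len(ent)))
    return aligned, unaligned


source = "Barack Obama visited Berlin in 2013."
generated = ["Barak Obama", "Berlin"]  # hypothetical seq2seq output with a typo
aligned, unaligned = align_entities(source, generated)
print(aligned)    # [('Berlin', 21, 27)]
print(unaligned)  # ['Barak Obama']
```

A token classifier could never produce 'Barak Obama' here, because it can only label tokens that already exist in the input.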
60. The perplexity of a test sequence $W = w_1 \dots w_N$ under a bigram language model is defined as $PP(W) = \left(\prod_{i=1}^{N} P(w_i \mid w_{i-1})\right)^{-1/N}$. If the unsmoothed language model assigns a probability of $0$ to a single unseen bigram in the test sequence, what mathematically happens to the perplexity, and what does this signify?
n-grams
Hard
A.The perplexity defaults to the size of the vocabulary $|V|$, indicating maximum entropy.
B.The perplexity collapses to $1$, meaning the model defaults to a uniform distribution.
C.The perplexity becomes infinite ($\infty$), indicating that the model is entirely incapable of evaluating the test set without appropriate smoothing.
D.The perplexity becomes $0$, signifying that the model perfectly predicts the test sequence.
Correct Answer: The perplexity becomes infinite ($\infty$), indicating that the model is entirely incapable of evaluating the test set without appropriate smoothing.
Explanation:
If a single bigram has probability $0$, the product of bigram probabilities, and hence the whole sequence probability, is $0$. Raising $0$ to a negative power ($0^{-1/N}$) amounts to dividing by zero, so the perplexity diverges to infinity. This is why smoothing is essential for $n$-gram models evaluated on unseen text.
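A minimal sketch of bigram perplexity computed in log space, assuming a hypothetical, unsmoothed table of bigram probabilities and a '<s>' start marker:

```python
import math


def bigram_perplexity(tokens, bigram_prob):
    """PP(W) = (prod_i P(w_i | w_{i-1}))^(-1/N), computed in log space.
    tokens starts with '<s>'; N is the number of predicted tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = 0.0
    for prev, word in pairs:
        p = bigram_prob.get((prev, word), 0.0)  # unsmoothed: unseen bigram -> 0
        if p == 0.0:
            return math.inf  # the product is 0, so 0 ** (-1/N) diverges
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))


# Made-up bigram probabilities for illustration.
probs = {("<s>", "i"): 0.5, ("i", "like"): 0.5, ("like", "tea"): 0.5}
print(bigram_perplexity(["<s>", "i", "like", "tea"], probs))     # about 2.0
print(bigram_perplexity(["<s>", "i", "like", "coffee"], probs))  # inf
```

One unseen bigram ('like', 'coffee') is enough to make the perplexity infinite; any smoothing scheme that gives unseen bigrams a small non-zero probability keeps it finite.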