1 $Which decade is generally considered the starting point of Natural Language Processing, marked by Alan Turing's publication on machine intelligence?$

origin of NLP Easy

A.

1930s

B.

1970s

C.

1950s

D.

1990s

2 $What defines the set of structural rules governing the composition of clauses, phrases, and words in any given natural language?$

language and grammar Easy

A.

Semantics

B.

Pragmatics

C.

Phonology

D.

Grammar

3 $What is defined as the study of the internal structure of words and how they are formed?$

morphology Easy

A.

Semantics

B.

Phonetics

C.

Morphology

D.

Syntax

4 $Which level of linguistic analysis focuses specifically on the order of words and how they combine to form sentences?$

syntax Easy

A.

Phonology

B.

Pragmatics

C.

Syntax

D.

Morphology

5 $What is the primary focus of semantics in Natural Language Processing?$

semantics Easy

A.

The literal meaning of words and sentences

B.

The historical origin of languages

C.

The structural rules of sentences

D.

The pronunciation of words

6 $Which of the following describes 'ambiguity' in Natural Language Processing?$

challenges of NLP Easy

A.

A word that is misspelled

B.

A sentence written in multiple languages

C.

A sentence that lacks punctuation

D.

A sentence or word having multiple possible meanings

7 $Which of the following is a common application of NLP?$

applications of NLP Easy

A.

Network routing

B.

Image classification

C.

Machine Translation

D.

Database management

8 $What is the process of breaking down a continuous stream of text into smaller units like words or sentences?$

tokenization Easy

A.

Normalization

B.

Stemming

C.

Lemmatization

D.

Tokenization

9 $What is the primary purpose of stemming in text processing?$

stemming Easy

A.

To check the spelling of a word

B.

To translate a word into another language

C.

To chop off affixes to reduce a word to its root form

D.

To find synonyms of a word

10 $How does lemmatization primarily differ from stemming?$

lemmatization Easy

A.

Lemmatization removes all vowels from a word.

B.

Lemmatization is faster than stemming.

C.

Lemmatization only works on numbers.

D.

Lemmatization uses a vocabulary and morphological analysis to return a valid dictionary word.

11 $What are stop-words in the context of NLP preprocessing?$

stop-word removal Easy

A.

Words that indicate the end of a sentence

B.

Highly frequent words that carry very little semantic meaning, like 'the' and 'is'

C.

Words that are not found in the dictionary

D.

Words that carry the most important meaning in a text

12 $Why is punctuation handling an important step in text preprocessing?$

punctuation handling Easy

A.

To ensure grammatical correctness for the end user

B.

To add missing commas to sentences

C.

To translate text accurately

D.

To prevent punctuation marks from being incorrectly attached to words as part of the token

13 $What does OOV stand for in Natural Language Processing?$

handling out-of-vocabulary words Easy

A.

Over-Optimized-Vector

B.

Output-Observation-Value

C.

Out-Of-Vocabulary

D.

Object-Oriented-Verb

14 $Which of the following is a classic example of text normalization?$

normalization Easy

A.

Converting all text to lowercase

B.

Extracting named entities

C.

Translating text to English

D.

Generating a summary of a document

15 $What major linguistic feature does the basic Bag-of-Words (BoW) model ignore?$

Bag-of-Words Easy

A.

Vocabulary size

B.

Word occurrence

C.

Word order and context

D.

Word frequency

16 $In the context of n-grams, what is a 'bigram'?$

n-grams Easy

A.

A sentence with two clauses

B.

A word with two syllables

C.

A document with two paragraphs

D.

A sequence of two consecutive words

17 $What does 'TF' stand for in the TF-IDF representation?$

TF-IDF Easy

A.

Term Frequency

B.

Total Frequency

C.

Token Format

D.

Text Feature

18 $What is the main purpose of the Inverse Document Frequency (IDF) component in TF-IDF?$

TF-IDF Easy

A.

To count the total number of documents

B.

To penalize words that are rare in the corpus

C.

To increase the weight of stop-words

D.

To reduce the weight of words that appear in many documents across the corpus

19 $What is a 'phoneme' in linguistics?$

linguistic essentials Easy

A.

The smallest structural unit that carries meaning

B.

The smallest unit of sound that distinguishes one word from another

C.

A dictionary of words

D.

The rules of sentence construction

20 $Which famous 1954 experiment provided a highly publicized demonstration of machine translation, translating Russian sentences into English?$

origin of NLP Easy

A.

The Turing Experiment

B.

The Georgetown-IBM experiment

C.

The ELIZA project

D.

The SHRDLU system

21 $Early NLP systems like ELIZA (1966) simulated conversation using pattern matching. What was the primary limitation of this approach regarding natural language understanding?$

origin of NLP Medium

A.

It possessed no actual understanding of syntactic structure or semantic meaning.

B.

It could only process mathematical equations rather than human language.

C.

It required massive computational power to run the neural networks.

D.

It required the user to input text in binary code.

22 $If an NLP system uses Context-Free Grammar (CFG) to parse sentences, which of the following phenomena is most challenging for it to handle efficiently?$

language and grammar Medium

A.

Recognizing terminal symbols (words) in a lexicon.

B.

Subject-verb agreement within a simple sentence.

C.

Generating a standard syntax tree for a declarative sentence.

D.

Context-sensitive dependencies, such as cross-serial dependencies.

23 $Consider the sentence: 'Can you pass the salt?' While syntactically a question, it functions as a request. Which linguistic level of analysis is required to understand this intended meaning?$

linguistic essentials Medium

A.

Phonology

B.

Pragmatics

C.

Syntax

D.

Morphology

24 $Analyze the words 'running' and 'runner'. Which of the following statements correctly identifies the morphological processes applied to the root word 'run'?$

morphology Medium

A.

Both are examples of derivational morphology.

B.

Both are examples of inflectional morphology.

C.

'Running' uses derivational morphology, while 'runner' uses inflectional morphology.

D.

'Running' uses inflectional morphology, while 'runner' uses derivational morphology.

25 $The sentence 'I saw the man with the telescope' exhibits structural ambiguity. Which NLP task is directly responsible for resolving how 'with the telescope' attaches to the rest of the sentence?$

syntax Medium

A.

Part-of-Speech tagging

B.

Named Entity Recognition

C.

Dependency parsing

D.

Morphological segmentation

26 $When building a word sense disambiguation model, you encounter the word 'bank' in the contexts of 'river bank' and 'bank account'. This is an example of dealing with which semantic concept?$

semantics Medium

A.

Polysemy/Homonymy

B.

Synonymy

C.

Hyponymy

D.

Antonymy

27 $In the sentence 'John told his father that he was promoted', the pronoun 'he' is ambiguous. Resolving whether 'he' refers to John or his father is an example of which NLP challenge?$

challenges of NLP Medium

A.

Speech segmentation

B.

Syntactic ambiguity

C.

Coreference resolution

D.

Lexical ambiguity

28 $If you are designing an NLP pipeline for a conversational AI (chatbot) meant to book flights, which sequence of applications is most logical for processing user input?$

applications of NLP Medium

A.

Intent Recognition -> Named Entity Recognition -> Dialog Management

B.

Topic Modeling -> Word Sense Disambiguation -> Stemming

C.

Sentiment Analysis -> Optical Character Recognition -> POS Tagging

D.

Text Summarization -> Machine Translation -> Speech-to-Text

29 $Using Byte Pair Encoding (BPE) for tokenization helps mitigate which specific problem compared to standard word-level tokenization?$

tokenization Medium

A.

It effectively manages out-of-vocabulary (OOV) and rare words by breaking them into known subwords.

B.

It removes the need for stop-word removal.

C.

It automatically corrects spelling mistakes in the source text.

D.

It prevents the creation of syntactic ambiguity.
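
The merge-learning loop behind BPE fits in a few lines of Python. This is a minimal sketch, not a production tokenizer. The toy word counts, the `</w>` end-of-word marker, and the names `learn_bpe`/`apply_bpe` are assumptions made for the example. Ties between equally frequent pairs go to the pair seen first.

```python
from collections import Counter

Vocab = dict[tuple[str, ...], int]


def pair_counts(vocab: Vocab) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge_pair(pair: tuple[str, str], vocab: Vocab) -> Vocab:
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged: Vocab = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Greedily learn merges, starting from single characters plus an end-of-word marker."""
    vocab: Vocab = {tuple(w) + ("</w>",): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)  # first pair seen wins ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment any word, seen or unseen, by replaying the learned merges in order."""
    vocab: Vocab = {tuple(word) + ("</w>",): 1}
    for pair in merges:
        vocab = merge_pair(pair, vocab)
    return list(next(iter(vocab)))


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=10)
print(apply_bpe("lowest", merges))  # ['low', 'est</w>']
```

"lowest" never appears in the toy corpus. It is still split into two known subwords instead of collapsing into a single unknown token.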

30 $An algorithm reduces both 'university' and 'universal' to the stem 'univers'. What type of stemming error has occurred, and why is it problematic for downstream tasks?$

stemming Medium

A.

Normalization error; the algorithm failed to lowercase the tokens.

B.

Lemmatization failure; the algorithm failed to find the dictionary root.

C.

Over-stemming; words with different meanings are conflated into the same token.

D.

Under-stemming; the words will not be matched despite having the same meaning.

31 $Unlike a standard stemmer, a lemmatizer typically requires which piece of additional information to accurately reduce the word 'saw' to its root form?$

lemmatization Medium

A.

The character-level embeddings of the word.

B.

The sentiment score of the surrounding sentence.

C.

The Part-of-Speech (POS) tag of the word in its context.

D.

The document frequency of the word.

32 $In which of the following NLP tasks is aggressive stop-word removal most likely to degrade model performance?$

stop-word removal Medium

A.

Spam email detection using Bag-of-Words.

B.

Topic modeling (e.g., LDA).

C.

Machine Translation (e.g., translating English to French).

D.

Document clustering based on topic.

33 $When designing a regular expression-based tokenizer, dealing with punctuation can be tricky. Which of the following examples best demonstrates why simply splitting text on all punctuation characters is a poor strategy?$

punctuation handling Medium

A.

'Data science (ML)' becomes 'Data', 'science', 'ML'.

B.

'Hello, world!' becomes 'Hello' and 'world'.

C.

'The end.' becomes 'The', 'end'.

D.

'It's a state-of-the-art model.' becomes 'It', 's', 'a', 'state', 'of', 'the', 'art', 'model'.
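
A short comparison shows the problem in option D. The naive tokenizer splits on every non-word character. The second pattern keeps word-internal apostrophes and hyphens together and emits any other punctuation as its own token. That pattern is an illustrative assumption, not any standard tokenizer's rule set.

```python
import re

TEXT = "It's a state-of-the-art model."

def naive_tokenize(text):
    # Split on every run of non-word characters and drop empty strings.
    return [tok for tok in re.split(r"\W+", text) if tok]

# Words may contain internal ' or - ; any other non-space symbol becomes its own token.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def regex_tokenize(text):
    return TOKEN_RE.findall(text)

print(naive_tokenize(TEXT))  # ['It', 's', 'a', 'state', 'of', 'the', 'art', 'model']
print(regex_tokenize(TEXT))  # ["It's", 'a', 'state-of-the-art', 'model', '.']
```

Whether "It's" should stay whole or be split into "It" and "'s" depends on the downstream task. What matters is that the choice is made on purpose.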

34 $A model trained on a specific corpus encounters a new word during testing. If the model uses a fixed word-level vocabulary with a single <UNK> token, how does it process the new word?$

handling out-of-vocabulary words Medium

A.

It ignores the word entirely to prevent matrix dimension errors.

B.

It maps the word to the <UNK> token, treating all unknown words identically.

C.

It dynamically expands its embedding matrix by adding a new randomized vector.

D.

It uses character-level CNNs to infer the word's exact meaning.
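
Mapping unseen words to a single `<UNK>` id, as in option B, is just a dictionary lookup with a default value. The tiny vocabulary below is hypothetical.

```python
UNK = "<UNK>"
# Hypothetical fixed vocabulary built at training time; id 0 is reserved for unknown words.
vocab = {UNK: 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def encode(tokens, vocab):
    """Map each token to its id, sending every out-of-vocabulary token to the <UNK> id."""
    unk_id = vocab[UNK]
    return [vocab.get(tok, unk_id) for tok in tokens]

print(encode(["the", "dog", "sat", "on", "the", "zebra"], vocab))  # [1, 0, 3, 4, 1, 0]
```

"dog" and "zebra" end up with the same id, so the model cannot tell them apart. Subword methods are designed to avoid exactly this loss of information.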

35 $Which of the following scenarios demonstrates a potential negative consequence of applying universal case folding (lowercasing all text) during text normalization?$

normalization Medium

A.

Punctuation is accidentally removed during the lowercasing process.

B.

'RUN' and 'run' are mapped to the same token, aiding general search.

C.

'apple' (fruit) and 'Apple' (company) are mapped to the same token, losing distinction.

D.

The size of the vocabulary is drastically increased, causing memory issues.

36 $Given a vocabulary of size $V = 10{,}000$, you represent a 50-word sentence using a standard Bag-of-Words (BoW) vector. What is the dimension of the resulting vector, and what is its primary characteristic?$

Bag-of-Words Medium

A.

Dimension is 10,000; the vector is highly sparse.

B.

Dimension is 10,000; the vector is dense.

C.

Dimension is 50; the vector is dense.

D.

Dimension is 50; the vector is highly sparse.
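
The sketch below builds a count vector with one slot per vocabulary word. Because a 50-word sentence can fill at most 50 of the 10,000 slots, the vector is almost entirely zeros. The vocabulary is padded with placeholder words purely for illustration (an assumption). The dense Python list stands in for the sparse structures that real libraries use.

```python
from collections import Counter

def bow_vector(tokens, vocab):
    """Return a count vector of length len(vocab); word order is discarded."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:  # tokens outside the vocabulary are dropped in this sketch
            vec[vocab[tok]] = count
    return vec

# Hypothetical vocabulary: three real words plus placeholders, 10,000 entries in total.
words = ["dog", "bites", "man"] + [f"filler{i}" for i in range(9_997)]
vocab = {w: i for i, w in enumerate(words)}

sentence = (["dog", "bites", "man"] * 17)[:50]  # a 50-token "sentence"
vec = bow_vector(sentence, vocab)
print(len(vec), sum(1 for x in vec if x))  # 10000 3

# Order is ignored: these two sentences get identical vectors.
print(bow_vector("dog bites man".split(), vocab) == bow_vector("man bites dog".split(), vocab))  # True
```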

37 $Consider a sentence containing exactly $N$ words. If we extract standard contiguous trigrams ($n=3$) from this sentence, how many trigram tokens will be generated? (Assume $N \geq 3$.)$

n-grams Medium

A.

B.

C.

D.
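
Extracting contiguous n-grams amounts to sliding a window over the token list. A sequence of $N$ tokens yields $N - n + 1$ windows when $N \geq n$, which gives $N - 2$ trigrams. The helper name `ngrams` is illustrative.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams; empty if the sequence is shorter than n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox jumps".split()  # N = 5
trigrams = ngrams(tokens, 3)
print(len(trigrams))  # 3, i.e. N - n + 1 = 5 - 3 + 1
print(trigrams[0])    # ('the', 'quick', 'brown')
```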

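38 $In the TF-IDF formulation, the Inverse Document Frequency (IDF) of a term $t$ is often calculated as $\text{IDF}(t) = \log\frac{N}{\text{df}(t)}$, where $\text{df}(t)$ is the number of documents containing $t$. If a specific stop-word appears in every single document in a corpus of size $N$, what will its IDF value be?$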

TF-IDF Medium

A.

B.

C.

$1$

D.

$0$

39 $Suppose document A has 100 words, and the word 'AI' appears 5 times. Document B has 500 words, and 'AI' appears 15 times. Using basic Term Frequency (count divided by document length), which document gives higher TF to the word 'AI'?$

TF-IDF Medium

A.

Document A, because its TF is $0.05$, compared to Document B's TF of $0.03$.

B.

They have the same TF because IDF balances the counts.

C.

Document B, because its TF is $15$, compared to Document A's TF of $5$.

D.

Document B, because its TF is $0.03$, compared to Document A's TF of $0.05$.
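
Here is the arithmetic behind option A, using length-normalised term frequency (raw count divided by document length):

```python
def term_frequency(count, doc_length):
    """Length-normalised term frequency: occurrences divided by total tokens in the document."""
    return count / doc_length

tf_a = term_frequency(5, 100)   # Document A
tf_b = term_frequency(15, 500)  # Document B
print(tf_a, tf_b)  # 0.05 0.03
```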

40 $When training an n-gram language model, sparsity becomes a major issue for higher values of $n$. Which technique is commonly applied to handle $n$-grams that appear in the test set but were never seen in the training set?$

n-grams Medium

A.

Stemming and Lemmatization

B.

Term Frequency-Inverse Document Frequency (TF-IDF) weighting

C.

Laplace (Add-1) Smoothing or Backoff

D.

Stop-word removal
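
Add-one smoothing can be computed directly from counts: $P_{\text{add-1}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i) + 1}{C(w_{i-1}) + V}$. The two-sentence corpus, the `<s>`/`</s>` boundary markers, and the function names are assumptions made for illustration. Backoff, the other technique named in option C, is not shown.

```python
from collections import Counter

def train_counts(sentences):
    """Collect context (unigram) and bigram counts with sentence-boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        contexts.update(tokens[:-1])  # every token that is followed by another
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def add_one_prob(prev, word, contexts, bigrams, vocab_size):
    # (C(prev, word) + 1) / (C(prev) + V): unseen bigrams get a small non-zero share.
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)

corpus = [["i", "like", "nlp"], ["i", "like", "cats"]]
contexts, bigrams = train_counts(corpus)
vocab = {w for sent in corpus for w in sent} | {"</s>"}  # possible next tokens
V = len(vocab)  # 5

print(add_one_prob("like", "nlp", contexts, bigrams, V))  # seen: (1 + 1) / (2 + 5)
print(add_one_prob("i", "nlp", contexts, bigrams, V))     # unseen: (0 + 1) / (2 + 5)
```

An unsmoothed MLE model would give the unseen bigram ("i", "nlp") probability 0. The smoothed probabilities for any context still sum to 1 over the vocabulary.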

41 $Which of the following best characterizes the primary historical consequence of the 1966 ALPAC (Automatic Language Processing Advisory Committee) report on the early trajectory of Natural Language Processing?$

origin of NLP Hard

A.

It catalyzed the immediate development of the first conversational agents, such as ELIZA, to bypass the complexities of syntactic translation.

B.

It accelerated the shift from rule-based systems to deep learning architectures by highlighting the computational limits of the era.

C.

It established the mathematical foundation for the Chomsky hierarchy, permanently linking computer science formalisms to linguistic theory.

D.

It caused a drastic reduction in funding for machine translation research by concluding that fully automatic high-quality translation was unattainable in the near future.

42 $Natural languages exhibit phenomena such as center embedding (e.g., "The mouse the cat the dog bit chased ran away"). In the context of the Chomsky hierarchy, what does the theoretical existence of arbitrary-depth center embedding imply about human language?$

language and grammar Hard

A.

It proves that natural languages are uncomputable and cannot be parsed by Turing machines.

B.

It demonstrates that natural language morphologies are inherently regular, as they rely on linear concatenation.

C.

It implies that natural languages are strictly finite-state and can be parsed using simple deterministic automata.

D.

It demonstrates that natural languages cannot be completely modeled by Regular Grammars and require at least Context-Free Grammars.

43 $Consider the morphological transformation from the root word "compute" to the word "computational". Which sequence of morphological processes accurately characterizes this transition?$

morphology Hard

A.

A series of purely inflectional affixes that adapt the word to different syntactic roles without altering its core semantic meaning.

B.

An agglutinative stacking of inflectional morphemes representing tense, case, and gender simultaneously.

C.

A combination of compounding and cliticization, merging independent morphemes to form a complex nominal modifier.

D.

Multiple derivational affixes applied sequentially, changing both the semantic meaning and the syntactic category (verb → noun → adjective).

44 $The sentence "I saw the man with the telescope" exhibits severe structural ambiguity. In a Probabilistic Context-Free Grammar (PCFG), how is the correct parse tree mathematically resolved during syntactic inference?$

syntax Hard

A.

By selecting the parse tree that minimizes the cross-entropy loss of the terminal leaves.

B.

By calculating the longest common subsequence of non-terminal symbols mapped to the training corpus.

C.

By applying deterministic shift-reduce parsing and prioritizing shift operations over reduce operations unconditionally.

D.

By selecting the parse tree that maximizes the joint probability of the tree and the observed sentence (the product of the probabilities of the rules in its derivation), often found with the probabilistic CYK algorithm.

45 $The Principle of Compositionality asserts that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them. Which of the following linguistic phenomena breaks this principle, requiring specialized representation in vector space models?$

semantics Hard

A.

Non-compositional multi-word expressions (idioms), where the overarching semantic meaning is decoupled from the literal constituent semantics.

B.

Synonymy, where two structurally different expressions yield the exact same semantic representation.

C.

Morphological derivations that transition words across parts-of-speech.

D.

Polysemous words, where a single word has multiple related meanings depending on its context.

46 $The Winograd Schema Challenge (e.g., "The trophy didn't fit into the brown suitcase because it was too large.") is designed to test machine intelligence. What specific NLP challenge does this schema fundamentally target?$

challenges of NLP Hard

A.

Syntactic parsing of highly ambiguous prepositional phrase attachments.

B.

Coreference resolution that strictly requires world knowledge and common-sense reasoning.

C.

Lexical disambiguation of homophones in unconstrained text.

D.

Morphological segmentation of agglutinative languages.

47 $When applying Byte-Pair Encoding (BPE) tokenization to a highly agglutinative language like Turkish or Finnish, what is the primary mathematical and practical advantage over strict word-level tokenization in deep learning models?$

tokenization Hard

A.

It mitigates the sparse vocabulary problem by decomposing rare morphological variations into statistically frequent, reusable subword units.

B.

It strictly enforces the Chomsky normal form, ensuring all tokens can be parsed into binary trees.

C.

It prevents the vanishing gradient problem by maintaining constant sequence lengths across all batches.

D.

It eliminates the need for positional embeddings, as subwords implicitly encode sequence order.

48 $Which of the following scenarios is a canonical example of "over-stemming" (a false positive reduction) resulting from the heuristic rules of algorithms like the Porter Stemmer?$

stemming Hard

A.

Failing to map "ran" and "run" to the same linguistic root.

B.

Mapping the plural "matrices" to the singular stem "matrix".

C.

Mapping the words "computation" and "computing" to the single stem "comput".

D.

Mapping the words "universe" and "university" to the single stem "univers".
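
Over-stemming follows naturally from purely suffix-based rules. The stripper below uses a deliberately crude, hypothetical rule set. It is not the Porter algorithm, which applies ordered steps with conditions on the remaining stem, but it produces the same conflation.

```python
# Hypothetical suffix list, tried longest-first.
SUFFIXES = sorted(["ity", "ing", "al", "e", "s"], key=len, reverse=True)

def crude_stem(word, min_stem=3):
    """Strip the first (longest) matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["universe", "university", "universal"]:
    print(w, "->", crude_stem(w))  # all three map to "univers"
```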

49 $Unlike algorithmic stemming, dictionary-based lemmatization relies heavily on accurate Part-of-Speech (POS) tagging. Given a standard WordNet Lemmatizer, what is the expected outcome if the word "saw" in the sentence "He used a saw" is incorrectly tagged as a verb prior to lemmatization?$

lemmatization Hard

A.

The lemmatizer will maintain "saw" as it relies on contextual embeddings to override faulty POS tags.

B.

The lemmatizer will map "saw" to "see", thereby corrupting the downstream semantic representation of the tool.

C.

The lemmatizer will output "saws", applying a default pluralization rule for unknown verb forms.

D.

The lemmatizer will raise a runtime exception due to a mismatch between lexical semantics and morphological syntax.
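
At its core, a dictionary-based lemmatizer is a lookup keyed on both the word and its POS tag, so a wrong tag produces a wrong lemma. The miniature table below is hypothetical. Real WordNet-based lemmatizers combine exception lists with suffix rules; NLTK's `WordNetLemmatizer.lemmatize`, for instance, takes a `pos` argument.

```python
# Hypothetical miniature lexicon: (surface form, POS tag) -> lemma.
LEMMAS = {
    ("saw", "n"): "saw",    # the tool
    ("saws", "n"): "saw",
    ("saw", "v"): "see",    # past tense of "see"
    ("sawed", "v"): "saw",  # past tense of the verb "to saw"
}

def lemmatize(word, pos):
    """Look up the lemma for (word, pos); unknown pairs are returned unchanged."""
    return LEMMAS.get((word.lower(), pos), word)

# "He used a saw": correct noun tag vs. an erroneous verb tag.
print(lemmatize("saw", "n"))  # saw
print(lemmatize("saw", "v"))  # see
```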

50 $While traditionally beneficial for Information Retrieval, aggressive stop-word removal severely degrades performance in certain modern NLP tasks. In which of the following tasks would stripping stop-words (e.g., "the", "is", "not") be most disastrous for model inference, and why?$

stop-word removal Hard

A.

Topic Modeling (e.g., LDA), because topics are defined exclusively by grammatical function words.

B.

Extractive Text Summarization, because sentence scoring algorithms like TextRank strictly require stop-words to compute node degrees.

C.

Sentiment Analysis involving double negatives, because removing function words obliterates the compositionality of negation context.

D.

Named Entity Recognition, because entities are exclusively comprised of stop-word sequences.
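
The failure in option C is easy to reproduce. Several widely used English stop-word lists, NLTK's among them, include "not". With such a list, a negated sentence and its positive counterpart become indistinguishable. The short list below is a hypothetical subset.

```python
# Hypothetical stop-word subset that, like many common lists, includes "not".
STOP_WORDS = {"the", "a", "is", "was", "this", "not"}

def remove_stop_words(tokens):
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]

positive = "The movie is good".split()
negative = "The movie is not good".split()
print(remove_stop_words(positive))  # ['movie', 'good']
print(remove_stop_words(negative))  # ['movie', 'good']
```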

51 $In clinical NLP, uniformly stripping all punctuation can lead to catastrophic semantic loss. Which of the following exemplifies a structural edge case where a punctuation mark functions as a critical semantic token rather than merely a syntactic boundary?$

punctuation handling Hard

A.

The use of periods in identifying specific ICD-10 medical codes (e.g., J45.909).

B.

The use of semicolons to link two independent clauses in patient notes.

C.

The use of commas to separate elements in an appositive clause.

D.

The use of hyphens in compounding standard English adjectives like "state-of-the-art".

52 $The FastText model fundamentally improves upon the standard Word2Vec architecture primarily in its handling of Out-Of-Vocabulary (OOV) words. What is the mathematical mechanism by which FastText accomplishes this?$

handling out-of-vocabulary words Hard

A.

By dynamically querying external knowledge graphs via a differentiable lookup table to construct embeddings on the fly.

B.

By mapping all OOV tokens to an average of the entire vocabulary matrix, ensuring a non-zero initialization.

C.

By representing target words as the sum of the learned vectors of their constituent character $n$-grams, enabling vector approximation for unseen words.

D.

By replacing the softmax layer with hierarchical softmax, which forces all possible character permutations into a binary tree.
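
The following sketch illustrates the subword mechanism in option C. Each word is wrapped in `<` and `>` boundary markers and split into character n-grams (3 to 6 characters here, plus the whole wrapped word). Its vector is the sum of the n-gram vectors. `ngram_vector` is a hypothetical stand-in for a learned embedding table: it returns deterministic pseudo-random values so the example runs without a trained model.

```python
import random
import zlib

DIM = 4  # toy embedding size

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of '<word>' for n_min <= n <= n_max, plus the whole wrapped word."""
    wrapped = f"<{word}>"
    grams = [wrapped[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(wrapped) - n + 1)]
    grams.append(wrapped)
    return grams

def ngram_vector(gram):
    # Hypothetical stand-in for a learned n-gram embedding (seeded by a stable hash).
    rng = random.Random(zlib.crc32(gram.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def word_vector(word):
    """Sum the vectors of all character n-grams; works for words never seen in training."""
    vec = [0.0] * DIM
    for gram in char_ngrams(word):
        for k, x in enumerate(ngram_vector(gram)):
            vec[k] += x
    return vec

print(char_ngrams("where")[:5])  # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("whereabouts"))  # defined even though no vocabulary lists this word
```

In a trained model, an unseen word shares many n-grams (such as `<wh` and `her`) with known words. Its composed vector therefore reuses what was learned for those words.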

53 $When preprocessing noisy multilingual text, what is the specific role of Unicode Normalization Form KC (NFKC) compared to NFC, and why is this critical for NLP normalization?$

normalization Hard

A.

NFKC converts all text to lowercase and strips diacritics, which NFC preserves.

B.

NFKC applies compatibility decomposition followed by canonical composition, merging visually or functionally identical but uniquely encoded characters (e.g., ligatures and full-width forms), thus reducing vocabulary sparsity.

C.

NFKC reverses the byte-order of characters for right-to-left languages, standardizing all inputs to a left-to-right processing pipeline.

D.

NFKC strictly converts multi-byte characters to single-byte ASCII equivalents, preventing encoding errors.
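
Python's standard `unicodedata.normalize` supports both forms, so the difference described in option B can be checked directly. The sample strings were chosen for illustration: a ligature, full-width Latin letters, a decomposed accent, and a superscript.

```python
import unicodedata

samples = {
    "ligature": "\ufb01le",        # "ﬁle" written with the single fi-ligature code point
    "full-width": "\uff21\uff29",  # full-width "ＡＩ"
    "decomposed": "e\u0301",       # "e" followed by a combining acute accent
    "superscript": "x\u00b2",      # "x²"
}

for name, text in samples.items():
    nfc = unicodedata.normalize("NFC", text)
    nfkc = unicodedata.normalize("NFKC", text)
    print(f"{name:12} NFC={nfc!r:10} NFKC={nfkc!r}")
```

NFC only merges canonically equivalent sequences, so the decomposed accent becomes a single "é". NFKC also folds compatibility variants: "ﬁle" becomes "file" and "ＡＩ" becomes "AI". The superscript case shows that NFKC is lossy ("x²" becomes "x2"). It is therefore better suited to matching and vocabulary reduction than to text that must be rendered back faithfully.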

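54 $Given two documents represented as standard Bag-of-Words (BoW) frequency vectors $\mathbf{u}$ and $\mathbf{v}$, the raw dot product $\mathbf{u} \cdot \mathbf{v}$ inherently scales with document length, skewing similarity metrics. Which normalization transforms these vectors such that their dot product equals their cosine similarity?$

Bag-of-Words Hard

A.

Min-Max scaling to the range $[0, 1]$.

B.

L2 Normalization: $\hat{\mathbf{u}} = \mathbf{u} / \lVert \mathbf{u} \rVert_2$ and $\hat{\mathbf{v}} = \mathbf{v} / \lVert \mathbf{v} \rVert_2$.

C.

L1 Normalization: $\hat{\mathbf{u}} = \mathbf{u} / \lVert \mathbf{u} \rVert_1$ and $\hat{\mathbf{v}} = \mathbf{v} / \lVert \mathbf{v} \rVert_1$.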

D.

Z-score standardization across the vocabulary dimensions.
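
A quick numeric check confirms the L2 answer. After each vector is divided by its Euclidean norm, the plain dot product equals the cosine similarity; L1 scaling does not have this property. The two count vectors are made up. The second is the first with every count doubled, i.e. a "longer" document with identical word proportions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))  # assumes a non-zero vector
    return [x / norm for x in v]

def l1_normalize(v):
    total = sum(abs(x) for x in v)  # assumes a non-zero vector
    return [x / total for x in v]

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

d1 = [2, 0, 1, 3]
d2 = [4, 0, 2, 6]  # same proportions as d1, twice the length

print(dot(d1, d2))                              # 28: grows with document length
print(cosine(d1, d2))                           # cosine similarity (1 up to rounding)
print(dot(l2_normalize(d1), l2_normalize(d2)))  # matches the cosine
print(dot(l1_normalize(d1), l1_normalize(d2)))  # does not (about 0.39 here)
```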

55 $In an $n$-gram language model with a vocabulary of size $V$, the transition from bigrams to trigrams drastically exacerbates data sparsity. Which advanced smoothing technique addresses this by utilizing absolute discounting combined with an interpolation weight that falls back to lower-order models based on the diversity of contexts a word appears in?$

n-grams Hard

A.

Witten-Bell Interpolation

B.

Laplace (Add-One) Smoothing

C.

Kneser-Ney Smoothing

D.

Good-Turing Discounting

56 $In the standard TF-IDF formulation, Inverse Document Frequency is often calculated as $\text{IDF}(t) = \log\frac{N}{\text{df}(t)}$. If a term appears in every single document in a training corpus of size $N$, what is the mathematical consequence for its TF-IDF weight, and how is this edge case typically mitigated in libraries like scikit-learn?$

TF-IDF Hard

A.

The IDF becomes $0$, zeroing out the entire TF-IDF weight; mitigated by adding a constant $1$ to the resulting IDF score: $\text{IDF}(t) = \log\frac{N}{\text{df}(t)} + 1$.

B.

The IDF approaches negative infinity; mitigated by applying a ReLU activation function to clamp negative values.

C.

The IDF becomes undefined (division by zero); mitigated by using $\log\frac{N}{\text{df}(t) + 1}$.

D.

The IDF becomes $1$, leaving the TF unchanged; no mitigation is required as this is the intended mathematical behavior.
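
The three IDF variants below show the edge case numerically. The first is the unsmoothed formula from the question. The other two follow the formulas documented for scikit-learn's `TfidfTransformer` with `smooth_idf=False` and with the default `smooth_idf=True` (natural logarithm). The function names and the corpus size are illustrative.

```python
import math

def idf_plain(n_docs, df):
    """log(N / df): zero for a term that appears in every document."""
    return math.log(n_docs / df)

def idf_plus_one(n_docs, df):
    """log(N / df) + 1, as in scikit-learn with smooth_idf=False."""
    return math.log(n_docs / df) + 1

def idf_smooth(n_docs, df):
    """log((1 + N) / (1 + df)) + 1, as in scikit-learn's default smooth_idf=True."""
    return math.log((1 + n_docs) / (1 + df)) + 1

N = 1000
for df in (1, 10, N):  # rare, moderately common, and ubiquitous terms
    print(df, idf_plain(N, df), idf_plus_one(N, df), idf_smooth(N, df))
```

For the ubiquitous term, the plain IDF is exactly 0, so its TF-IDF weight vanishes whatever its TF. Both +1 variants floor it at 1, so the term still contributes its TF. The smoothed form also avoids division by zero for a term with a document frequency of 0.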

57 $Zipf's Law states that the frequency of a word in a natural language corpus is inversely proportional to its rank, mathematically expressed as $f(r) \propto \frac{1}{r}$. In the context of NLP model training, what critical structural challenge does this empirical distribution formalize?$

linguistic essentials Hard

A.

It establishes an upper bound on the maximum sequence length a Recurrent Neural Network can process without suffering from vanishing gradients.

B.

It formalizes the heavy-tailed distribution of language, ensuring that any fixed-size vocabulary will inevitably encounter a high probability mass of rare and unseen Out-Of-Vocabulary (OOV) words.

C.

It dictates that semantic relationships are strictly linear, invalidating non-linear deep learning architectures.

D.

It proves that stop-words carry the highest semantic density, reversing traditional Information Retrieval heuristics.

58 $Consider a long document where the word "excellent" is repeated 100 times. Using standard Term Frequency (TF), the weight is $100$, which can disproportionately dominate the document vector. To dampen this effect, sublinear TF scaling is often applied. Which of the following formulas represents the standard sublinear scaling for a raw term frequency $\text{tf}$?$

TF-IDF Hard

A.

B.

C.

D.
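
A common sublinear scheme replaces a raw count $\text{tf} > 0$ with $1 + \log(\text{tf})$ and keeps 0 for absent terms. scikit-learn's `sublinear_tf=True` option also uses $1 + \log(\text{tf})$. The sketch uses the natural logarithm; this is an assumption, since the base varies between sources.

```python
import math

def sublinear_tf(tf):
    """1 + log(tf) for tf > 0; 0 when the term does not occur (log(0) is undefined)."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0

for raw in (0, 1, 10, 100):
    print(raw, round(sublinear_tf(raw), 3))
```

With this scaling, 100 repetitions of "excellent" weigh roughly 5.6 instead of 100, only about 5.6 times a single occurrence.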

59 $While Named Entity Recognition (NER) is traditionally framed as a token-level sequence labeling task using architectures like BiLSTM-CRF, modern generative models occasionally reframe NER as a sequence-to-sequence generation task. What is the primary analytical trade-off (drawback) of this generative approach compared to strict token classification?$

applications of NLP Hard

A.

Generative models cannot output text sequentially, breaking the auto-regressive properties required for NER.

B.

Generative models are unable to identify overlapping or nested entities, which classification models handle natively.

C.

Generative models require significantly more annotated data because they cannot utilize pre-trained transformer weights.

D.

Generative approaches are highly prone to hallucination, potentially generating entity spans that do not perfectly align with or exist in the exact source tokens.

60 $The perplexity of a test sequence $W = w_1 w_2 \ldots w_N$ under a bigram language model is defined as $PP(W) = \left(\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}\right)^{1/N}$. If the unsmoothed language model assigns a probability of $0$ to a single unseen bigram in the test sequence, what mathematically happens to the perplexity, and what does this signify?$

n-grams Hard

A.

The perplexity collapses to $1$, meaning the model defaults to a uniform distribution.

B.

The perplexity becomes $0$, signifying that the model perfectly predicts the test sequence.

C.

The perplexity defaults to the size of the vocabulary, indicating maximum entropy.

D.

The perplexity becomes infinite ($\infty$), indicating that the model is entirely incapable of evaluating the test set without appropriate smoothing.
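
Computing perplexity in log space makes the failure explicit. A single zero probability drives the product to 0 and the perplexity to infinity. The per-token probabilities below are made-up values, and `perplexity` is an illustrative helper.

```python
import math

def perplexity(token_probs):
    """PP = (prod of p_i) ** (-1/N), evaluated via logs for numerical stability."""
    if any(p == 0.0 for p in token_probs):
        return math.inf  # log(0) is -inf, so the geometric mean of 1/p_i is unbounded
    avg_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-avg_log_prob)

seen_only = [0.2, 0.1, 0.25, 0.05]    # every bigram was observed in training
with_unseen = [0.2, 0.0, 0.25, 0.05]  # one unseen bigram, unsmoothed MLE assigns 0
print(perplexity(seen_only))    # finite
print(perplexity(with_unseen))  # inf
```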

Unit 1 - Practice Quiz