1. What is the primary goal of Natural Language Processing (NLP)?
A. To execute programming languages faster
B. To enable computers to understand, interpret, and generate human language
C. To encrypt human language for security
D. To store large databases of text
Correct Answer: To enable computers to understand, interpret, and generate human language
Explanation: NLP focuses on the interaction between computers and humans using natural language, aiming to read, decipher, understand, and make sense of human languages in a valuable way.
2. Which of the following is considered one of the earliest successes in the history of NLP, specifically in Machine Translation?
A. The Turing Test
B. The Georgetown Experiment
C. The ELIZA Chatbot
D. Google Translate
Correct Answer: The Georgetown Experiment
Explanation: The Georgetown experiment in 1954 involved the automatic translation of more than sixty Russian sentences into English and is considered a landmark early event in NLP.
3. Which component of NLP is responsible for understanding the structure and formation of words?
A. Syntax
B. Pragmatics
C. Morphology
D. Semantics
Correct Answer: Morphology
Explanation: Morphology is the study of the structure of words and how they are formed from morphemes (the smallest grammatical units).
4. In the context of NLP, what is 'Ambiguity'?
A. The ability to process multiple languages
B. The phenomenon where a sentence or word has more than one possible interpretation
C. The speed at which text is processed
D. The lack of data in a corpus
Correct Answer: The phenomenon where a sentence or word has more than one possible interpretation
Explanation: Ambiguity refers to uncertainty of meaning, where a word, phrase, or sentence can be understood in multiple ways.
5. Which level of linguistic analysis deals with the arrangement of words to form grammatical sentences?
A. Phonology
B. Morphology
C. Syntax
D. Semantics
Correct Answer: Syntax
Explanation: Syntax concerns the rules and principles that govern the sentence structure of any individual language.
6. What is the difference between Syntax and Semantics?
A. Syntax is about meaning; Semantics is about structure
B. Syntax is about structure; Semantics is about meaning
C. Both are about sound patterns
D. There is no difference
Correct Answer: Syntax is about structure; Semantics is about meaning
Explanation: Syntax ensures the grammatical correctness of the sentence structure, while semantics ensures the sentence makes logical sense or conveys the intended meaning.
7. Which type of knowledge involves understanding how sentences are used in different situations and how context affects meaning?
A. Phonetic knowledge
B. Syntactic knowledge
C. Pragmatic knowledge
D. Lexical knowledge
Correct Answer: Pragmatic knowledge
Explanation: Pragmatics studies how context contributes to meaning, looking beyond the literal meaning to the intended meaning in a specific situation.
8. The sentence 'I saw the man with the telescope' is an example of what type of ambiguity?
A. Lexical Ambiguity
B. Syntactic Ambiguity
C. Phonological Ambiguity
D. Referential Ambiguity
Correct Answer: Syntactic Ambiguity
Explanation: This is syntactic (structural) ambiguity because the sentence structure leaves it unclear whether 'I' had the telescope or 'the man' had the telescope.
9. What is the smallest unit of meaning in a language?
A. Phoneme
B. Morpheme
C. Token
D. Character
Correct Answer: Morpheme
Explanation: A morpheme is the smallest meaningful unit in a language (e.g., 'un-', 'happy', '-ness').
10. Which NLP application involves automatically classifying an email as 'Spam' or 'Not Spam'?
A. Machine Translation
B. Text Summarization
C. Text Classification
D. Question Answering
Correct Answer: Text Classification
Explanation: Spam filtering is a classic example of text classification, where a document is assigned to one or more categories.
11. What is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called?
A. Stemming
B. Tokenization
C. Lemmatization
D. Parsing
Correct Answer: Tokenization
Explanation: Tokenization is the process of segmenting text into smaller units called tokens (words, punctuation, etc.).
12. Which of the following is an example of a 'Stop Word'?
A. Computer
B. The
C. Run
D. Quickly
Correct Answer: The
Explanation: Stop words are high-frequency words (like 'the', 'is', 'at', 'which') that usually contribute little unique meaning to the text and are often removed during processing.
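As a rough illustration (not part of the quiz source), stop-word removal can be as simple as filtering tokens against a stop list. The small STOP_WORDS set below is an assumption chosen for the example; libraries such as NLTK ship fuller English lists.

```python
# Minimal stop-word removal sketch; the stop list here is illustrative only.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and"}

def remove_stop_words(tokens):
    # Keep only tokens that are not in the stop list (case-insensitive).
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "cat", "is", "on", "the", "mat"]))
# -> ['cat', 'mat']
```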
13. What is the main objective of Stemming?
A. To correct spelling errors
B. To reduce words to their root or base form, often by chopping off the ends
C. To find the dictionary form of a word
D. To identify the part of speech
Correct Answer: To reduce words to their root or base form, often by chopping off the ends
Explanation: Stemming uses heuristic processes (like chopping off suffixes) to reduce words to a base form (stem), which may not be a valid word.
14. How does Lemmatization differ from Stemming?
A. Lemmatization is faster but less accurate
B. Lemmatization simply chops off suffixes
C. Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
D. There is no difference
Correct Answer: Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
Explanation: Unlike stemming, lemmatization uses vocabulary and morphological analysis to return the dictionary form of a word (e.g., 'better' to 'good').
15. If you stem the word 'ponies', the result might be 'poni'. If you lemmatize 'ponies', the result is likely:
A. poni
B. pony
C. ponies
D. po
Correct Answer: pony
Explanation: Lemmatization returns the valid base form (lemma), which is 'pony', whereas stemming might just remove the 'es' or 's', resulting in a non-word.
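A minimal sketch of the contrast, assuming NLTK is installed and the WordNet data has been downloaded (nltk.download('wordnet')):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("ponies"))          # 'poni' -- heuristic suffix stripping, not a real word
print(lemmatizer.lemmatize("ponies"))  # 'pony' -- valid dictionary form (lemma)
```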
16. What does TF-IDF stand for?
A. Term Frequency - Inverse Document Frequency
B. Text Frequency - Index Document Frequency
C. Total Frequency - Internal Data Frequency
D. Term Format - Independent Data Format
Correct Answer: Term Frequency - Inverse Document Frequency
Explanation: TF-IDF stands for Term Frequency-Inverse Document Frequency, a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents.
17. In TF-IDF, what does 'Term Frequency' (TF) measure?
A. How rare a word is in the entire corpus
B. How frequently a word appears in a specific document
C. The number of documents containing the word
D. The total number of words in the dictionary
Correct Answer: How frequently a word appears in a specific document
Explanation: TF measures the frequency of a word (term) within the single document being analyzed.
18. In TF-IDF, what is the purpose of the 'Inverse Document Frequency' (IDF) component?
A. To give higher weight to common words like 'the'
B. To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
C. To count how many times a word appears in a sentence
D. To normalize the length of the document
Correct Answer: To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
Explanation: IDF helps to down-weight words that appear in many documents (like stop words) and up-weight words that are rare and distinct to specific documents.
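As a hands-on sketch (not from the quiz source), scikit-learn's TfidfVectorizer computes these weights over a small corpus. Its default formula uses a smoothed IDF and L2 normalization, so exact values differ from the plain log(N/df) used in the explanations, but the relative ranking behaves the same way; the documents below are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on a mat",
    "the dog chased a cat",
    "the stock market rose today",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # documents x vocabulary weight matrix
vocab = vectorizer.vocabulary_           # term -> column index

row = tfidf[0].toarray().ravel()         # weights for "the cat sat on a mat"
print(round(row[vocab["the"]], 3))       # low  -- 'the' appears in every document
print(round(row[vocab["mat"]], 3))       # high -- 'mat' is unique to this document
```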
19. What is 'Lexical Ambiguity'?
A. Confusion about the sentence structure
B. Confusion caused by a single word having multiple meanings
C. Confusion about who a pronoun refers to
D. Confusion about the tone of the text
Correct Answer: Confusion caused by a single word having multiple meanings
Explanation: Lexical ambiguity occurs when a single word has more than one meaning (polysemy), e.g., 'bank' (river bank vs. financial bank).
20. Which of the following is a challenge in Tokenization?
A. Identifying abbreviations and acronyms (e.g., U.S.A.)
B. Storing the text
C. Displaying the font
D. Calculating TF-IDF
Correct Answer: Identifying abbreviations and acronyms (e.g., U.S.A.)
Explanation: Tokenizers must decide whether the period in 'U.S.A.' ends a sentence or is part of the token. This makes punctuation handling a challenge.
21. What is a 'Corpus' in NLP?
A. Software used for processing text
B. A large, structured set of texts used for statistical analysis
C. The core algorithm of a chatbot
D. A type of syntax error
Correct Answer: A large, structured set of texts used for statistical analysis
Explanation: A corpus (plural: corpora) is a large collection of text or speech data used to train and test NLP models.
22. Which of the following describes 'Sentiment Analysis'?
A. Translating text from English to French
B. Determining the emotional tone or opinion expressed in a text
C. Summarizing a long article
D. Converting speech to text
Correct Answer: Determining the emotional tone or opinion expressed in a text
Explanation: Sentiment analysis aims to determine the attitude (positive, negative, neutral) of a speaker or writer with respect to a topic.
23. In the word 'unhappiness', what is the root morpheme?
A. un
B. happy
C. ness
D. unhappy
Correct Answer: happy
Explanation: 'Happy' is the root (a free morpheme), while 'un-' and '-ness' are bound morphemes (affixes).
24. Which step usually comes FIRST in a standard NLP pipeline?
A. TF-IDF Calculation
B. Tokenization
C. POS Tagging
D. Lemmatization
Correct Answer: Tokenization
Explanation: Text must be broken down into individual units (tokens) before tasks like tagging, stemming, or calculating frequencies can occur.
25. Why is 'World Knowledge' a challenge in NLP?
A. Computers do not have enough memory
B. Language often relies on common sense and facts about the world that are not explicitly stated in the text
C. Grammar rules are too strict
D. Dictionaries are not large enough
Correct Answer: Language often relies on common sense and facts about the world that are not explicitly stated in the text
Explanation: Humans use vast amounts of background knowledge to interpret text (e.g., knowing that water is wet). Encoding this 'common sense' into machines is difficult.
26. What is the Porter Stemmer?
A. A tool for syntax analysis
B. A widely used algorithm for suffix stripping (stemming)
C. A method for calculating IDF
D. A database of stop words
Correct Answer: A widely used algorithm for suffix stripping (stemming)
Explanation: The Porter Stemming algorithm is one of the most popular algorithms for reducing English words to their stems.
27. When might you choose NOT to remove stop words?
A. When analyzing general topic trends
B. When searching for specific phrases like 'to be or not to be'
Correct Answer: When searching for specific phrases like 'to be or not to be'
Explanation: In phrase search, or in tasks where grammatical structure is crucial, removing stop words destroys the phrase; 'to be or not to be' consists almost entirely of stop words.
28. What is 'Referential Ambiguity' usually associated with?
A. Anaphora resolution (pronouns)
B. Word definitions
C. Speech recognition
D. Spelling errors
Correct Answer: Anaphora resolution (pronouns)
Explanation: Referential ambiguity arises when it is unclear which noun a pronoun (like 'he', 'she', 'it') refers to.
29. Which of the following is a 'Bound Morpheme'?
A. Dog
B. Eat
C. -ing
D. Table
Correct Answer: -ing
Explanation: Bound morphemes (like '-ing', '-ed', 'pre-') cannot stand alone as words; they must be attached to other morphemes.
30. What is the result of applying a tokenizer to the string 'Hello, world!'?
A. ['Hello', 'world']
B. ['Hello, world!']
C. ['Hello', ',', 'world', '!']
D. ['H', 'e', 'l', 'l', 'o']
Correct Answer: ['Hello', ',', 'world', '!']
Explanation: A standard tokenizer separates words and punctuation marks into distinct tokens.
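A minimal regex-based tokenizer (an illustrative sketch, not any particular library's tokenizer) reproduces this behaviour:

```python
import re

def simple_tokenize(text):
    # Runs of word characters become word tokens; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello, world!"))
# -> ['Hello', ',', 'world', '!']
```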
31. A high TF-IDF score for a word in a specific document indicates:
A. The word is a stop word
B. The word is very common across all documents
C. The word is frequent in that specific document but rare in the overall corpus
D. The word is rare in that document
Correct Answer: The word is frequent in that specific document but rare in the overall corpus
Explanation: High TF means the word appears often in the document; high IDF means it is rare elsewhere. Together, they signify a keyword that is distinctive of, and important to, that document.
32. Which area of NLP deals with the sound units of language?
A. Phonology
B. Morphology
C. Syntax
D. Pragmatics
Correct Answer: Phonology
Explanation: Phonology is the study of the system of sounds in a language.
33. Which of the following best describes 'Compositional Semantics'?
A. The meaning of the whole is determined by the meanings of the parts and how they are assembled
B. The meaning is determined solely by the length of the sentence
C. The meaning is random
D. The meaning is determined by the tone of voice
Correct Answer: The meaning of the whole is determined by the meanings of the parts and how they are assembled
Explanation: Compositional semantics assumes that the meaning of a complex expression is a function of the meanings of its constituent parts.
34. In the context of text processing, what does 'Case Folding' refer to?
A. Folding the paper the text is printed on
B. Converting all characters to the same case (usually lowercase)
C. Removing punctuation
D. Identifying proper nouns
Correct Answer: Converting all characters to the same case (usually lowercase)
Explanation: Case folding (lowercasing) is a common preprocessing step to ensure 'Apple' and 'apple' are treated as the same token.
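A quick illustration in Python (the token list is made up for the example):

```python
tokens = ["Apple", "apple", "APPLE"]
print({t.lower() for t in tokens})  # {'apple'} -- all variants collapse to one token type
```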
35. Which of the following is NOT a typical application of NLP?
A. Stock Market Prediction using news headlines
B. Image Compression
C. Virtual Assistants (Siri, Alexa)
D. Grammar Checkers
Correct Answer: Image Compression
Explanation: Image compression deals with visual data (pixels), not natural language text or speech.
36. What is a 'Bag of Words' (BoW) model?
A. A physical bag containing dictionaries
B. A representation of text that describes the occurrence of words but ignores order and grammar
C. A sophisticated syntactic parser
D. A list of stop words
Correct Answer: A representation of text that describes the occurrence of words but ignores order and grammar
Explanation: BoW represents text as a collection (bag) of its words, disregarding grammar and word order but keeping multiplicity.
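A bare-bones sketch of the idea using Python's Counter (word order is lost, counts are kept); the sentence is illustrative only:

```python
from collections import Counter

def bag_of_words(tokens):
    # The "bag": each word mapped to how many times it occurs; order is discarded.
    return Counter(tokens)

print(bag_of_words("the cat sat on the mat".split()))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```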
37. Why is 'Slang' a challenge for NLP?
A. It is too formal
B. It changes rapidly and may not appear in standard dictionaries
C. It uses too many vowels
D. It is always in uppercase
Correct Answer: It changes rapidly and may not appear in standard dictionaries
Explanation: Slang creates lexical ambiguity and out-of-vocabulary issues because it evolves faster than static language models or dictionaries can be updated.
38. In the sentence 'Time flies like an arrow', the word 'flies' could be a verb or a noun. This is:
A. Phonetic Ambiguity
B. Part-of-Speech (POS) Ambiguity
C. Pragmatic Ambiguity
D. Stop word Ambiguity
Correct Answer: Part-of-Speech (POS) Ambiguity
Explanation: The word functions as a different part of speech (verb vs. noun) under different interpretations, creating ambiguity.
39. If a corpus has 1000 documents and the word 'biology' appears in all 1000 of them, what is likely true about its IDF value?
A. It will be very high
B. It will be zero or very close to zero
C. It will be 1000
D. It cannot be calculated
Correct Answer: It will be zero or very close to zero
Explanation: IDF = log(N/df). If N = 1000 and df = 1000, then log(1) = 0, so the word provides no distinguishing power.
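The arithmetic, checked directly in Python:

```python
import math

N, df = 1000, 1000       # corpus size, documents containing 'biology'
print(math.log(N / df))  # 0.0 -- the term has no distinguishing power
```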
40. Which processing technique requires Part-of-Speech (POS) tagging to be effective?
A. Simple Tokenization
B. Lemmatization
C. Lowercasing
D. Stop word removal
Correct Answer: Lemmatization
Explanation: To correctly lemmatize 'saw' (is it the verb 'see' or the noun for the tool?), the algorithm needs to know the part of speech.
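A small sketch of why the POS tag matters, again assuming NLTK with the WordNet data downloaded:

```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("saw", pos="v"))  # 'see' -- interpreted as a verb
print(lemmatizer.lemmatize("saw", pos="n"))  # 'saw' -- interpreted as a noun (the tool)
```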
41. Which of the following is considered a 'Free Morpheme'?
A. re-
B. town
C. -ly
D. -ed
Correct Answer: town
Explanation: A free morpheme can stand alone as a word ('town'). The others are bound morphemes (affixes).
42. The Turing Test was proposed by Alan Turing to determine:
A. The speed of a computer
B. If a machine can exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human
C. The memory capacity of a hard drive
D. The accuracy of machine translation
Correct Answer: If a machine can exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human
Explanation: The Turing Test (1950) is a measure of a machine's ability to demonstrate human-like intelligence via conversation.
43. What is 'Discourse Analysis'?
A. Analyzing single words in isolation
B. Analyzing language use beyond the sentence boundary
C. Converting audio to text
D. Sorting words alphabetically
Correct Answer: Analyzing language use beyond the sentence boundary
Explanation: Discourse analysis looks at larger chunks of text (conversations, paragraphs) to understand flow, coherence, and references.
44. In the TF-IDF formula, if term 't' appears 5 times in a document of 100 words, the normalized TF is:
A. 5
B. 0.05
C. 500
D. 95
Correct Answer: 0.05
Explanation: Normalized TF is often calculated as (count of term / total words in document) = 5/100 = 0.05.
45. What is the primary motivation for 'Text Normalization' (like stemming/lemmatization) in search engines?
A. To make the text look pretty
B. To match a user's query (e.g., 'running') with documents containing related forms (e.g., 'run')
C. To remove all verbs
D. To translate the query
Correct Answer: To match a user's query (e.g., 'running') with documents containing related forms (e.g., 'run')
Explanation: Normalization conflates variations of words to a single form so that searches cover all variations of the concept.
46. Which linguistic field studies how words combine to form phrases and sentences?
A. Syntax
B. Phonetics
C. Morphology
D. Semantics
Correct Answer: Syntax
Explanation: Syntax focuses on the grammatical structure and ordering of words.
47. When tokenizing text from social media (e.g., Twitter/X), what is a specific challenge?
A. Handling hashtags (#) and mentions (@)
B. The text is too long
C. There are no vowels
D. It is always formal English
Correct Answer: Handling hashtags (#) and mentions (@)
Explanation: Standard tokenizers might split '#NLP' into '#' and 'NLP', but in social media analysis '#NLP' should often remain a single token.
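A hedged sketch of a hashtag/mention-aware tokenizer (the handle '@user' and the sample text are made up; NLTK's TweetTokenizer provides similar behaviour out of the box):

```python
import re

def tweet_tokenize(text):
    # Match hashtags and mentions first so they survive as single tokens,
    # then ordinary words, then individual punctuation marks.
    return re.findall(r"[#@]\w+|\w+|[^\w\s]", text)

print(tweet_tokenize("Loving #NLP, thanks @user!"))
# -> ['Loving', '#NLP', ',', 'thanks', '@user', '!']
```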
48. What defines a 'Regular Language' in the context of automata and NLP?
A. A language that can be described by a regular expression
B. A language with no slang
C. A language spoken by humans
D. A programming language
Correct Answer: A language that can be described by a regular expression
Explanation: In Chomsky's hierarchy, regular languages are the simplest, recognizable by finite automata and describable by regular expressions.
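As a quick illustration, the (regular) language of binary strings containing an even number of 1s can be described by a regular expression and checked with Python's re module:

```python
import re

even_ones = re.compile(r"(0*10*10*)*0*")  # zero or more pairs of 1s, with 0s anywhere

print(bool(even_ones.fullmatch("0110")))  # True  (two 1s)
print(bool(even_ones.fullmatch("010")))   # False (one 1)
```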
49. Which of the following sentences best illustrates 'Semantic Ambiguity' (not Syntactic)?
A. I saw the man with the telescope.
B. The bank was closed.
C. Flying planes can be dangerous.
D. Visiting relatives can be boring.
Correct Answer: The bank was closed.
Explanation: This relies on the meaning of the word 'bank' (financial institution vs. river bank), whereas the others rely on the structural parsing of the sentence.
50. What is the relationship between AI and NLP?
A. They are completely unrelated fields
B. NLP is a subfield of AI focused on language
C. AI is a subfield of NLP
D. NLP replaces AI
Correct Answer: NLP is a subfield of AI focused on language
Explanation: NLP is a multidisciplinary field at the intersection of computer science, linguistics, and artificial intelligence.