1. What is the primary goal of Natural Language Processing (NLP)?
A. To execute programming languages faster
B. To enable computers to understand, interpret, and generate human language
C. To encrypt human language for security
D. To store large databases of text
Correct Answer: To enable computers to understand, interpret, and generate human language
Explanation:
NLP focuses on the interaction between computers and humans using natural language, aiming to read, decipher, understand, and make sense of human languages in a valuable way.
2. Which of the following is considered one of the earliest successes in the history of NLP, specifically in Machine Translation?
A. The Georgetown Experiment
B. Google Translate
C. The ELIZA Chatbot
D. The Turing Test
Correct Answer: The Georgetown Experiment
Explanation:
The Georgetown experiment in 1954 involved the automatic translation of more than sixty Russian sentences into English and is considered a landmark early event in NLP.
3. Which component of NLP is responsible for understanding the structure and formation of words?
A. Morphology
B. Syntax
C. Semantics
D. Pragmatics
Correct Answer: Morphology
Explanation:
Morphology is the study of the structure of words and how they are formed from morphemes (the smallest grammatical units).
4. In the context of NLP, what is 'Ambiguity'?
A. The speed at which text is processed
B. The lack of data in a corpus
C. The phenomenon where a sentence or word has more than one possible interpretation
D. The ability to process multiple languages
Correct Answer: The phenomenon where a sentence or word has more than one possible interpretation
Explanation:
Ambiguity refers to uncertainty of meaning, where a word, phrase, or sentence can be understood in multiple ways.
5. Which level of linguistic analysis deals with the arrangement of words to form grammatical sentences?
A. Phonology
B. Morphology
C. Syntax
D. Semantics
Correct Answer: Syntax
Explanation:
Syntax concerns the rules and principles that govern the sentence structure of any individual language.
6. What is the difference between Syntax and Semantics?
A. There is no difference
B. Syntax is about meaning; Semantics is about structure
C. Syntax is about structure; Semantics is about meaning
D. Both are about sound patterns
Correct Answer: Syntax is about structure; Semantics is about meaning
Explanation:
Syntax ensures the grammatical correctness of the sentence structure, while semantics ensures the sentence makes logical sense or conveys the intended meaning.
7. Which type of knowledge involves understanding how sentences are used in different situations and how context affects meaning?
A. Syntactic knowledge
B. Phonetic knowledge
C. Lexical knowledge
D. Pragmatic knowledge
Correct Answer: Pragmatic knowledge
Explanation:
Pragmatics studies how context contributes to meaning, looking beyond the literal meaning to the intended meaning in a specific situation.
8. The sentence 'I saw the man with the telescope' is an example of what type of ambiguity?
A. Phonological Ambiguity
B. Syntactic Ambiguity
C. Lexical Ambiguity
D. Referential Ambiguity
Correct Answer: Syntactic Ambiguity
Explanation:
This is syntactic (structural) ambiguity because it is unclear whether 'I' had the telescope or 'the man' had the telescope based on the sentence structure.
9. What is the smallest unit of meaning in a language?
A. Token
B. Morpheme
C. Phoneme
D. Character
Correct Answer: Morpheme
Explanation:
A morpheme is the smallest meaningful unit in a language (e.g., 'un-', 'happy', '-ness').
10. Which NLP application involves automatically classifying an email as 'Spam' or 'Not Spam'?
A. Text Summarization
B. Question Answering
C. Machine Translation
D. Text Classification
Correct Answer: Text Classification
Explanation:
Spam filtering is a classic example of text classification, where a document is assigned to one or more categories.
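The category-assignment idea behind spam filtering can be sketched with a deliberately simple rule-based classifier. The keyword list and threshold below are illustrative assumptions; production systems learn such weights from labeled data rather than hard-coding them.

```python
# Toy spam classifier: score a message by counting "spammy" keywords.
# The term list and threshold are made up for illustration.
SPAM_TERMS = {"free", "winner", "prize", "click"}

def classify(text):
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t in SPAM_TERMS)
    return "Spam" if hits >= 2 else "Not Spam"

print(classify("Click now to claim your free prize"))  # Spam
print(classify("Meeting moved to 3pm tomorrow"))       # Not Spam
```

Real classifiers replace the keyword set with features learned from a corpus (e.g., Naive Bayes over word counts), but the input/output shape is the same: text in, category out.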
11. What is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called?
A. Parsing
B. Tokenization
C. Lemmatization
D. Stemming
Correct Answer: Tokenization
Explanation:
Tokenization is the process of segmenting text into smaller units called tokens (words, punctuation, etc.).
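A minimal regex-based sketch of tokenization (an assumption for illustration, not a production tokenizer) shows the segmentation step:

```python
import re

def tokenize(text):
    # Split into runs of word characters or single punctuation marks.
    # Real tokenizers also handle abbreviations, contractions, URLs, etc.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("NLP is fun, isn't it?"))
# ['NLP', 'is', 'fun', ',', 'isn', "'", 't', 'it', '?']
```

Note how even the contraction "isn't" is split naively here; handling such cases well is exactly why tokenization is harder than it first appears.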
12. Which of the following is an example of a 'Stop Word'?
A. Quickly
B. Computer
C. The
D. Run
Correct Answer: The
Explanation:
Stop words are high-frequency words (like 'the', 'is', 'at', 'which') that usually contribute little unique meaning to the text and are often removed during processing.
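Stop-word removal is typically a simple set-membership filter. The tiny stop-word set below is an illustrative assumption; real lists (such as NLTK's) contain over a hundred entries.

```python
# Toy stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an"}

def remove_stop_words(tokens):
    # Case-insensitive filter against the stop-word set.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "cat", "is", "on", "the", "mat"]))
# ['cat', 'mat']
```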
13. What is the main objective of Stemming?
A. To reduce words to their root or base form, often by chopping off the ends
B. To identify the part of speech
C. To find the dictionary form of a word
D. To correct spelling errors
Correct Answer: To reduce words to their root or base form, often by chopping off the ends
Explanation:
Stemming uses heuristic processes (like chopping off suffixes) to reduce words to a base form (stem), which may not be a valid word.
14. How does Lemmatization differ from Stemming?
A. Lemmatization simply chops off suffixes
B. Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
C. Lemmatization is faster but less accurate
D. There is no difference
Correct Answer: Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
Explanation:
Unlike stemming, lemmatization uses vocabulary and morphological analysis to return the dictionary form of a word (e.g., 'better' to 'good').
15. If you stem the word 'ponies', the result might be 'poni'. If you lemmatize 'ponies', the result is likely:
A. pony
B. po
C. ponies
D. poni
Correct Answer: pony
Explanation:
Lemmatization returns the valid base form (lemma), which is 'pony', whereas stemming might just remove the 'es' or 's' resulting in a non-word.
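The stemming/lemmatization contrast can be sketched in a few lines. Both functions below are deliberately simplified assumptions: the stemmer applies a single heuristic suffix rule, and the "lemmatizer" uses a toy lookup table standing in for real vocabulary and morphological analysis.

```python
def crude_stem(word):
    # Heuristic suffix chopping in the spirit of stemming;
    # the output need not be a dictionary word.
    if word.endswith("ies"):
        return word[:-2]   # 'ponies' -> 'poni'
    if word.endswith("s"):
        return word[:-1]
    return word

# A lemmatizer consults a vocabulary; this toy table stands in for one.
LEMMA_TABLE = {"ponies": "pony", "better": "good", "ran": "run"}

def toy_lemmatize(word):
    return LEMMA_TABLE.get(word, word)

print(crude_stem("ponies"))     # poni  (not a valid word)
print(toy_lemmatize("ponies"))  # pony  (valid dictionary form)
```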
16. What does TF-IDF stand for?
A. Term Frequency - Inverse Document Frequency
B. Term Format - Independent Data Format
C. Total Frequency - Internal Data Frequency
D. Text Frequency - Index Document Frequency
Correct Answer: Term Frequency - Inverse Document Frequency
Explanation:
TF-IDF stands for Term Frequency-Inverse Document Frequency, a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents.
17. In TF-IDF, what does 'Term Frequency' (TF) measure?
A. The number of documents containing the word
B. How frequently a word appears in a specific document
C. The total number of words in the dictionary
D. How rare a word is in the entire corpus
Correct Answer: How frequently a word appears in a specific document
Explanation:
TF measures the frequency of a word (term) within a single document being analyzed.
18. In TF-IDF, what is the purpose of the 'Inverse Document Frequency' (IDF) component?
A. To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
B. To count how many times a word appears in a sentence
C. To give higher weight to common words like 'the'
D. To normalize the length of the document
Correct Answer: To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
Explanation:
IDF helps to down-weight words that appear in many documents (like stop words) and up-weight words that are rare and distinct to specific documents.
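Both components can be computed by hand. A minimal sketch, using the common formulation TF = count/length and IDF = log(N/df) (variants with smoothing exist); the three tiny documents are invented for illustration:

```python
import math

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran", "home"],
]

def tf(term, doc):
    # Term frequency: raw count normalized by document length.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Inverse document frequency: terms in fewer documents score higher.
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# 'the' appears in every document, so log(3/3) = 0 kills its weight.
print(tf_idf("the", docs[0], docs))   # 0.0
# 'dog' is unique to one document, so it gets a positive weight.
print(tf_idf("dog", docs[1], docs))
```

This is exactly the behavior described above: the stop word 'the' is down-weighted to zero while the rare, distinctive term 'dog' is up-weighted.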
19. What is 'Lexical Ambiguity'?
A. Confusion about the tone of the text
B. Confusion about who a pronoun refers to
C. Confusion caused by a single word having multiple meanings
D. Confusion about the sentence structure
Correct Answer: Confusion caused by a single word having multiple meanings
Explanation:
Lexical ambiguity occurs when a single word has more than one meaning (polysemy). E.g., 'Bank' (river bank vs. financial bank).
20. Which of the following is a challenge in Tokenization?
A. Storing the text
B. Displaying the font
C. Identifying abbreviations and acronyms (e.g., U.S.A.)
D. Calculating TF-IDF
Correct Answer: Identifying abbreviations and acronyms (e.g., U.S.A.)
Explanation:
Tokenizers must decide if the period in 'U.S.A.' ends a sentence or is part of the token. This makes punctuation handling a challenge.
21. What is a 'Corpus' in NLP?
A. The core algorithm of a chatbot
B. A software used for processing text
C. A type of syntax error
D. A large, structured set of texts used for statistical analysis
Correct Answer: A large, structured set of texts used for statistical analysis
Explanation:
A corpus (plural: corpora) is a large collection of text or speech data used to train and test NLP models.
22. Which of the following describes 'Sentiment Analysis'?
A. Summarizing a long article
B. Converting speech to text
C. Translating text from English to French
D. Determining the emotional tone or opinion expressed in a text
Correct Answer: Determining the emotional tone or opinion expressed in a text
Explanation:
Sentiment analysis aims to determine the attitude (positive, negative, neutral) of a speaker or writer with respect to a topic.
23. In the word 'unhappiness', what is the root morpheme?
A. un
B. happy
C. unhappy
D. ness
Correct Answer: happy
Explanation:
'Happy' is the root (free morpheme), while 'un-' and '-ness' are bound morphemes (affixes).
24. Which step usually comes FIRST in a standard NLP pipeline?
A. TF-IDF Calculation
B. Tokenization
C. POS Tagging
D. Lemmatization
Correct Answer: Tokenization
Explanation:
Text must be broken down into individual units (tokens) before tasks like tagging, stemming, or calculating frequencies can occur.
25. Why is 'World Knowledge' a challenge in NLP?
A. Language often relies on common sense and facts about the world that are not explicitly stated in the text
B. Grammar rules are too strict
C. Dictionaries are not large enough
D. Computers do not have enough memory
Correct Answer: Language often relies on common sense and facts about the world that are not explicitly stated in the text
Explanation:
Humans use vast amounts of background knowledge to interpret text (e.g., knowing that water is wet). Encoding this 'common sense' into machines is difficult.
26. What is the Porter Stemmer?
A. A database of stop words
B. A widely used algorithm for suffix stripping (stemming)
C. A tool for syntax analysis
D. A method for calculating IDF
Correct Answer: A widely used algorithm for suffix stripping (stemming)
Explanation:
The Porter Stemming algorithm is one of the most popular algorithms for reducing English words to their stems.
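The flavor of the algorithm can be seen in its first step. The sketch below implements only the plural-handling rules of Porter's Step 1a (SSES -> SS, IES -> I, SS -> SS, S -> removed); the full algorithm applies several more suffix-rewriting steps with measure conditions, so this is an illustration, not the complete stemmer.

```python
def porter_step_1a(word):
    # Step 1a of the Porter algorithm (plural suffixes only).
    if word.endswith("sses"):
        return word[:-2]       # 'caresses' -> 'caress'
    if word.endswith("ies"):
        return word[:-2]       # 'ponies'   -> 'poni'
    if word.endswith("ss"):
        return word            # 'caress'   -> 'caress'
    if word.endswith("s"):
        return word[:-1]       # 'cats'     -> 'cat'
    return word

print(porter_step_1a("caresses"))  # caress
print(porter_step_1a("ponies"))    # poni
print(porter_step_1a("cats"))      # cat
```

The rules fire in order of longest matching suffix, which is why 'caresses' is handled by the SSES rule rather than the bare S rule.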