1. What is the primary goal of Natural Language Processing (NLP)?
A. To enable computers to understand, interpret, and generate human language
B. To execute programming languages faster
C. To store large databases of text
D. To encrypt human language for security
Correct Answer: To enable computers to understand, interpret, and generate human language
Explanation:
NLP studies the interaction between computers and human (natural) language. Its aim is for machines to read, interpret, and generate language in ways that are useful for real tasks.
2. Which of the following is considered one of the earliest successes in the history of NLP, specifically in Machine Translation?
A. Google Translate
B. The Turing Test
C. The ELIZA Chatbot
D. The Georgetown Experiment
Correct Answer: The Georgetown Experiment
Explanation:
The Georgetown experiment in 1954 involved the automatic translation of more than sixty Russian sentences into English and is considered a landmark early event in NLP.
3. Which component of NLP is responsible for understanding the structure and formation of words?
A. Semantics
B. Syntax
C. Pragmatics
D. Morphology
Correct Answer: Morphology
Explanation:
Morphology is the study of the structure of words and how they are formed from morphemes (the smallest grammatical units).
4. In the context of NLP, what is 'Ambiguity'?
A. The speed at which text is processed
B. The phenomenon where a sentence or word has more than one possible interpretation
C. The ability to process multiple languages
D. The lack of data in a corpus
Correct Answer: The phenomenon where a sentence or word has more than one possible interpretation
Explanation:
Ambiguity refers to uncertainty of meaning, where a word, phrase, or sentence can be understood in multiple ways.
5. Which level of linguistic analysis deals with the arrangement of words to form grammatical sentences?
A. Phonology
B. Morphology
C. Syntax
D. Semantics
Correct Answer: Syntax
Explanation:
Syntax concerns the rules and principles that govern the sentence structure of any individual language.
6. What is the difference between Syntax and Semantics?
A. Both are about sound patterns
B. Syntax is about structure; Semantics is about meaning
C. Syntax is about meaning; Semantics is about structure
D. There is no difference
Correct Answer: Syntax is about structure; Semantics is about meaning
Explanation:
Syntax governs whether a sentence is grammatically well formed, while semantics concerns what the sentence means. A sentence can be syntactically valid yet semantically odd, as in 'Colorless green ideas sleep furiously'.
7. Which type of knowledge involves understanding how sentences are used in different situations and how context affects meaning?
A. Pragmatic knowledge
B. Lexical knowledge
C. Syntactic knowledge
D. Phonetic knowledge
Correct Answer: Pragmatic knowledge
Explanation:
Pragmatics studies how context contributes to meaning, looking beyond the literal meaning to the intended meaning in a specific situation.
8. The sentence 'I saw the man with the telescope' is an example of what type of ambiguity?
A. Syntactic Ambiguity
B. Referential Ambiguity
C. Phonological Ambiguity
D. Lexical Ambiguity
Correct Answer: Syntactic Ambiguity
Explanation:
This is syntactic (structural) ambiguity because it is unclear whether 'I' had the telescope or 'the man' had the telescope based on the sentence structure.
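The two readings can be written out as two different parse trees over the same words. The sketch below is illustrative only: the nested-tuple tree encoding, the labels, and the helper name `leaves` are assumptions made for this example, not a standard parser output.

```python
# Each tree is (label, child, ...); a bare string is a leaf.
# Reading 1: the PP attaches to the verb phrase (I used the telescope).
verb_attachment = (
    "S",
    ("NP", "I"),
    ("VP", ("V", "saw"), ("NP", "the man"), ("PP", "with the telescope")),
)

# Reading 2: the PP attaches to the noun phrase (the man has the telescope).
noun_attachment = (
    "S",
    ("NP", "I"),
    ("VP", ("V", "saw"), ("NP", ("NP", "the man"), ("PP", "with the telescope"))),
)

def leaves(tree):
    """Collect the leaf strings of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # skip the label at index 0
        words.extend(leaves(child))
    return words

print(" ".join(leaves(verb_attachment)))  # I saw the man with the telescope
print(" ".join(leaves(noun_attachment)))  # I saw the man with the telescope
print(verb_attachment == noun_attachment)  # False
```

Both trees yield the same surface string, yet the structures differ, which is exactly what makes the ambiguity structural rather than lexical.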
9. What is the smallest unit of meaning in a language?
A. Morpheme
B. Character
C. Token
D. Phoneme
Correct Answer: Morpheme
Explanation:
A morpheme is the smallest meaningful unit in a language (e.g., 'un-', 'happy', '-ness').
10. Which NLP application involves automatically classifying an email as 'Spam' or 'Not Spam'?
A. Text Classification
B. Question Answering
C. Machine Translation
D. Text Summarization
Correct Answer: Text Classification
Explanation:
Spam filtering is a classic example of text classification, where a document is assigned to one or more categories.
11. What is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called?
A. Parsing
B. Lemmatization
C. Tokenization
D. Stemming
Correct Answer: Tokenization
Explanation:
Tokenization is the process of segmenting text into smaller units called tokens (words, punctuation, etc.).
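As a minimal sketch, a regular expression can split text into word and punctuation tokens. The function name `tokenize` and the pattern are assumptions chosen for illustration; production tokenizers handle many more cases.

```python
import re

def tokenize(text):
    # \w+ matches a run of letters, digits, or underscores (a word);
    # [^\w\s] matches one punctuation character, which becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("The cat sat on the mat."))
# ['The', 'cat', 'sat', 'on', 'the', 'mat', '.']
```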
12. Which of the following is an example of a 'Stop Word'?
A. Computer
B. Quickly
C. The
D. Run
Correct Answer: The
Explanation:
Stop words are high-frequency words (like 'the', 'is', 'at', 'which') that usually contribute little unique meaning to the text and are often removed during processing.
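Stop-word removal can be sketched as a simple filter over tokens. The tiny `STOP_WORDS` set and the function name `remove_stop_words` below are hypothetical; real stop-word lists are much longer and vary by library and task.

```python
# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "at", "which", "a", "an", "on", "of"}

def remove_stop_words(tokens):
    # Compare in lowercase so "The" and "the" are treated alike.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "computer", "is", "on", "the", "desk"]))
# ['computer', 'desk']
```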
13. What is the main objective of Stemming?
A. To find the dictionary form of a word
B. To correct spelling errors
C. To reduce words to their root or base form, often by chopping off the ends
D. To identify the part of speech
Correct Answer: To reduce words to their root or base form, often by chopping off the ends
Explanation:
Stemming uses heuristic processes (like chopping off suffixes) to reduce words to a base form (stem), which may not be a valid word.
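The heuristic nature of stemming shows up clearly in a deliberately naive suffix stripper. This is only a sketch: the suffix list, the minimum stem length of three characters, and the name `naive_stem` are assumptions for illustration and do not correspond to any published stemmer.

```python
# Suffixes are tried in this order; the first match wins.
SUFFIXES = ("ing", "ed", "es", "ly", "s")

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(naive_stem("quickly"))  # quick
print(naive_stem("running"))  # runn  (not a valid word)
print(naive_stem("ponies"))   # poni  (not a valid word)
```

The outputs 'runn' and 'poni' illustrate that a stem need not be a dictionary word.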
14. How does Lemmatization differ from Stemming?
A. Lemmatization is faster but less accurate
B. Lemmatization simply chops off suffixes
C. There is no difference
D. Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
Correct Answer: Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
Explanation:
Unlike stemming, lemmatization uses vocabulary and morphological analysis to return the dictionary form of a word (e.g., 'better' to 'good').
15. If you stem the word 'ponies', the result might be 'poni'. If you lemmatize 'ponies', the result is likely:
A. pony
B. ponies
C. poni
D. po
Correct Answer: pony
Explanation:
Lemmatization returns the valid base form (lemma), 'pony', whereas stemming may simply strip the 'es', leaving the non-word 'poni'.
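A rough way to picture lemmatization is a dictionary lookup from inflected forms to lemmas. The mini-lexicon `LEMMAS` and the function `lemmatize` below are hypothetical; a real lemmatizer draws on a full vocabulary and usually the word's part of speech, which this sketch ignores (for instance, 'better' maps to 'good' only when it is used as an adjective).

```python
# Hypothetical mini-lexicon mapping inflected forms to their lemmas.
LEMMAS = {"ponies": "pony", "better": "good", "ran": "run", "mice": "mouse"}

def lemmatize(word):
    # Fall back to the lowercased word itself when it is not in the lexicon.
    word = word.lower()
    return LEMMAS.get(word, word)

print(lemmatize("ponies"))  # pony
print(lemmatize("Better"))  # good
```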
16. What does TF-IDF stand for?
A. Text Frequency - Index Document Frequency
B. Term Format - Independent Data Format
C. Total Frequency - Internal Data Frequency
D. Term Frequency - Inverse Document Frequency
Correct Answer: Term Frequency - Inverse Document Frequency
Explanation:
TF-IDF stands for Term Frequency-Inverse Document Frequency, a statistical measure of how important a word is to a document relative to a collection of documents (corpus).
17. In TF-IDF, what does 'Term Frequency' (TF) measure?
A. The total number of words in the dictionary
B. How frequently a word appears in a specific document
C. The number of documents containing the word
D. How rare a word is in the entire corpus
Correct Answer: How frequently a word appears in a specific document
Explanation:
TF measures the frequency of a word (term) within a single document being analyzed.
18. In TF-IDF, what is the purpose of the 'Inverse Document Frequency' (IDF) component?
A. To normalize the length of the document
B. To count how many times a word appears in a sentence
C. To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
D. To give higher weight to common words like 'the'
Correct Answer: To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
Explanation:
IDF helps to down-weight words that appear in many documents (like stop words) and up-weight words that are rare and distinct to specific documents.
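The interplay of TF and IDF can be sketched with one common unsmoothed formulation: TF as relative frequency within a document and IDF as log(N / df). This choice of variant, the function names, and the three-document toy corpus are assumptions for illustration; libraries often apply smoothing or different scaling.

```python
import math

def tf(term, doc):
    # Relative frequency of the term within one tokenized document.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # log(N / df), where df is the number of documents containing the term.
    # The result is zero when the term occurs in every document.
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# Toy corpus of three tokenized documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]

print(tf_idf("the", docs[0], docs))  # 0.0, since "the" is in every document
print(tf_idf("cat", docs[0], docs))  # (1/3) * ln(3/2)
print(tf_idf("dog", docs[1], docs))  # (1/3) * ln(3), the highest of the three
```

All three terms have the same TF here, so the ranking comes entirely from IDF: the rarer the term across documents, the higher its score.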
19. What is 'Lexical Ambiguity'?
A. Confusion caused by a single word having multiple meanings
B. Confusion about the sentence structure
C. Confusion about the tone of the text
D. Confusion about who a pronoun refers to
Correct Answer: Confusion caused by a single word having multiple meanings
Explanation:
Lexical ambiguity occurs when a single word form has more than one meaning (polysemy or homonymy), e.g., 'bank' (river bank vs. financial bank).
20. Which of the following is a challenge in Tokenization?
A. Identifying abbreviations and acronyms (e.g., U.S.A.)
B. Displaying the font
C. Storing the text
D. Calculating TF-IDF
Correct Answer: Identifying abbreviations and acronyms (e.g., U.S.A.)
Explanation:
Tokenizers must decide if the period in 'U.S.A.' ends a sentence or is part of the token. This makes punctuation handling a challenge.
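One possible workaround, sketched below, is to try an abbreviation pattern before the general word and punctuation patterns. The regular expression and the name `tokenize_with_abbrev` are assumptions for illustration; the rule only covers dotted single-letter abbreviations and cannot tell whether the final period of 'U.S.A.' also ends a sentence.

```python
import re

# Alternatives are tried left to right at each position:
# 1) two or more "letter + period" pairs (e.g., U.S.A.),
# 2) an ordinary word, 3) any single punctuation character.
TOKEN_RE = re.compile(r"(?:[A-Za-z]\.){2,}|\w+|[^\w\s]")

def tokenize_with_abbrev(text):
    return TOKEN_RE.findall(text)

print(tokenize_with_abbrev("She moved to the U.S.A. in 2010."))
# ['She', 'moved', 'to', 'the', 'U.S.A.', 'in', '2010', '.']

# Without the abbreviation rule the acronym falls apart:
print(re.findall(r"\w+|[^\w\s]", "U.S.A."))
# ['U', '.', 'S', '.', 'A', '.']
```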
21. What is a 'Corpus' in NLP?
A. A large, structured set of texts used for statistical analysis
B. The core algorithm of a chatbot
C. A type of syntax error
D. A software used for processing text
Correct Answer: A large, structured set of texts used for statistical analysis
Explanation:
A corpus (plural: corpora) is a large collection of text or speech data used to train and test NLP models.
22. Which of the following describes 'Sentiment Analysis'?
A. Summarizing a long article
B. Converting speech to text
C. Determining the emotional tone or opinion expressed in a text
D. Translating text from English to French
Correct Answer: Determining the emotional tone or opinion expressed in a text
Explanation:
Sentiment analysis aims to determine the attitude (positive, negative, neutral) of a speaker or writer with respect to a topic.
23. In the word 'unhappiness', what is the root morpheme?
A. unhappy
B. happy
C. ness
D. un
Correct Answer: happy
Explanation:
'Happy' is the root (free morpheme), while 'un-' and '-ness' are bound morphemes (affixes).
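The prefix, root, and suffix split can be sketched with a toy affix stripper. The affix lists, the y-to-i spelling repair, and the name `segment` are assumptions for illustration; the function will wrongly split words that merely begin or end with these letter sequences (for example 'under'), so real morphological analysis needs a lexicon.

```python
# Small, hypothetical affix inventories.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ness", "ful", "ly")

def segment(word):
    """Split a word into (prefix, root, suffix); empty strings mean 'none'."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    root = rest[: len(rest) - len(suffix)] if suffix else rest
    # Undo the y -> i spelling change seen in "happy" + "-ness".
    if suffix and root.endswith("i"):
        root = root[:-1] + "y"
    return prefix, root, suffix

print(segment("unhappiness"))  # ('un', 'happy', 'ness')
print(segment("kindness"))     # ('', 'kind', 'ness')
```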
24. Which step usually comes FIRST in a standard NLP pipeline?
A. Lemmatization
B. TF-IDF Calculation
C. Tokenization
D. POS Tagging
Correct Answer: Tokenization
Explanation:
Text must be broken down into individual units (tokens) before tasks like tagging, stemming, or calculating frequencies can occur.
25. Why is 'World Knowledge' a challenge in NLP?
A. Grammar rules are too strict
B. Dictionaries are not large enough
C. Language often relies on common sense and facts about the world that are not explicitly stated in the text
D. Computers do not have enough memory
Correct Answer: Language often relies on common sense and facts about the world that are not explicitly stated in the text
Explanation:
Humans use vast amounts of background knowledge to interpret text (e.g., knowing that water is wet). Encoding this 'common sense' into machines is difficult.
26. What is the Porter Stemmer?
A. A widely used algorithm for suffix stripping (stemming)
B. A method for calculating IDF
C. A database of stop words
D. A tool for syntax analysis
Correct Answer: A widely used algorithm for suffix stripping (stemming)
Explanation:
The Porter Stemming algorithm is one of the most popular algorithms for reducing English words to their stems.
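To give a flavor of the suffix rules involved, the sketch below covers only the plural-handling rules of the algorithm's first step (SSES to SS, IES to I, SS unchanged, S removed). The function name `porter_step_1a` is chosen for illustration, and the full algorithm applies several further steps with conditions on the stem that are omitted here.

```python
def porter_step_1a(word):
    # Rules are checked longest suffix first; only the first match applies.
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```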