Unit 1 - Practice Quiz

INT344

1 What is the primary goal of Natural Language Processing (NLP)?

A. To execute programming languages faster
B. To enable computers to understand, interpret, and generate human language
C. To encrypt human language for security
D. To store large databases of text

2 Which of the following is considered one of the earliest successes in the history of NLP, specifically in Machine Translation?

A. The Turing Test
B. The Georgetown Experiment
C. The ELIZA Chatbot
D. Google Translate

3 Which component of NLP is responsible for understanding the structure and formation of words?

A. Syntax
B. Pragmatics
C. Morphology
D. Semantics

4 In the context of NLP, what is 'Ambiguity'?

A. The ability to process multiple languages
B. The phenomenon where a sentence or word has more than one possible interpretation
C. The speed at which text is processed
D. The lack of data in a corpus

5 Which level of linguistic analysis deals with the arrangement of words to form grammatical sentences?

A. Phonology
B. Morphology
C. Syntax
D. Semantics

6 What is the difference between Syntax and Semantics?

A. Syntax is about meaning; Semantics is about structure
B. Syntax is about structure; Semantics is about meaning
C. Both are about sound patterns
D. There is no difference

7 Which type of knowledge involves understanding how sentences are used in different situations and how context affects meaning?

A. Phonetic knowledge
B. Syntactic knowledge
C. Pragmatic knowledge
D. Lexical knowledge

8 The sentence 'I saw the man with the telescope' is an example of what type of ambiguity?

A. Lexical Ambiguity
B. Syntactic Ambiguity
C. Phonological Ambiguity
D. Referential Ambiguity

9 What is the smallest unit of meaning in a language?

A. Phoneme
B. Morpheme
C. Token
D. Character

10 Which NLP application involves automatically classifying an email as 'Spam' or 'Not Spam'?

A. Machine Translation
B. Text Summarization
C. Text Classification
D. Question Answering

11 What is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called?

A. Stemming
B. Tokenization
C. Lemmatization
D. Parsing

12 Which of the following is an example of a 'Stop Word'?

A. Computer
B. The
C. Run
D. Quickly

13 What is the main objective of Stemming?

A. To correct spelling errors
B. To reduce words to their root or base form, often by chopping off the ends
C. To find the dictionary form of a word
D. To identify the part of speech

14 How does Lemmatization differ from Stemming?

A. Lemmatization is faster but less accurate
B. Lemmatization simply chops off suffixes
C. Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
D. There is no difference

15 If you stem the word 'ponies', the result might be 'poni'. If you lemmatize 'ponies', the result is likely:

A. poni
B. pony
C. ponies
D. po

16 What does TF-IDF stand for?

A. Term Frequency - Inverse Document Frequency
B. Text Frequency - Index Document Frequency
C. Total Frequency - Internal Data Frequency
D. Term Format - Independent Data Format

17 In TF-IDF, what does 'Term Frequency' (TF) measure?

A. How rare a word is in the entire corpus
B. How frequently a word appears in a specific document
C. The number of documents containing the word
D. The total number of words in the dictionary

18 In TF-IDF, what is the purpose of the 'Inverse Document Frequency' (IDF) component?

A. To give higher weight to common words like 'the'
B. To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
C. To count how many times a word appears in a sentence
D. To normalize the length of the document

19 What is 'Lexical Ambiguity'?

A. Confusion about the sentence structure
B. Confusion caused by a single word having multiple meanings
C. Confusion about who a pronoun refers to
D. Confusion about the tone of the text

20 Which of the following is a challenge in Tokenization?

A. Identifying abbreviations and acronyms (e.g., U.S.A.)
B. Storing the text
C. Displaying the font
D. Calculating TF-IDF

21 What is a 'Corpus' in NLP?

A. A software tool used for processing text
B. A large, structured set of texts used for statistical analysis
C. The core algorithm of a chatbot
D. A type of syntax error

22 Which of the following describes 'Sentiment Analysis'?

A. Translating text from English to French
B. Determining the emotional tone or opinion expressed in a text
C. Summarizing a long article
D. Converting speech to text

23 In the word 'unhappiness', what is the root morpheme?

A. un
B. happy
C. ness
D. unhappy

24 Which step usually comes FIRST in a standard NLP pipeline?

A. TF-IDF Calculation
B. Tokenization
C. POS Tagging
D. Lemmatization

25 Why is 'World Knowledge' a challenge in NLP?

A. Computers do not have enough memory
B. Language often relies on common sense and facts about the world that are not explicitly stated in the text
C. Grammar rules are too strict
D. Dictionaries are not large enough

26 What is the Porter Stemmer?

A. A tool for syntax analysis
B. A widely used algorithm for suffix stripping (stemming)
C. A method for calculating IDF
D. A database of stop words

27 When might you choose NOT to remove stop words?

A. When analyzing general topic trends
B. When searching for specific phrases like 'to be or not to be'
C. When trying to reduce dataset size
D. When performing simple bag-of-words classification
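
Note: a minimal sketch of stop word removal in Python, showing how a phrase made entirely of stop words can disappear. The `STOP_WORDS` set and the `remove_stop_words` helper are illustrative assumptions, not a standard list or library function.

```python
# Tiny, hypothetical stop word list for illustration only.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "an", "is"}

def remove_stop_words(text):
    # Lowercase, split on whitespace, and drop any word in the stop list.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("to be or not to be"))        # []
print(remove_stop_words("The cat is on a table"))     # ['cat', 'on', 'table']
```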

28 What is 'Referential Ambiguity' usually associated with?

A. Anaphora resolution (pronouns)
B. Word definitions
C. Speech recognition
D. Spelling errors

29 Which of the following is a 'Bound Morpheme'?

A. Dog
B. Eat
C. -ing
D. Table

30 What is the most likely result of applying a standard word tokenizer, one that treats punctuation marks as separate tokens, to the string 'Hello, world!'?

A. ['Hello', 'world']
B. ['Hello, world!']
C. ['Hello', ',', 'world', '!']
D. ['H', 'e', 'l', 'l', 'o']
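
Note: a minimal sketch of a punctuation-aware tokenizer using Python's standard `re` module. The function name `simple_tokenize` and the pattern are assumptions chosen for illustration; real tokenizers apply different rules and may produce different output.

```python
import re

def simple_tokenize(text):
    # \w+ matches a run of letters, digits, or underscores (a word);
    # [^\w\s] matches one non-word, non-space character (a punctuation mark),
    # so each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```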

31 A high TF-IDF score for a word in a specific document indicates:

A. The word is a stop word
B. The word is very common across all documents
C. The word is frequent in that specific document but rare in the overall corpus
D. The word is rare in that document

32 Which level of linguistic analysis deals with the sound units of language?

A. Phonology
B. Morphology
C. Syntax
D. Pragmatics

33 Which of the following best describes 'Compositional Semantics'?

A. The meaning of the whole is determined by the meanings of the parts and how they are assembled
B. The meaning is determined solely by the length of the sentence
C. The meaning is random
D. The meaning is determined by the tone of voice

34 In the context of text processing, what does 'Case Folding' refer to?

A. Folding the paper the text is printed on
B. Converting all characters to the same case (usually lowercase)
C. Removing punctuation
D. Identifying proper nouns

35 Which of the following is NOT a typical application of NLP?

A. Stock Market Prediction using news headlines
B. Image Compression
C. Virtual Assistants (Siri, Alexa)
D. Grammar Checkers

36 What is a 'Bag of Words' (BoW) model?

A. A physical bag containing dictionaries
B. A representation of text that describes the occurrence of words but ignores order and grammar
C. A sophisticated syntactic parser
D. A list of stop words
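
Note: a minimal sketch of a Bag of Words representation using `collections.Counter` from the Python standard library. The `bag_of_words` helper and its whitespace-based splitting are simplifying assumptions.

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; only per-word counts are kept,
    # so word order and grammar are discarded.
    return Counter(text.lower().split())

a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

print(a["the"])  # 2
print(a == b)    # True: different sentences, identical bags
```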

37 Why is 'Slang' a challenge for NLP?

A. It is too formal
B. It changes rapidly and may not appear in standard dictionaries
C. It uses too many vowels
D. It is always in uppercase

38 In the sentence 'Time flies like an arrow', the word 'flies' could be a verb or a noun. This is:

A. Phonetic Ambiguity
B. Part-of-Speech (POS) Ambiguity
C. Pragmatic Ambiguity
D. Stop word Ambiguity

39 If a corpus has 1000 documents and the word 'biology' appears in 1000 of them, what is likely true about its IDF value?

A. It will be very high
B. It will be zero or very close to zero
C. It will be 1000
D. It cannot be calculated
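
Note: a minimal sketch of the IDF calculation, assuming the unsmoothed definition IDF = log(N / df), where N is the number of documents and df is the number of documents containing the term. The `idf` helper is hypothetical; many libraries add smoothing terms, which changes the exact values.

```python
import math

def idf(num_docs, docs_with_term):
    # Unsmoothed IDF: log(N / df). Assumes docs_with_term > 0.
    return math.log(num_docs / docs_with_term)

print(idf(1000, 1000))  # 0.0 (term appears in every document)
print(idf(1000, 10))    # larger value for a term found in few documents
```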

40 Which processing technique requires Part-of-Speech (POS) tagging to be effective?

A. Simple Tokenization
B. Lemmatization
C. Lowercasing
D. Stop word removal

41 Which of the following is considered a 'Free Morpheme'?

A. re-
B. town
C. -ly
D. -ed

42 The Turing Test was proposed by Alan Turing to determine:

A. The speed of a computer
B. If a machine can exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human
C. The memory capacity of a hard drive
D. The accuracy of machine translation

43 What is 'Discourse Analysis'?

A. Analyzing single words in isolation
B. Analyzing language use beyond the sentence boundary
C. Converting audio to text
D. Sorting words alphabetically

44 In the TF-IDF formula, if term 't' appears 5 times in a document of 100 words, the normalized term frequency (TF) is:

A. 5
B. 0.05
C. 500
D. 95
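
Note: a minimal sketch of normalized term frequency, assuming the common definition TF = (occurrences of the term in the document) / (total words in the document). The `normalized_tf` helper and the filler document are made up for illustration.

```python
def normalized_tf(term, tokens):
    # Count of the term divided by the document length in tokens.
    return tokens.count(term) / len(tokens)

# A synthetic 100-word document in which 't' occurs 5 times.
doc = ["t"] * 5 + ["filler"] * 95

print(len(doc))                 # 100
print(normalized_tf("t", doc))  # 0.05
```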

45 What is the primary motivation for 'Text Normalization' (like stemming/lemmatization) in search engines?

A. To make the text look pretty
B. To match a user's query (e.g., 'running') with documents containing related forms (e.g., 'run')
C. To remove all verbs
D. To translate the query

46 Which linguistic field studies how words combine to form phrases and sentences?

A. Syntax
B. Phonetics
C. Morphology
D. Semantics

47 When tokenizing text from social media (e.g., Twitter/X), what is a specific challenge?

A. Handling hashtags (#) and mentions (@)
B. The text is too long
C. There are no vowels
D. It is always formal English

48 What defines a 'Regular Language' in the context of automata and NLP?

A. A language that can be described by a regular expression
B. A language with no slang
C. A language spoken by humans
D. A programming language

49 Which of the following sentences best illustrates 'Semantic Ambiguity' (not Syntactic)?

A. I saw the man with the telescope.
B. The bank was closed.
C. Flying planes can be dangerous.
D. Visiting relatives can be boring.

50 What is the relationship between AI and NLP?

A. They are completely unrelated fields
B. NLP is a subfield of AI focused on language
C. AI is a subfield of NLP
D. NLP replaces AI