Unit 1 - Practice Quiz

INT344 50 Questions

1 What is the primary goal of Natural Language Processing (NLP)?

A. To enable computers to understand, interpret, and generate human language
B. To execute programming languages faster
C. To store large databases of text
D. To encrypt human language for security

2 Which of the following is considered one of the earliest successes in the history of NLP, specifically in Machine Translation?

A. Google Translate
B. The Turing Test
C. The ELIZA Chatbot
D. The Georgetown-IBM Experiment

3 Which component of NLP is responsible for understanding the structure and formation of words?

A. Semantics
B. Syntax
C. Pragmatics
D. Morphology

4 In the context of NLP, what is 'Ambiguity'?

A. The speed at which text is processed
B. The phenomenon where a sentence or word has more than one possible interpretation
C. The ability to process multiple languages
D. The lack of data in a corpus

5 Which level of linguistic analysis deals with the arrangement of words to form grammatical sentences?

A. Phonology
B. Morphology
C. Syntax
D. Semantics

6 What is the difference between Syntax and Semantics?

A. Both are about sound patterns
B. Syntax is about structure; Semantics is about meaning
C. Syntax is about meaning; Semantics is about structure
D. There is no difference

7 Which type of knowledge involves understanding how sentences are used in different situations and how context affects meaning?

A. Pragmatic knowledge
B. Lexical knowledge
C. Syntactic knowledge
D. Phonetic knowledge

8 The sentence 'I saw the man with the telescope' is an example of what type of ambiguity?

A. Syntactic Ambiguity
B. Referential Ambiguity
C. Phonological Ambiguity
D. Lexical Ambiguity

9 What is the smallest unit of meaning in a language?

A. Morpheme
B. Character
C. Token
D. Phoneme

10 Which NLP application involves automatically classifying an email as 'Spam' or 'Not Spam'?

A. Text Classification
B. Question Answering
C. Machine Translation
D. Text Summarization

11 What is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called?

A. Parsing
B. Lemmatization
C. Tokenization
D. Stemming

12 Which of the following is an example of a 'Stop Word'?

A. Computer
B. Quickly
C. The
D. Run

13 What is the main objective of Stemming?

A. To find the dictionary form of a word
B. To correct spelling errors
C. To reduce words to their root or base form, often by chopping off the ends
D. To identify the part of speech

14 How does Lemmatization differ from Stemming?

A. Lemmatization is faster but less accurate
B. Lemmatization simply chops off suffixes
C. There is no difference
D. Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)

15 If you stem the word 'ponies', the result might be 'poni'. If you lemmatize 'ponies', the result is likely:

A. pony
B. ponies
C. poni
D. po
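
The contrast between the two techniques can be sketched in a few lines. This is only an illustration: `porter_step_1a` implements just the plural-handling step (step 1a) of the Porter algorithm, not the full stemmer, and `TOY_LEMMAS` is a hypothetical one-entry lookup table standing in for a real dictionary-backed lemmatizer.

```python
def porter_step_1a(word):
    # Step 1a of the Porter algorithm: SSES -> SS, IES -> I, SS -> SS, S -> (removed).
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word

# Hypothetical lookup table; a real lemmatizer consults a full dictionary.
TOY_LEMMAS = {"ponies": "pony"}

def toy_lemmatize(word):
    # Fall back to the word itself when it is not in the table.
    return TOY_LEMMAS.get(word, word)

print(porter_step_1a("ponies"))  # poni
print(toy_lemmatize("ponies"))   # pony
```

The stem 'poni' is not a dictionary word, while the lemma is.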

16 What does TF-IDF stand for?

A. Text Frequency - Index Document Frequency
B. Term Format - Independent Data Format
C. Total Frequency - Internal Data Frequency
D. Term Frequency - Inverse Document Frequency

17 In TF-IDF, what does 'Term Frequency' (TF) measure?

A. The total number of words in the dictionary
B. How frequently a word appears in a specific document
C. The number of documents containing the word
D. How rare a word is in the entire corpus

18 In TF-IDF, what is the purpose of the 'Inverse Document Frequency' (IDF) component?

A. To normalize the length of the document
B. To count how many times a word appears in a sentence
C. To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
D. To give higher weight to common words like 'the'

19 What is 'Lexical Ambiguity'?

A. Confusion caused by a single word having multiple meanings
B. Confusion about the sentence structure
C. Confusion about the tone of the text
D. Confusion about who a pronoun refers to

20 Which of the following is a challenge in Tokenization?

A. Identifying abbreviations and acronyms (e.g., U.S.A.)
B. Displaying the font
C. Storing the text
D. Calculating TF-IDF

21 What is a 'Corpus' in NLP?

A. A large, structured set of texts used for statistical analysis
B. The core algorithm of a chatbot
C. A type of syntax error
D. A software used for processing text

22 Which of the following describes 'Sentiment Analysis'?

A. Summarizing a long article
B. Converting speech to text
C. Determining the emotional tone or opinion expressed in a text
D. Translating text from English to French

23 In the word 'unhappiness', what is the root morpheme?

A. unhappy
B. happy
C. ness
D. un

24 Which step usually comes FIRST in a standard NLP pipeline?

A. Lemmatization
B. TF-IDF Calculation
C. Tokenization
D. POS Tagging

25 Why is 'World Knowledge' a challenge in NLP?

A. Grammar rules are too strict
B. Dictionaries are not large enough
C. Language often relies on common sense and facts about the world that are not explicitly stated in the text
D. Computers do not have enough memory

26 What is the Porter Stemmer?

A. A widely used algorithm for suffix stripping (stemming)
B. A method for calculating IDF
C. A database of stop words
D. A tool for syntax analysis

27 When might you choose NOT to remove stop words?

A. When trying to reduce dataset size
B. When performing simple bag-of-words classification
C. When searching for specific phrases like 'to be or not to be'
D. When analyzing general topic trends
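
A minimal sketch of why stop word removal can hurt phrase search. The `STOP_WORDS` set here is a hypothetical, deliberately tiny list chosen for the example; real stop-word lists are longer and vary by toolkit.

```python
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "is"}

def remove_stop_words(text):
    # Lowercase, split on whitespace, and drop any token in the stop list.
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

print(remove_stop_words("to be or not to be"))    # []
print(remove_stop_words("the cat is not a dog"))  # ['cat', 'dog']
```

With this list, every word of the famous phrase is discarded, so nothing is left to match against.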

28 What is 'Referential Ambiguity' usually associated with?

A. Spelling errors
B. Anaphora resolution (pronouns)
C. Word definitions
D. Speech recognition

29 Which of the following is a 'Bound Morpheme'?

A. -ing
B. Dog
C. Eat
D. Table

30 What is the result of applying a typical word-level tokenizer, one that treats punctuation marks as separate tokens, to the string 'Hello, world!'?

A. ['H', 'e', 'l', 'l', 'o']
B. ['Hello', ',', 'world', '!']
C. ['Hello, world!']
D. ['Hello', 'world']
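
One way to check this by hand is a small regular-expression tokenizer. This is a sketch of one common convention (words and punctuation marks as separate tokens), not the behavior of any particular library; `simple_tokenize` is a name made up for the example.

```python
import re

def simple_tokenize(text):
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

A tokenizer that simply splits on whitespace would instead return 'Hello,' and 'world!' with the punctuation attached, which is why the tokenization convention matters.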

31 A high TF-IDF score for a word in a specific document indicates:

A. The word is rare in that document
B. The word is a stop word
C. The word is frequent in that specific document but rare in the overall corpus
D. The word is very common across all documents

32 Which level of linguistic analysis deals with the sound units of language?

A. Phonology
B. Syntax
C. Morphology
D. Pragmatics

33 Which of the following best describes 'Compositional Semantics'?

A. The meaning is determined solely by the length of the sentence
B. The meaning is random
C. The meaning of the whole is determined by the meanings of the parts and how they are assembled
D. The meaning is determined by the tone of voice

34 In the context of text processing, what does 'Case Folding' refer to?

A. Folding the paper the text is printed on
B. Identifying proper nouns
C. Removing punctuation
D. Converting all characters to the same case (usually lowercase)
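
In Python, case folding can be sketched with the built-in string methods. `fold` is a hypothetical helper name; `str.lower` and `str.casefold` are standard, with `casefold` being the more aggressive variant intended for caseless matching.

```python
def fold(text):
    # casefold() lowercases and also normalizes some special characters.
    return text.casefold()

print(fold("The") == fold("THE") == fold("the"))  # True
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```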

35 Which of the following is NOT a typical application of NLP?

A. Grammar Checkers
B. Image Compression
C. Stock Market Prediction using news headlines
D. Virtual Assistants (Siri, Alexa)

36 What is a 'Bag of Words' (BoW) model?

A. A representation of text that describes the occurrence of words but ignores order and grammar
B. A sophisticated syntactic parser
C. A list of stop words
D. A physical bag containing dictionaries
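
A minimal sketch of a bag-of-words representation using only the standard library. `bag_of_words` is a name invented for the example, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter

def bag_of_words(text):
    # Count how often each lowercase token occurs; position is not recorded.
    return Counter(text.lower().split())

a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

print(a == b)    # True
print(a["dog"])  # 1
```

The two sentences mean different things but produce identical bags, because word order is discarded.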

37 Why is 'Slang' a challenge for NLP?

A. It is always in uppercase
B. It uses too many vowels
C. It is too formal
D. It changes rapidly and may not appear in standard dictionaries

38 In the sentence 'Time flies like an arrow', the word 'flies' could be a verb or a noun. This is:

A. Pragmatic Ambiguity
B. Phonetic Ambiguity
C. Stop word Ambiguity
D. Part-of-Speech (POS) Ambiguity

39 If a corpus has 1000 documents and the word 'biology' appears in all 1000 of them, what is likely true about its IDF value?

A. It will be very high
B. It cannot be calculated
C. It will be zero or very close to zero
D. It will be 1000
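
The arithmetic can be checked with the basic, unsmoothed definition IDF = log(N / df), where N is the number of documents and df is the number of documents containing the term. This is a sketch under that assumption; many toolkits add smoothing terms, so their values differ slightly. `idf` is a hypothetical helper name.

```python
import math

def idf(n_docs, doc_freq):
    # Unsmoothed inverse document frequency using the natural logarithm.
    return math.log(n_docs / doc_freq)

print(idf(1000, 1000))                 # 0.0
print(idf(1000, 10) > idf(1000, 500))  # True
```

A term that appears in every document carries no discriminating power, while rarer terms receive larger weights.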

40 Which processing technique requires Part-of-Speech (POS) tagging to be effective?

A. Lowercasing
B. Simple Tokenization
C. Stop word removal
D. Lemmatization

41 Which of the following is considered a 'Free Morpheme'?

A. re-
B. -ed
C. town
D. -ly

42 The Turing Test was proposed by Alan Turing to determine:

A. The memory capacity of a hard drive
B. If a machine can exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human
C. The accuracy of machine translation
D. The speed of a computer

43 What is 'Discourse Analysis'?

A. Analyzing single words in isolation
B. Sorting words alphabetically
C. Analyzing language use beyond the sentence boundary
D. Converting audio to text

44 In the TF-IDF formula, if term 't' appears 5 times in a document of 100 words, the Normalized TF is:

A. 95
B. 0.05
C. 5
D. 500
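
A quick sketch of the calculation, assuming the common length-normalized definition TF = (occurrences of the term) / (total words in the document). `normalized_tf` is a name invented for the example; other TF variants (raw counts, log scaling) exist.

```python
def normalized_tf(term_count, doc_length):
    # Fraction of the document's words that are this term.
    return term_count / doc_length

print(normalized_tf(5, 100))  # 0.05
```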

45 What is the primary motivation for 'Text Normalization' (like stemming/lemmatization) in search engines?

A. To make the text look pretty
B. To match a user's query (e.g., 'running') with documents containing related forms (e.g., 'run')
C. To translate the query
D. To remove all verbs

46 Which linguistic field studies how words combine to form phrases and sentences?

A. Phonetics
B. Semantics
C. Morphology
D. Syntax

47 When tokenizing text from social media (e.g., Twitter/X), what is a specific challenge?

A. There are no vowels
B. It is always formal English
C. The text is too long
D. Handling hashtags (#) and mentions (@)

48 What defines a 'Regular Language' in the context of automata and NLP?

A. A language with no slang
B. A programming language
C. A language spoken by humans
D. A language that can be described by a regular expression

49 Which of the following sentences best illustrates 'Semantic Ambiguity' (not Syntactic)?

A. Visiting relatives can be boring.
B. I saw the man with the telescope.
C. Flying planes can be dangerous.
D. The bank was closed.

50 What is the relationship between AI and NLP?

A. AI is a subfield of NLP
B. They are completely unrelated fields
C. NLP is a subfield of AI focused on language
D. NLP replaces AI