Unit 1 - Practice Quiz

INT344 50 Questions

1 What is the primary goal of Natural Language Processing (NLP)?

A. To execute programming languages faster
B. To enable computers to understand, interpret, and generate human language
C. To encrypt human language for security
D. To store large databases of text

2 Which of the following is considered one of the earliest successes in the history of NLP, specifically in Machine Translation?

A. The Georgetown Experiment
B. Google Translate
C. The ELIZA Chatbot
D. The Turing Test

3 Which component of NLP is responsible for understanding the structure and formation of words?

A. Morphology
B. Syntax
C. Semantics
D. Pragmatics

4 In the context of NLP, what is 'Ambiguity'?

A. The speed at which text is processed
B. The lack of data in a corpus
C. The phenomenon where a sentence or word has more than one possible interpretation
D. The ability to process multiple languages

5 Which level of linguistic analysis deals with the arrangement of words to form grammatical sentences?

A. Phonology
B. Morphology
C. Syntax
D. Semantics

6 What is the difference between Syntax and Semantics?

A. There is no difference
B. Syntax is about meaning; Semantics is about structure
C. Syntax is about structure; Semantics is about meaning
D. Both are about sound patterns

7 Which type of knowledge involves understanding how sentences are used in different situations and how context affects meaning?

A. Syntactic knowledge
B. Phonetic knowledge
C. Lexical knowledge
D. Pragmatic knowledge

8 The sentence 'I saw the man with the telescope' is an example of what type of ambiguity?

A. Phonological Ambiguity
B. Syntactic Ambiguity
C. Lexical Ambiguity
D. Referential Ambiguity

9 What is the smallest unit of meaning in a language?

A. Token
B. Morpheme
C. Phoneme
D. Character

10 Which NLP application involves automatically classifying an email as 'Spam' or 'Not Spam'?

A. Text Summarization
B. Question Answering
C. Machine Translation
D. Text Classification

11 What is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called?

A. Parsing
B. Tokenization
C. Lemmatization
D. Stemming

12 Which of the following is an example of a 'Stop Word'?

A. Quickly
B. Computer
C. The
D. Run

13 What is the main objective of Stemming?

A. To reduce words to their root or base form, often by chopping off the ends
B. To identify the part of speech
C. To find the dictionary form of a word
D. To correct spelling errors

14 How does Lemmatization differ from Stemming?

A. Lemmatization simply chops off suffixes
B. Lemmatization considers the context and converts the word to its meaningful dictionary form (lemma)
C. Lemmatization is faster but less accurate
D. There is no difference

15 If you stem the word 'ponies', the result might be 'poni'. If you lemmatize 'ponies', the result is likely:

A. pony
B. po
C. ponies
D. poni

16 What does TF-IDF stand for?

A. Term Frequency - Inverse Document Frequency
B. Term Format - Independent Data Format
C. Total Frequency - Internal Data Frequency
D. Text Frequency - Index Document Frequency

17 In TF-IDF, what does 'Term Frequency' (TF) measure?

A. The number of documents containing the word
B. How frequently a word appears in a specific document
C. The total number of words in the dictionary
D. How rare a word is in the entire corpus

18 In TF-IDF, what is the purpose of the 'Inverse Document Frequency' (IDF) component?

A. To diminish the weight of terms that occur very frequently in the document set and increase the weight of terms that occur rarely
B. To count how many times a word appears in a sentence
C. To give higher weight to common words like 'the'
D. To normalize the length of the document

19 What is 'Lexical Ambiguity'?

A. Confusion about the tone of the text
B. Confusion about who a pronoun refers to
C. Confusion caused by a single word having multiple meanings
D. Confusion about the sentence structure

20 Which of the following is a challenge in Tokenization?

A. Storing the text
B. Displaying the font
C. Identifying abbreviations and acronyms (e.g., U.S.A.)
D. Calculating TF-IDF

21 What is a 'Corpus' in NLP?

A. The core algorithm of a chatbot
B. A software used for processing text
C. A type of syntax error
D. A large, structured set of texts used for statistical analysis

22 Which of the following describes 'Sentiment Analysis'?

A. Summarizing a long article
B. Converting speech to text
C. Translating text from English to French
D. Determining the emotional tone or opinion expressed in a text

23 In the word 'unhappiness', what is the root morpheme?

A. un
B. happy
C. unhappy
D. ness

24 Which step usually comes FIRST in a standard NLP pipeline?

A. TF-IDF Calculation
B. Tokenization
C. POS Tagging
D. Lemmatization

25 Why is 'World Knowledge' a challenge in NLP?

A. Language often relies on common sense and facts about the world that are not explicitly stated in the text
B. Grammar rules are too strict
C. Dictionaries are not large enough
D. Computers do not have enough memory

26 What is the Porter Stemmer?

A. A database of stop words
B. A widely used algorithm for suffix stripping (stemming)
C. A tool for syntax analysis
D. A method for calculating IDF

27 When might you choose NOT to remove stop words?

A. When performing simple bag-of-words classification
B. When searching for specific phrases like 'to be or not to be'
C. When analyzing general topic trends
D. When trying to reduce dataset size

28 What is 'Referential Ambiguity' usually associated with?

A. Anaphora resolution (pronouns)
B. Word definitions
C. Spelling errors
D. Speech recognition

29 Which of the following is a 'Bound Morpheme'?

A. Eat
B. -ing
C. Dog
D. Table

30 What is the result of applying a tokenizer to the string 'Hello, world!'?

A. ['H', 'e', 'l', 'l', 'o']
B. ['Hello', ',', 'world', '!']
C. ['Hello', 'world']
D. ['Hello, world!']
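Note: the output of a tokenizer depends on the convention it follows. The sketch below, using a different sample string, contrasts a plain whitespace split with a simple regex that separates punctuation into its own tokens. The function names and the regex are illustrative assumptions, not the behavior of any particular library.

```python
import re

def whitespace_tokenize(text):
    # Split on runs of whitespace only; punctuation stays attached to words.
    return text.split()

def punct_tokenize(text):
    # Match either a run of word characters or a single non-space,
    # non-word character, so punctuation becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

sample = "Wait, what?"
print(whitespace_tokenize(sample))  # ['Wait,', 'what?']
print(punct_tokenize(sample))       # ['Wait', ',', 'what', '?']
```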

31 A high TF-IDF score for a word in a specific document indicates:

A. The word is very common across all documents
B. The word is frequent in that specific document but rare in the overall corpus
C. The word is a stop word
D. The word is rare in that document

32 Which area of NLP deals with the sound units of language?

A. Pragmatics
B. Morphology
C. Syntax
D. Phonology

33 Which of the following best describes 'Compositional Semantics'?

A. The meaning is determined by the tone of voice
B. The meaning of the whole is determined by the meanings of the parts and how they are assembled
C. The meaning is random
D. The meaning is determined solely by the length of the sentence

34 In the context of text processing, what does 'Case Folding' refer to?

A. Folding the paper the text is printed on
B. Identifying proper nouns
C. Removing punctuation
D. Converting all characters to the same case (usually lowercase)

35 Which of the following is NOT a typical application of NLP?

A. Virtual Assistants (Siri, Alexa)
B. Image Compression
C. Grammar Checkers
D. Stock Market Prediction using news headlines

36 What is a 'Bag of Words' (BoW) model?

A. A sophisticated syntactic parser
B. A physical bag containing dictionaries
C. A list of stop words
D. A representation of text that describes the occurrence of words but ignores order and grammar
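Note: a minimal sketch of a bag-of-words representation, assuming lowercasing and whitespace tokenization (both are simplifications chosen for illustration). Two sentences with the same words in a different order produce the same bag.

```python
from collections import Counter

def bag_of_words(text):
    # Count each lowercased token; word order is discarded.
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a["the"])  # 2
print(a == b)    # True
```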

37 Why is 'Slang' a challenge for NLP?

A. It is always in uppercase
B. It is too formal
C. It uses too many vowels
D. It changes rapidly and may not appear in standard dictionaries

38 In the sentence 'Time flies like an arrow', the word 'flies' could be a verb or a noun. This is:

A. Phonetic Ambiguity
B. Pragmatic Ambiguity
C. Part-of-Speech (POS) Ambiguity
D. Stop word Ambiguity

39 If a corpus has 1000 documents and the word 'biology' appears in all 1000 of them, what is likely true about its IDF value?

A. It will be 1000
B. It will be very high
C. It cannot be calculated
D. It will be zero or very close to zero
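Note: a sketch of one common, unsmoothed IDF definition, log(N / df), on a small hypothetical corpus of 8 documents. Libraries often use smoothed variants, so exact values differ in practice; the function name `idf` is an assumption for illustration.

```python
import math

def idf(num_docs, docs_with_term):
    # Unsmoothed inverse document frequency: log(N / df).
    return math.log(num_docs / docs_with_term)

print(idf(8, 8))              # 0.0
print(idf(8, 1) > idf(8, 4))  # True
```

The rarer the term across documents, the larger the value under this definition.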

40 Which processing technique requires Part-of-Speech (POS) tagging to be effective?

A. Simple Tokenization
B. Stop word removal
C. Lemmatization
D. Lowercasing

41 Which of the following is considered a 'Free Morpheme'?

A. town
B. -ly
C. re-
D. -ed

42 The Turing Test was proposed by Alan Turing to determine:

A. The accuracy of machine translation
B. The speed of a computer
C. The memory capacity of a hard drive
D. If a machine can exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human

43 What is 'Discourse Analysis'?

A. Sorting words alphabetically
B. Analyzing single words in isolation
C. Converting audio to text
D. Analyzing language use beyond the sentence boundary

44 In the TF-IDF formula, if term 't' appears 5 times in a document of 100 words, the Normalized TF is:

A. 95
B. 500
C. 5
D. 0.05
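Note: a sketch of normalized term frequency, assuming the definition "occurrences of the term divided by the total number of tokens in the document". The sample sentence and the function name `normalized_tf` are hypothetical.

```python
def normalized_tf(term, tokens):
    # Occurrences of the term divided by the document length in tokens.
    return tokens.count(term) / len(tokens)

tokens = "the cat sat on the mat with the hat on".split()  # 10 tokens

print(normalized_tf("the", tokens))  # 0.3
print(normalized_tf("on", tokens))   # 0.2
```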

45 What is the primary motivation for 'Text Normalization' (like stemming/lemmatization) in search engines?

A. To make the text look pretty
B. To match a user's query (e.g., 'running') with documents containing related forms (e.g., 'run')
C. To translate the query
D. To remove all verbs

46 Which linguistic field studies how words combine to form phrases and sentences?

A. Syntax
B. Semantics
C. Morphology
D. Phonetics

47 When tokenizing text from social media (e.g., Twitter/X), what is a specific challenge?

A. It is always formal English
B. The text is too long
C. Handling hashtags (#) and mentions (@)
D. There are no vowels

48 What defines a 'Regular Language' in the context of automata and NLP?

A. A programming language
B. A language spoken by humans
C. A language that can be described by a regular expression
D. A language with no slang

49 Which of the following sentences best illustrates 'Semantic Ambiguity' (not Syntactic)?

A. Flying planes can be dangerous.
B. Visiting relatives can be boring.
C. I saw the man with the telescope.
D. The bank was closed.

50 What is the relationship between AI and NLP?

A. They are completely unrelated fields
B. NLP replaces AI
C. AI is a subfield of NLP
D. NLP is a subfield of AI focused on language