Unit 4 - Practice Quiz

INT344 50 Questions

1 In the context of sentiment analysis using Logistic Regression, what is the primary purpose of feature extraction?

A. To convert numerical vectors back into text
B. To transform raw text into numerical representations like vectors
C. To translate text from one language to another
D. To increase the length of the sentences

2 Which function is used in Logistic Regression to map the output to a probability value between 0 and 1?

A. Linear Function
B. ReLU Function
C. Sigmoid Function
D. Tangent Function

3 When extracting features for sentiment analysis, what does a frequency dictionary typically map?

A. Word length to Sentence length
B. Process ID to Memory usage
C. (Word, Sentiment Label) pairs to the count of occurrences
D. Each word to its synonym

4 In a binary logistic regression classifier for sentiment analysis, if the sigmoid output h(x) >= 0.5, how is the sentiment classified?

A. Positive
B. Neutral
C. Undefined
D. Negative

5 What is the formula for the sigmoid function, σ(z)?

A. 1 / (1 + e^-z)
B. 1 / (1 + e^z)
C. log(z)
D. 1 / (1 - e^-z)
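
For reference, a minimal sketch of the sigmoid in Python. The function name `sigmoid` and the two-branch form (which only avoids overflow for large negative inputs) are illustrative choices, not part of the quiz material.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equal form e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```

The output rises monotonically with z, which is why a fixed threshold on the sigmoid output corresponds to a fixed threshold on z.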

6 In the feature vector representation X = [1, sum_pos, sum_neg], what does the '1' usually represent?

A. The bias unit (intercept term)
B. The learning rate
C. The classification threshold
D. The count of the first word
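
The feature vector in the question above can be built from a frequency dictionary roughly as follows. This is a sketch under assumptions: the helper names `build_freqs` and `extract_features`, the toy tweets, and the choice to count each unique word once per tweet are all illustrative.

```python
from collections import Counter

def build_freqs(tweets, labels):
    """Map (word, label) pairs to how often the word occurs under that label."""
    freqs = Counter()
    for tokens, label in zip(tweets, labels):
        for word in tokens:
            freqs[(word, label)] += 1
    return freqs

def extract_features(tokens, freqs):
    """Return [1, sum_pos, sum_neg] for one tokenized tweet."""
    unique_words = set(tokens)  # assumption: repeated words are counted once
    sum_pos = sum(freqs.get((w, 1), 0) for w in unique_words)
    sum_neg = sum(freqs.get((w, 0), 0) for w in unique_words)
    return [1, sum_pos, sum_neg]

# Toy, already-preprocessed corpus: label 1 = positive, 0 = negative.
tweets = [["happy", "great"], ["happy", "day"], ["sad", "day"]]
labels = [1, 1, 0]
freqs = build_freqs(tweets, labels)

print(extract_features(["happy", "day", "happy"], freqs))  # [1, 3, 1]
```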

7 Which preprocessing step is commonly performed before feature extraction to reduce the vocabulary size without losing much semantic meaning?

A. Stemming and removing stop words
B. Removing all vowels
C. Capitalizing all letters
D. Duplicating sentences

8 What is the 'Cost Function' used in Logistic Regression generally called?

A. Absolute Error
B. Hinge Loss
C. Mean Squared Error
D. Cross-Entropy Loss (Log Loss)

9 What is the goal of Gradient Descent in training a Logistic Regression model?

A. To set all weights to zero
B. To minimize the cost function by iteratively updating weights
C. To remove the bias term
D. To maximize the cost function

10 In Bayes' Rule, what does P(A|B) represent?

A. The prior probability of A
B. The posterior probability of A given B
C. The probability of B given A
D. The joint probability of A and B

11 Why is the Naive Bayes classifier called 'Naive'?

A. It cannot handle complex text
B. It was developed by a naive mathematician
C. It requires very little training data
D. It assumes that features are independent of each other given the class

12 What is the formula for Bayes' Theorem?

A. P(A|B) = P(A) / P(B)
B. P(A|B) = P(B|A) + P(A)
C. P(A|B) = P(B|A) * P(A) / P(B)
D. P(A|B) = P(A) * P(B)

13 In Naive Bayes Sentiment Analysis, what is 'Laplacian Smoothing' used for?

A. To smooth the decision boundary
B. To handle words with zero probability (words with a zero count in one of the classes)
C. To remove stop words
D. To average the sentiment scores

14 If a word appears in the positive corpus but not the negative corpus, what happens to its probability P(W|Negative) without smoothing?

A. It becomes 1
B. It becomes 0.5
C. It becomes infinity
D. It becomes 0

15 In Logistic Regression, if the dot product θ^T * x is 0, what is the output of the sigmoid function?

A. 1
B. undefined
C. 0
D. 0.5

16 Which of the following describes the 'Prior Probability' P(Positive) in Naive Bayes?

A. Probability of a document being positive given a specific word
B. Total number of words in the dictionary
C. Probability of a word being positive
D. Probability of a document being positive based on the training set distribution

17 Why do we typically use Log Likelihood in Naive Bayes calculations instead of raw probabilities?

A. To make the calculation harder
B. To prevent numerical underflow from multiplying many small probabilities
C. To convert negative numbers to positive
D. Because logarithms are faster to compute than addition
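
A small numeric illustration of the issue behind this question, assuming a hypothetical document of 200 words that each have probability 0.01 under some class. The numbers are made up for the demonstration.

```python
import math

word_probs = [0.01] * 200  # hypothetical per-word probabilities

# Multiplying raw probabilities: the true value (1e-400) is smaller than
# the smallest positive double, so the running product collapses to zero.
product = 1.0
for p in word_probs:
    product *= p

# Adding logarithms keeps the same information in a representable range.
log_sum = sum(math.log(p) for p in word_probs)

print(product)  # 0.0
print(log_sum)
```

Because the logarithm is monotonic, comparing summed logs gives the same class ranking as comparing the raw products would.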

18 In the Naive Bayes inference formula, if the sum of the Log Prior and the Log Likelihoods (λ) of the words is greater than 0, the sentiment is classified as:

A. Ambiguous
B. Neutral
C. Negative
D. Positive
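
The inference rule in this question can be sketched end to end on a toy corpus. Everything here is an assumption made for illustration: the function names, the data, and the convention that the log prior is the log ratio of positive to negative document counts, so that a score of 0 is the decision threshold.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Return (log_prior, lambdas) for binary labels 1 = positive, 0 = negative."""
    freqs = Counter()
    for tokens, y in zip(docs, labels):
        for w in tokens:
            freqs[(w, y)] += 1
    vocab = {w for (w, _) in freqs}
    v = len(vocab)
    n_pos = sum(c for (_, y), c in freqs.items() if y == 1)  # words in positive docs
    n_neg = sum(c for (_, y), c in freqs.items() if y == 0)  # words in negative docs
    d_pos = sum(1 for y in labels if y == 1)                 # positive documents
    d_neg = len(labels) - d_pos                              # negative documents
    log_prior = math.log(d_pos / d_neg)
    lambdas = {}
    for w in vocab:
        p_pos = (freqs[(w, 1)] + 1) / (n_pos + v)  # Laplacian smoothing
        p_neg = (freqs[(w, 0)] + 1) / (n_neg + v)
        lambdas[w] = math.log(p_pos / p_neg)
    return log_prior, lambdas

def score(tokens, log_prior, lambdas):
    """Log prior plus the lambdas of known words; unknown words contribute 0."""
    return log_prior + sum(lambdas.get(w, 0.0) for w in tokens)

docs = [["happy", "great"], ["happy", "day"], ["sad", "day"], ["sad", "bad"]]
labels = [1, 1, 0, 0]
log_prior, lambdas = train_naive_bayes(docs, labels)

print(score(["happy", "great", "unseenword"], log_prior, lambdas))
print(score(["sad", "day"], log_prior, lambdas))
```

With this balanced toy corpus the log prior is 0, so the sign of the score is decided entirely by the words' lambdas.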

19 What is a 'sparse representation' in the context of NLP feature vectors?

A. A vector with a small dimension
B. A vector where most elements are zero
C. A vector containing only negative numbers
D. A vector with mostly non-zero values

20 Which component of the Naive Bayes classifier represents the 'Evidence' in Bayes' rule?

A. P(Data | Class)
B. P(Data)
C. P(Class | Data)
D. P(Class)

21 In Logistic Regression, what is the dimension of the weight vector θ if the feature vector x has dimension V+1?

A. 1
B. V*2
C. V+1
D. V

22 Which algorithm is considered a 'Generative' model?

A. Naive Bayes
B. Logistic Regression
C. Support Vector Machine
D. Perceptron

23 Which algorithm is considered a 'Discriminative' model?

A. Hidden Markov Model
B. Gaussian Mixture Model
C. Naive Bayes
D. Logistic Regression

24 What is the 'Lambda' (λ) term in the context of Naive Bayes ratio calculation?

A. The learning rate
B. The number of classes
C. The bias unit
D. The smoothing parameter

25 When extracting features for Logistic Regression, if the word 'happy' appears 3 times in a tweet, and 'happy' has a positive frequency of 100 and negative frequency of 5 in the corpus, how is this typically utilized?

A. The number 3 is the only feature used
B. The ratio 100/5 is used as the weight
C. The word is ignored
D. The counts 100 and 5 contribute to the aggregate sums in the feature vector

26 What is the main advantage of Logistic Regression over Naive Bayes?

A. It is a generative model
B. It handles missing data better
C. It is always faster to train
D. It does not assume that the features are independent

27 In the context of Naive Bayes, what is V?

A. The number of classes
B. The vector dimension
C. The validation set size
D. The vocabulary size (number of unique words)

28 If the learning rate in Logistic Regression is too large, what might happen?

A. The model converges very slowly
B. The cost function becomes 0 immediately
C. The model may overshoot the minimum and fail to converge
D. The model will always find the global minimum

29 Which of the following is a hyperparameter in Logistic Regression?

A. The feature vector x
B. The bias term
C. The weight vector θ
D. The learning rate α

30 How is the 'Log Prior' calculated for the positive class?

A. log(N_pos / N_neg)
B. log(N_neg / N_pos)
C. log(N_pos / N_total)
D. log(V)

31 What is the range of values for the output of the standard Naive Bayes probability calculation P(y|x) before applying logs?

A. [-1, 1]
B. (-infinity, +infinity)
C. [0, 1]
D. [0, 100]

32 Which sentiment lexicon is purely generated from the training data in the approaches discussed?

A. Google Dictionary
B. SentiWordNet
C. WordNet
D. The Lambda dictionary (Log Likelihood ratios of words)

33 When predicting with Naive Bayes, if a word in the test sentence is not in the training vocabulary (V), what is the standard action?

A. Re-train the model
B. Discard it (ignore it)
C. Assign it a random probability
D. Halt the program

34 What is 'Sentiment Analysis' primarily classifying?

A. The topic of the text
B. The grammatical structure
C. The emotional tone or opinion (e.g., Positive/Negative)
D. The language of the text

35 In Logistic Regression, the decision boundary is:

A. Non-linear
B. Circular
C. Linear
D. Polynomial

36 The denominator for calculating P(w|class) with Laplacian smoothing is:

A. N_class
B. V
C. N_class (total words in class) + V (vocabulary size)
D. Count(w in class) + 1
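
As a sketch of how smoothing changes the estimate, the snippet below computes P(w|class) with and without add-one smoothing. The helper `word_prob` and the toy counts are hypothetical.

```python
def word_prob(word, label, freqs, n_class, vocab_size, smooth=True):
    """Estimate P(word | class) from counts, optionally with add-one smoothing."""
    count = freqs.get((word, label), 0)
    if smooth:
        return (count + 1) / (n_class + vocab_size)
    return count / n_class

# Toy counts: label 1 = positive, 0 = negative.
freqs = {("happy", 1): 3, ("day", 1): 1, ("sad", 0): 2, ("day", 0): 1}
vocab_size = 3        # happy, day, sad
n_pos, n_neg = 4, 3   # total word counts per class

raw = word_prob("happy", 0, freqs, n_neg, vocab_size, smooth=False)
smoothed = word_prob("happy", 0, freqs, n_neg, vocab_size)

print(raw)       # 0.0, since "happy" never occurs in the negative class
print(smoothed)  # (0 + 1) / (3 + 3)
```

Adding V to the denominator keeps the smoothed probabilities over the vocabulary summing to 1 within each class.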

37 What does a negative value in a word's Log Likelihood (Lambda) score imply in Sentiment Analysis?

A. The word is indicative of Positive sentiment
B. The word is indicative of Negative sentiment
C. The word is neutral
D. The word is a stop word

38 Which of the following is a stop word?

A. The
B. Love
C. Amazing
D. Terrible

39 In a confusion matrix for binary classification, what is a False Positive?

A. Incorrectly predicting negative when the actual class is positive
B. Correctly predicting positive
C. Correctly predicting negative
D. Incorrectly predicting positive when the actual class is negative
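
The four cells of a binary confusion matrix can be counted directly from labels and predictions. The function name `confusion_counts` and the sample data below are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = positive."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # true positive
        elif predicted == 1 and actual == 0:
            fp += 1  # false positive
        elif predicted == 0 and actual == 1:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    return tp, fp, fn, tn

y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)

print(tp, fp, fn, tn)  # 2 1 1 2
print(accuracy)
```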

40 What is the primary reason for lowercasing text during preprocessing?

A. To detect sentence boundaries
B. To remove punctuation
C. To ensure 'Good' and 'good' are treated as the same feature
D. It looks better

41 If a Logistic Regression model is overfitted, how will it perform?

A. Poorly on training data, well on test data
B. Well on training data, well on test data
C. Well on training data, poorly on test data
D. Poorly on training data, poorly on test data

42 Which formula represents the update rule for weight θ_j in Gradient Descent?

A. θ_j := θ_j + α * dJ/dθ_j
B. θ_j := θ_j - α * dJ/dθ_j
C. θ_j := θ_j / α
D. θ_j := dJ/dθ_j
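
A minimal sketch of batch gradient descent for logistic regression, written in plain Python. The function names, the toy data, the learning rate of 0.1, and the 100 iterations are all assumptions chosen for the illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, row):
    """Sigmoid of the dot product theta^T x."""
    return sigmoid(sum(t * x for t, x in zip(theta, row)))

def cost(theta, X, y):
    """Average cross-entropy (log loss) over the training set."""
    m = len(X)
    total = 0.0
    for row, label in zip(X, y):
        h = predict(theta, row)
        total += -(label * math.log(h) + (1 - label) * math.log(1 - h))
    return total / m

def gradient_step(theta, X, y, alpha):
    """One update: move each weight against its partial derivative of the cost."""
    m = len(X)
    errors = [predict(theta, row) - label for row, label in zip(X, y)]
    grad = [sum(errors[i] * X[i][j] for i in range(m)) / m
            for j in range(len(theta))]
    return [t - alpha * g for t, g in zip(theta, grad)]

# Toy features in the form [bias, sum_pos, sum_neg].
X = [[1, 2.0, 0.0], [1, 0.0, 2.0], [1, 1.5, 0.5], [1, 0.5, 1.5]]
y = [1, 0, 1, 0]

theta = [0.0, 0.0, 0.0]
initial_cost = cost(theta, X, y)
for _ in range(100):
    theta = gradient_step(theta, X, y, alpha=0.1)
final_cost = cost(theta, X, y)

print(initial_cost, final_cost)
```

With all weights at zero every prediction is 0.5, so the starting cost is log 2; the cost after training should be lower on this toy data.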

43 In Naive Bayes, what assumption allows us to multiply probabilities of individual words?

A. Conditional Independence
B. Homoscedasticity
C. Normal Distribution
D. Linear Separability

44 What is Tokenization?

A. Translating text
B. Splitting a string of text into individual words or terms
C. Converting words to numbers
D. Removing special characters

45 Which value of probability corresponds to the logit (log-odds) value of 0?

A. 1
B. 0
C. Infinity
D. 0.5

46 What is the interpretation of weights in Logistic Regression for Sentiment Analysis?

A. They are random numbers
B. They represent the frequency of words
C. They represent the importance and direction (positive/negative) of a feature
D. They represent the index of the word

47 For a balanced dataset, which metric is most straightforward to evaluate performance?

A. Mean Squared Error
B. Precision only
C. Recall only
D. Accuracy

48 In vector space models, what does 'OOV' stand for?

A. Over optimization value
B. Out of vector
C. Out of vocabulary
D. Object oriented vector

49 The conditional probability P(Word|Positive) is conceptually similar to:

A. The probability that the sentiment is Positive given the word
B. How often the word appears in the whole dataset
C. How likely the word is to appear if the sentiment is known to be Positive
D. The probability of the word being a stop word

50 Which technique is essentially a 'probabilistic classifier' based on applying Bayes' theorem?

A. Decision Trees
B. K-Means
C. K-Nearest Neighbors
D. Naive Bayes