Unit 4 - Practice Quiz

INT344 50 Questions

1 In the context of sentiment analysis using Logistic Regression, what is the primary purpose of feature extraction?

A. To increase the length of the sentences
B. To transform raw text into numerical representations like vectors
C. To translate text from one language to another
D. To convert numerical vectors back into text

2 Which function is used in Logistic Regression to map the output to a probability value between 0 and 1?

A. Sigmoid Function
B. Linear Function
C. Tangent Function
D. ReLU Function

3 When extracting features for sentiment analysis, what does a frequency dictionary typically map?

A. Word length to Sentence length
B. Process ID to Memory usage
C. (Word, Sentiment Label) pairs to the count of occurrences
D. Each word to its synonym

4 In a binary logistic regression classifier for sentiment analysis, if the sigmoid output h(x) >= 0.5, how is the sentiment classified?

A. Undefined
B. Negative
C. Neutral
D. Positive

5 What is the formula for the sigmoid function, σ(z)?

A. 1 / (1 - e^-z)
B. log(z)
C. 1 / (1 + e^-z)
D. e^z / (1 + e^z)
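The formula in option C can be checked with a short sketch (an illustration only, not part of the quiz). Note that at z = 0 the sigmoid returns exactly 0.5, which is the decision threshold referenced in question 4:

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^-z), maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))             # 0.5 -- the decision boundary
print(round(sigmoid(4), 3))   # close to 1 (confident positive)
print(round(sigmoid(-4), 3))  # close to 0 (confident negative)
```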

6 In the feature vector representation X = [1, sum_pos, sum_neg], what does the '1' usually represent?

A. The classification threshold
B. The learning rate
C. The bias unit (intercept term)
D. The count of the first word

7 Which preprocessing step is commonly performed before feature extraction to reduce the vocabulary size without losing semantic meaning?

A. Stemming and removing stop words
B. Duplicating sentences
C. Removing all vowels
D. Capitalizing all letters

8 What is the 'Cost Function' used in Logistic Regression generally called?

A. Cross-Entropy Loss (Log Loss)
B. Mean Squared Error
C. Absolute Error
D. Hinge Loss

9 What is the goal of Gradient Descent in training a Logistic Regression model?

A. To set all weights to zero
B. To maximize the cost function
C. To remove the bias term
D. To minimize the cost function by iteratively updating weights

10 In Bayes' Rule, what does P(A|B) represent?

A. The prior probability of A
B. The probability of B given A
C. The posterior probability of A given B
D. The joint probability of A and B

11 Why is the Naive Bayes classifier called 'Naive'?

A. It requires very little training data
B. It cannot handle complex text
C. It was developed by a naive mathematician
D. It assumes that features are independent of each other given the class

12 What is the formula for Bayes' Theorem?

A. P(A|B) = P(A) * P(B)
B. P(A|B) = P(B|A) * P(A) / P(B)
C. P(A|B) = P(A) / P(B)
D. P(A|B) = P(B|A) + P(A)
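A worked arithmetic example of option B, using made-up corpus figures (the percentages below are hypothetical, chosen only to make the numbers easy to follow):

```python
def bayes(p_b_given_a, p_a, p_b):
    # Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# Hypothetical figures: 60% of documents are positive, 'great' appears
# in 20% of positive documents and in 15% of all documents.
p_pos_given_great = bayes(0.2, 0.6, 0.15)
print(round(p_pos_given_great, 4))  # 0.8
```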

13 In Naive Bayes Sentiment Analysis, what is 'Laplacian Smoothing' used for?

A. To handle words with zero probability (words not seen in training)
B. To average the sentiment scores
C. To remove stop words
D. To smooth the decision boundary

14 If a word appears in the positive corpus but not the negative corpus, what happens to its probability P(W|Negative) without smoothing?

A. It becomes 0
B. It becomes 0.5
C. It becomes 1
D. It becomes infinity

15 In Logistic Regression, if the dot product θ^T * x is 0, what is the output of the sigmoid function?

A. 0
B. 0.5
C. undefined
D. 1

16 Which of the following describes the 'Prior Probability' P(Positive) in Naive Bayes?

A. Probability of a document being positive based on the training set distribution
B. Probability of a document being positive given a specific word
C. Total number of words in the dictionary
D. Probability of a word being positive

17 Why do we typically use Log Likelihood in Naive Bayes calculations instead of raw probabilities?

A. To prevent numerical underflow from multiplying many small probabilities
B. To convert negative numbers to positive
C. To make the calculation harder
D. Because logarithms are faster to compute than addition
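The underflow problem behind option A is easy to demonstrate: multiplying many small probabilities collapses to 0.0 in double precision, while summing their logs stays well within range. A minimal sketch with 100 hypothetical word probabilities:

```python
import math

probs = [1e-5] * 100  # 100 word probabilities, each tiny

# Multiplying directly underflows: (1e-5)^100 = 1e-500, far below
# the smallest representable double, so the product becomes 0.0
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- numerical underflow

# Summing logs keeps the same information in a representable range
log_sum = sum(math.log(p) for p in probs)
print(round(log_sum, 1))  # about -1151.3, no underflow
```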

18 In the Naive Bayes inference formula, if the sum of the Log Prior and Log Likelihoods is greater than 0, the sentiment is classified as:

A. Positive
B. Neutral
C. Ambiguous
D. Negative
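The sign test in this question can be sketched end to end. The class counts and per-word lambda scores below are hypothetical; note that a word absent from the lambda dictionary simply contributes nothing, matching the "ignore unseen words" convention in question 33:

```python
import math

log_prior = math.log(600 / 400)  # hypothetical class counts: 600 pos, 400 neg

# Hypothetical log likelihood ratios (lambdas) learned from training data
lambdas = {"happy": 1.2, "movie": 0.0, "terrible": -2.1}

tweet = ["happy", "movie", "unseenword"]
score = log_prior + sum(lambdas.get(w, 0.0) for w in tweet)

# score > 0 => Positive, score < 0 => Negative
print("positive" if score > 0 else "negative")  # positive
```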

19 What is a 'sparse representation' in the context of NLP feature vectors?

A. A vector with a small dimension
B. A vector with mostly non-zero values
C. A vector where most elements are zero
D. A vector containing only negative numbers

20 Which component of the Naive Bayes classifier represents the 'Evidence' in Bayes' rule?

A. P(Data)
B. P(Data | Class)
C. P(Class | Data)
D. P(Class)

21 In Logistic Regression, what is the dimension of the weight vector θ if the feature vector x has dimension V+1?

A. 1
B. V*2
C. V+1
D. V

22 Which algorithm is considered a 'Generative' model?

A. Naive Bayes
B. Perceptron
C. Logistic Regression
D. Support Vector Machine

23 Which algorithm is considered a 'Discriminative' model?

A. Logistic Regression
B. Hidden Markov Model
C. Naive Bayes
D. Gaussian Mixture Model

24 What is the 'Lambda' (λ) term in the context of Naive Bayes ratio calculation?

A. The learning rate
B. The smoothing parameter
C. The bias unit
D. The number of classes

25 When extracting features for Logistic Regression, if the word 'happy' appears 3 times in a tweet, and 'happy' has a positive frequency of 100 and negative frequency of 5 in the corpus, how is this typically utilized?

A. The counts 100 and 5 contribute to the aggregate sums in the feature vector
B. The word is ignored
C. The number 3 is the only feature used
D. The ratio 100/5 is used as the weight
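Option A can be made concrete with a sketch of the X = [1, sum_pos, sum_neg] feature vector from question 6, built from a (word, label) frequency dictionary as in question 3. The counts below are hypothetical; labels 1 and 0 stand for positive and negative:

```python
# Hypothetical frequency dictionary: (word, sentiment label) -> corpus count
freqs = {
    ("happy", 1): 100, ("happy", 0): 5,
    ("sad", 1): 3,     ("sad", 0): 80,
}

def extract_features(tokens, freqs):
    # X = [bias unit, sum of positive counts, sum of negative counts]
    sum_pos = sum(freqs.get((w, 1), 0) for w in tokens)
    sum_neg = sum(freqs.get((w, 0), 0) for w in tokens)
    return [1, sum_pos, sum_neg]

# 'happy' appearing 3 times contributes its corpus counts three times
print(extract_features(["happy", "happy", "happy"], freqs))  # [1, 300, 15]
```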

26 What is the main advantage of Logistic Regression over Naive Bayes?

A. It is a generative model
B. It does not require independent features
C. It handles missing data better
D. It is always faster to train

27 In the context of Naive Bayes, what is V?

A. The vector dimension
B. The vocabulary size (number of unique words)
C. The validation set size
D. The number of classes

28 If the learning rate in Logistic Regression is too large, what might happen?

A. The model will always find the global minimum
B. The model may overshoot the minimum and fail to converge
C. The cost function becomes 0 immediately
D. The model converges very slowly

29 Which of the following is a hyperparameter in Logistic Regression?

A. The weight vector θ
B. The feature vector x
C. The bias term
D. The learning rate α

30 How is the 'Log Prior' calculated for the positive class?

A. log(N_neg / N_pos)
B. log(N_pos / N_neg)
C. log(N_pos / N_total)
D. log(V)

31 What is the range of values for the output of the standard Naive Bayes probability calculation P(y|x) before applying logs?

A. [0, 100]
B. [0, 1]
C. (-infinity, +infinity)
D. [-1, 1]

32 Which sentiment lexicon is purely generated from the training data in the approaches discussed?

A. SentiWordNet
B. The Lambda dictionary (Log Likelihood ratios of words)
C. Google Dictionary
D. WordNet

33 When predicting with Naive Bayes, if a word in the test sentence is not in the training vocabulary (V), what is the standard action?

A. Re-train the model
B. Discard it (ignore it)
C. Assign it a random probability
D. Halt the program

34 What is 'Sentiment Analysis' primarily classifying?

A. The language of the text
B. The topic of the text
C. The grammatical structure
D. The emotional tone or opinion (e.g., Positive/Negative)

35 In Logistic Regression, the decision boundary is:

A. Polynomial
B. Circular
C. Non-linear
D. Linear

36 The denominator for calculating P(w|class) with Laplacian smoothing is:

A. V
B. Count(w in class) + 1
C. N_class (total words in class) + V (vocabulary size)
D. N_class
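The smoothed formula in option C can be sketched directly; the counts below are hypothetical. The key point (questions 13 and 14) is that a word never seen in a class still gets a small nonzero probability instead of 0:

```python
def smoothed_prob(word_count, n_class, vocab_size):
    # Laplacian smoothing: P(w|class) = (count(w, class) + 1) / (N_class + V)
    return (word_count + 1) / (n_class + vocab_size)

# Unseen word: probability is 1/(100 + 50), not 0
print(smoothed_prob(0, 100, 50))
# Seen word: count 9 becomes 10/(100 + 50)
print(smoothed_prob(9, 100, 50))
```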

37 What does a negative value in a word's Log Likelihood (Lambda) score imply in Sentiment Analysis?

A. The word is indicative of Positive sentiment
B. The word is a stop word
C. The word is neutral
D. The word is indicative of Negative sentiment

38 Which of the following is a stop word?

A. The
B. Love
C. Amazing
D. Terrible

39 In a confusion matrix for binary classification, what is a False Positive?

A. Incorrectly predicting positive when the actual class is negative
B. Correctly predicting negative
C. Correctly predicting positive
D. Incorrectly predicting negative when the actual class is positive

40 What is the primary reason for lowercasing text during preprocessing?

A. To remove punctuation
B. It looks better
C. To ensure 'Good' and 'good' are treated as the same feature
D. To detect sentence boundaries

41 If a Logistic Regression model is overfitted, how will it perform?

A. Well on training data, well on test data
B. Poorly on training data, poorly on test data
C. Poorly on training data, well on test data
D. Well on training data, poorly on test data

42 Which formula represents the update rule for weight θ_j in Gradient Descent?

A. θ_j := θ_j / α
B. θ_j := θ_j - α * dJ/dθ_j
C. θ_j := θ_j + α * dJ/dθ_j
D. θ_j := dJ/dθ_j
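A single step of the update rule in option B, with a hypothetical weight vector and gradient (the numbers are illustrative only). Each weight moves against its gradient, scaled by the learning rate α from question 29:

```python
alpha = 0.1                 # learning rate (a hyperparameter)
theta = [0.5, -1.0, 2.0]    # current weights
grad = [0.2, -0.4, 1.0]     # hypothetical gradient dJ/dtheta

# theta_j := theta_j - alpha * dJ/dtheta_j
theta = [t - alpha * g for t, g in zip(theta, grad)]
print([round(t, 2) for t in theta])  # [0.48, -0.96, 1.9]
```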

43 In Naive Bayes, what assumption allows us to multiply probabilities of individual words?

A. Linear Separability
B. Conditional Independence
C. Normal Distribution
D. Homoscedasticity

44 What is Tokenization?

A. Splitting a string of text into individual words or terms
B. Translating text
C. Converting words to numbers
D. Removing special characters

45 Which value of probability corresponds to the logit (log-odds) value of 0?

A. 0
B. 0.5
C. 1
D. Infinity

46 What is the interpretation of weights in Logistic Regression for Sentiment Analysis?

A. They represent the frequency of words
B. They represent the importance and direction (positive/negative) of a feature
C. They represent the index of the word
D. They are random numbers

47 For a balanced dataset, which metric is most straightforward to evaluate performance?

A. Recall only
B. Accuracy
C. Precision only
D. Mean Squared Error

48 In vector space models, what is 'OOV'?

A. Over optimization value
B. Object oriented vector
C. Out of vector
D. Out of vocabulary

49 The conditional probability P(Word|Positive) is conceptually similar to:

A. How often the word appears in the whole dataset
B. The probability that the sentiment is Positive given the word
C. How likely the word is to appear if the sentiment is known to be Positive
D. The probability of the word being a stop word

50 Which technique is essentially a 'probabilistic classifier' based on applying Bayes' theorem?

A. K-Nearest Neighbors
B. Decision Trees
C. K-Means
D. Naive Bayes