1. In the context of sentiment analysis using Logistic Regression, what is the primary purpose of feature extraction?
A. To increase the length of the sentences
B. To transform raw text into numerical representations like vectors
C. To translate text from one language to another
D. To convert numerical vectors back into text
Correct Answer: To transform raw text into numerical representations like vectors
Explanation:
Machine learning models require numerical input. Feature extraction converts raw text data into numerical vectors that the model can process.
2. Which function is used in Logistic Regression to map the output to a probability value between 0 and 1?
A. Sigmoid Function
B. Linear Function
C. Tangent Function
D. ReLU Function
Correct Answer: Sigmoid Function
Explanation:
The sigmoid function maps any real-valued number into a value between 0 and 1, making it suitable for probability estimation in binary classification.
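The sigmoid can be sketched in a few lines of Python (a minimal illustration, not tied to any particular course implementation or library):

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5 -- the decision boundary
print(sigmoid(4.6))   # ~0.99, a confidently positive prediction
```

Large positive inputs saturate toward 1 and large negative inputs toward 0, which is exactly what makes the output usable as a probability.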
3. When extracting features for sentiment analysis, what does a frequency dictionary typically map?
A. Word length to Sentence length
B. Process ID to Memory usage
C. (Word, Sentiment Label) pairs to the count of occurrences
D. Each word to its synonym
Correct Answer: (Word, Sentiment Label) pairs to the count of occurrences
Explanation:
In simple feature extraction for sentiment analysis, a frequency dictionary maps a pair (word, label) to the number of times that word appears in the corpus associated with that specific label.
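Building such a dictionary is straightforward. The sketch below assumes a toy corpus of whitespace-tokenized tweets with labels 1 (positive) and 0 (negative); real pipelines would preprocess the text first:

```python
from collections import defaultdict

def build_freqs(tweets, labels):
    """Map each (word, label) pair to its count across the corpus."""
    freqs = defaultdict(int)
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            freqs[(word, label)] += 1
    return freqs

freqs = build_freqs(["happy great day", "sad bad day"], [1, 0])
print(freqs[("day", 1)])    # 1 -- 'day' appeared once in a positive tweet
print(freqs[("happy", 0)])  # 0 -- 'happy' never appeared in a negative tweet
```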
4. In a binary logistic regression classifier for sentiment analysis, if the sigmoid output h(x) >= 0.5, how is the sentiment classified?
A. Undefined
B. Negative
C. Neutral
D. Positive
Correct Answer: Positive
Explanation:
By convention in binary classification, a probability threshold of 0.5 is used. If the output probability is 0.5 or greater, the class is predicted as Positive (1).
5. What is the formula for the sigmoid function, σ(z)?
A. 1 / (1 - e^-z)
B. log(z)
C. 1 / (1 + e^-z)
D. e^z / (1 + e^z)
Correct Answer: 1 / (1 + e^-z)
Explanation:
The standard formula for the sigmoid logistic function is 1 / (1 + e^-z).
6. In the feature vector representation X = [1, sum_pos, sum_neg], what does the '1' usually represent?
A. The classification threshold
B. The learning rate
C. The bias unit (intercept term)
D. The count of the first word
Correct Answer: The bias unit (intercept term)
Explanation:
The '1' is added to the feature vector to correspond to the bias weight (theta_0) in the dot product calculation.
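Combining the frequency dictionary with this three-slot representation gives a compact feature extractor. The sketch below uses hypothetical corpus counts (the dictionary values are made up for illustration):

```python
def extract_features(tweet, freqs):
    """Return X = [1, sum_pos, sum_neg] for a tweet.

    The leading 1 is the bias unit; the other slots aggregate the
    positive and negative corpus frequencies of the tweet's words.
    """
    words = tweet.lower().split()
    sum_pos = sum(freqs.get((w, 1), 0) for w in words)
    sum_neg = sum(freqs.get((w, 0), 0) for w in words)
    return [1, sum_pos, sum_neg]

# Hypothetical pre-computed corpus frequencies
freqs = {("happy", 1): 100, ("happy", 0): 5, ("sad", 0): 80}
print(extract_features("happy happy sad", freqs))  # [1, 200, 90]
```

Note that each occurrence of a word contributes its corpus frequency, so 'happy' appearing twice adds 100 twice to the positive slot.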
7. Which preprocessing step is commonly performed before feature extraction to reduce the vocabulary size without losing semantic meaning?
A. Stemming and removing stop words
B. Duplicating sentences
C. Removing all vowels
D. Capitalizing all letters
Correct Answer: Stemming and removing stop words
Explanation:
Stemming reduces words to their root form (e.g., 'tuning' to 'tun'), and removing stop words eliminates common non-informative words, both reducing vocabulary size.
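A toy version of this pipeline is sketched below. The stop-word list and the suffix-stripping "stemmer" are deliberately crude stand-ins (real pipelines would use something like NLTK's Porter stemmer and stopwords corpus):

```python
STOP_WORDS = {"the", "a", "an", "is", "and"}  # tiny illustrative list

def crude_stem(word):
    """Very rough stemmer: strip one common suffix. For illustration only."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, stem the rest."""
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The tuning is amazing"))  # ['tun', 'amaz']
```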
8. What is the 'Cost Function' used in Logistic Regression generally called?
A. Cross-Entropy Loss (Log Loss)
B. Mean Squared Error
C. Absolute Error
D. Hinge Loss
Correct Answer: Cross-Entropy Loss (Log Loss)
Explanation:
Logistic regression uses the Cross-Entropy (or Log Loss) function because it is convex, allowing gradient descent to find the global minimum.
9. What is the goal of Gradient Descent in training a Logistic Regression model?
A. To set all weights to zero
B. To maximize the cost function
C. To remove the bias term
D. To minimize the cost function by iteratively updating weights
Correct Answer: To minimize the cost function by iteratively updating weights
Explanation:
Gradient Descent is an optimization algorithm used to minimize the cost function by adjusting parameters (weights) in the opposite direction of the gradient.
10. In Bayes' Rule, what does P(A|B) represent?
A. The prior probability of A
B. The probability of B given A
C. The posterior probability of A given B
D. The joint probability of A and B
Correct Answer: The posterior probability of A given B
Explanation:
P(A|B) is the conditional probability of event A occurring given that event B is true, often called the posterior.
11. Why is the Naive Bayes classifier called 'Naive'?
A. It requires very little training data
B. It cannot handle complex text
C. It was developed by a naive mathematician
D. It assumes that features are independent of each other given the class
Correct Answer: It assumes that features are independent of each other given the class
Explanation:
The 'naive' assumption is that the occurrence of a particular word is independent of the occurrence of other words, given the sentiment label.
12. What is the formula for Bayes' Theorem?
A. P(A|B) = P(A) * P(B)
B. P(A|B) = P(B|A) * P(A) / P(B)
C. P(A|B) = P(A) / P(B)
D. P(A|B) = P(B|A) + P(A)
Correct Answer: P(A|B) = P(B|A) * P(A) / P(B)
Explanation:
Bayes' theorem states that the posterior probability is the likelihood times the prior divided by the evidence.
13. In Naive Bayes Sentiment Analysis, what is 'Laplacian Smoothing' used for?
A. To handle words with zero probability (words not seen in training)
B. To average the sentiment scores
C. To remove stop words
D. To smooth the decision boundary
Correct Answer: To handle words with zero probability (words not seen in training)
Explanation:
Laplacian smoothing adds a small count (usually 1) to all frequency counts to prevent the probability of an unseen word becoming zero, which would zero out the entire calculation.
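The smoothed probability is easy to compute directly. The sketch below uses made-up counts (a negative corpus of 100 word tokens and a vocabulary of 50 unique words) purely for illustration:

```python
def smoothed_prob(word_count, n_class, vocab_size, lam=1):
    """P(w|class) with additive (Laplacian) smoothing:
    (count + lambda) / (N_class + lambda * V)."""
    return (word_count + lam) / (n_class + lam * vocab_size)

# A word seen 0 times in the negative corpus still gets a nonzero probability
print(smoothed_prob(0, 100, 50))   # 1/150, roughly 0.0067
```

Without the `lam` term the same call would return exactly 0, which would zero out the whole product of likelihoods.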
14. If a word appears in the positive corpus but not the negative corpus, what happens to its probability P(W|Negative) without smoothing?
A. It becomes 0
B. It becomes 0.5
C. It becomes 1
D. It becomes infinity
Correct Answer: It becomes 0
Explanation:
Without smoothing, the frequency count is 0, so the probability calculation results in 0.
15. In Logistic Regression, if the dot product θ^T * x is 0, what is the output of the sigmoid function?
A. 0
B. 0.5
C. Undefined
D. 1
Correct Answer: 0.5
Explanation:
Since e^0 = 1, the sigmoid evaluates to 1/(1+1) = 0.5. This point is typically the decision boundary.
16. Which of the following describes the 'Prior Probability' P(Positive) in Naive Bayes?
A. Probability of a document being positive based on the training set distribution
B. Probability of a document being positive given a specific word
C. Total number of words in the dictionary
D. Probability of a word being positive
Correct Answer: Probability of a document being positive based on the training set distribution
Explanation:
The prior P(Positive) is the ratio of the number of positive documents to the total number of documents in the training set.
17. Why do we typically use Log Likelihood in Naive Bayes calculations instead of raw probabilities?
A. To prevent numerical underflow from multiplying many small probabilities
B. To convert negative numbers to positive
C. To make the calculation harder
D. Because logarithms are faster to compute than addition
Correct Answer: To prevent numerical underflow from multiplying many small probabilities
Explanation:
Multiplying many small probabilities (between 0 and 1) results in extremely small numbers that computers process as zero (underflow). Using logs converts multiplication to addition.
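The underflow problem is easy to demonstrate. Multiplying 100 likelihoods of 1e-5 would require representing 1e-500, which is below the smallest double-precision float, while the equivalent sum of logs is perfectly representable:

```python
import math

probs = [1e-5] * 100           # 100 tiny per-word likelihoods

product = 1.0
for p in probs:
    product *= p
print(product)                  # 0.0 -- the product underflows to zero

log_sum = sum(math.log(p) for p in probs)
print(log_sum)                  # about -1151.3 -- no underflow in log space
```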
18. In the Naive Bayes inference formula, if the sum of the Log Prior and Log Likelihoods is greater than 0, the sentiment is classified as:
A. Positive
B. Neutral
C. Ambiguous
D. Negative
Correct Answer: Positive
Explanation:
The Log Likelihood ratio is typically defined as log(P(W|Pos)/P(W|Neg)). If the total sum including the log prior is > 0, the positive probability outweighs the negative.
19. What is a 'sparse representation' in the context of NLP feature vectors?
A. A vector with a small dimension
B. A vector with mostly non-zero values
C. A vector where most elements are zero
D. A vector containing only negative numbers
Correct Answer: A vector where most elements are zero
Explanation:
When using One-Hot Encoding or large vocabulary counts, most words in the vocabulary do not appear in a single sentence, resulting in a vector mostly filled with zeros.
20. Which component of the Naive Bayes classifier represents the 'Evidence' in Bayes' rule?
A. P(Data)
B. P(Data | Class)
C. P(Class | Data)
D. P(Class)
Correct Answer: P(Data)
Explanation:
The denominator P(Data) represents the evidence or marginal likelihood. In classification, it is often ignored as it is constant for all classes.
21. In Logistic Regression, what is the dimension of the weight vector θ if the feature vector x has dimension V+1?
A. 1
B. V*2
C. V+1
D. V
Correct Answer: V+1
Explanation:
To perform the dot product θ^T * x, the weight vector must have the same dimensions as the feature vector.
22. Which algorithm is considered a 'Generative' model?
A. Naive Bayes
B. Perceptron
C. Logistic Regression
D. Support Vector Machine
Correct Answer: Naive Bayes
Explanation:
Naive Bayes is a generative model because it models the joint probability distribution P(x, y) (how the data is generated), whereas Logistic Regression is discriminative.
23. Which algorithm is considered a 'Discriminative' model?
A. Logistic Regression
B. Hidden Markov Model
C. Naive Bayes
D. Gaussian Mixture Model
Correct Answer: Logistic Regression
Explanation:
Logistic Regression is discriminative because it directly models the conditional probability P(y|x) (the boundary between classes).
24. What is the 'Lambda' (λ) term in the context of Naive Bayes ratio calculation?
A. The learning rate
B. The smoothing parameter
C. The bias unit
D. The number of classes
Correct Answer: The smoothing parameter
Explanation:
Lambda is the additive smoothing parameter (Laplacian smoothing) added to the numerator and denominator of probability calculations.
25. When extracting features for Logistic Regression, if the word 'happy' appears 3 times in a tweet, and 'happy' has a positive frequency of 100 and negative frequency of 5 in the corpus, how is this typically utilized?
A. The counts 100 and 5 contribute to the aggregate sums in the feature vector
B. The word is ignored
C. The number 3 is the only feature used
D. The ratio 100/5 is used as the weight
Correct Answer: The counts 100 and 5 contribute to the aggregate sums in the feature vector
Explanation:
In standard frequency-based feature extraction, we look up the pre-computed corpus frequencies (100, 5) and sum them into the feature vector slots for the tweet.
26. What is the main advantage of Logistic Regression over Naive Bayes?
A. It is a generative model
B. It does not require independent features
C. It handles missing data better
D. It is always faster to train
Correct Answer: It does not require independent features
Explanation:
Logistic Regression learns the weights of features based on their correlation with the output and does not strictly assume feature independence like Naive Bayes.
27. In the context of Naive Bayes, what is V?
A. The vector dimension
B. The vocabulary size (number of unique words)
C. The validation set size
D. The number of classes
Correct Answer: The vocabulary size (number of unique words)
Explanation:
V usually denotes the size of the vocabulary, which is used in the denominator of the smoothed probability formula.
28. If the learning rate in Logistic Regression is too large, what might happen?
A. The model will always find the global minimum
B. The model may overshoot the minimum and fail to converge
C. The cost function becomes 0 immediately
D. The model converges very slowly
Correct Answer: The model may overshoot the minimum and fail to converge
Explanation:
A large learning rate causes the gradient descent steps to be too big, potentially bouncing over the minimum cost and diverging.
29. Which of the following is a hyperparameter in Logistic Regression?
A. The weight vector θ
B. The feature vector x
C. The bias term
D. The learning rate α
Correct Answer: The learning rate α
Explanation:
The learning rate is set before training begins and controls the step size; weights and bias are parameters learned during training.
30. How is the 'Log Prior' calculated for the positive class?
A. log(N_neg / N_pos)
B. log(N_pos / N_neg)
C. log(N_pos / N_total)
D. log(V)
Correct Answer: log(N_pos / N_neg)
Explanation:
In the log-likelihood ratio formulation of Naive Bayes, the log prior is the log of the ratio of the number of positive documents to negative documents.
31. What is the range of values for the output of the standard Naive Bayes probability calculation P(y|x) before applying logs?
A. [0, 100]
B. [0, 1]
C. (-infinity, +infinity)
D. [-1, 1]
Correct Answer: [0, 1]
Explanation:
Probabilities are always between 0 and 1.
32. Which sentiment lexicon is purely generated from the training data in the approaches discussed?
A. SentiWordNet
B. The Lambda dictionary (Log Likelihood ratios of words)
C. Google Dictionary
D. WordNet
Correct Answer: The Lambda dictionary (Log Likelihood ratios of words)
Explanation:
In the Naive Bayes approach, we build a dictionary of lambda values (log likelihood ratios) for each word based specifically on the frequencies observed in the training corpus.
33. When predicting with Naive Bayes, if a word in the test sentence is not in the training vocabulary (V), what is the standard action?
A. Re-train the model
B. Discard it (ignore it)
C. Assign it a random probability
D. Halt the program
Correct Answer: Discard it (ignore it)
Explanation:
Words not in the vocabulary do not have a computed likelihood ratio and typically contribute 0 to the log-sum score (i.e., they are ignored).
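Putting the log prior, the lambda dictionary, and the OOV rule together gives a complete inference step. The lambda values below are hypothetical, standing in for ratios learned from a training corpus:

```python
def nb_predict(tweet, logprior, loglikelihood):
    """Sum the log prior and per-word lambdas; sign > 0 means Positive.

    Words missing from the dictionary (out of vocabulary) contribute 0,
    i.e. they are effectively ignored.
    """
    score = logprior
    for word in tweet.lower().split():
        score += loglikelihood.get(word, 0)
    return "Positive" if score > 0 else "Negative"

# Hypothetical lambdas: log(P(w|Pos) / P(w|Neg)) per word
lambdas = {"happy": 2.2, "sad": -1.8}
print(nb_predict("so happy today", 0.0, lambdas))  # Positive
print(nb_predict("so sad today", 0.0, lambdas))    # Negative
```

'so' and 'today' are OOV here, so only 'happy' (or 'sad') moves the score away from the log prior.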
34. What is 'Sentiment Analysis' primarily classifying?
A. The language of the text
B. The topic of the text
C. The grammatical structure
D. The emotional tone or opinion (e.g., Positive/Negative)
Correct Answer: The emotional tone or opinion (e.g., Positive/Negative)
Explanation:
Sentiment analysis is the computational study of opinions, sentiments, and emotions expressed in text.
35. In Logistic Regression, the decision boundary is:
A. Polynomial
B. Circular
C. Non-linear
D. Linear
Correct Answer: Linear
Explanation:
Logistic regression creates a linear decision boundary (a line or hyperplane) that separates the classes.
36. The denominator for calculating P(w|class) with Laplacian smoothing is:
A. V
B. Count(w in class) + 1
C. N_class (total words in class) + V (vocabulary size)
D. N_class
Correct Answer: N_class (total words in class) + V (vocabulary size)
Explanation:
To normalize the smoothed counts, we divide by the total number of words in the class plus the size of the vocabulary (since we added 1 for every unique word).
37. What does a negative value in a word's Log Likelihood (Lambda) score imply in Sentiment Analysis?
A. The word is indicative of Positive sentiment
B. The word is a stop word
C. The word is neutral
D. The word is indicative of Negative sentiment
Correct Answer: The word is indicative of Negative sentiment
Explanation:
Lambda is log(P(w|Pos)/P(w|Neg)). If P(w|Neg) > P(w|Pos), the ratio is < 1, and the log is negative.
38. Which of the following is a stop word?
A. The
B. Love
C. Amazing
D. Terrible
Correct Answer: The
Explanation:
'The' is a common article that carries little sentiment information, which is why it is treated as a stop word.
39. In a confusion matrix for binary classification, what is a False Positive?
A. Incorrectly predicting positive when the actual class is negative
B. Correctly predicting negative
C. Correctly predicting positive
D. Incorrectly predicting negative when the actual class is positive
Correct Answer: Incorrectly predicting positive when the actual class is negative
Explanation:
A False Positive is an error where the model predicts the positive class, but the ground truth was actually negative.
40. What is the primary reason for lowercasing text during preprocessing?
A. To remove punctuation
B. It looks better
C. To ensure 'Good' and 'good' are treated as the same feature
D. To detect sentence boundaries
Correct Answer: To ensure 'Good' and 'good' are treated as the same feature
Explanation:
Lowercasing normalizes the text so that capitalization variations do not create duplicate features in the vocabulary.
41. If a Logistic Regression model is overfitted, how will it perform?
A. Well on training data, well on test data
B. Poorly on training data, poorly on test data
C. Poorly on training data, well on test data
D. Well on training data, poorly on test data
Correct Answer: Well on training data, poorly on test data
Explanation:
Overfitting occurs when the model learns the noise in the training data, resulting in high accuracy on training but poor generalization to unseen test data.
42. Which formula represents the update rule for weight θ_j in Gradient Descent?
A. θ_j := θ_j / α
B. θ_j := θ_j - α * dJ/dθ_j
C. θ_j := θ_j + α * dJ/dθ_j
D. θ_j := dJ/dθ_j
Correct Answer: θ_j := θ_j - α * dJ/dθ_j
Explanation:
We subtract the product of the learning rate and the partial derivative of the cost function to move down the gradient.
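One gradient descent update for logistic regression with log loss can be sketched as follows. For log loss the partial derivative works out to (h(x) - y) * x_j per example, which is what this illustrative single-example step uses:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, x, y, alpha):
    """theta_j := theta_j - alpha * dJ/dtheta_j, where for log loss
    dJ/dtheta_j = (h(x) - y) * x_j on a single example."""
    h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [t - alpha * (h - y) * xi for t, xi in zip(theta, x)]

theta = [0.0, 0.0, 0.0]
x, y = [1, 3.0, 1.0], 1        # [bias, sum_pos, sum_neg], positive label
theta = gradient_step(theta, x, y, alpha=0.1)
print(theta)                    # all weights move toward the positive class
```

With zero initial weights, h = 0.5, so the error term is -0.5 and every weight is nudged in the direction of its feature.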
43. In Naive Bayes, what assumption allows us to multiply probabilities of individual words?
A. Linear Separability
B. Conditional Independence
C. Normal Distribution
D. Homoscedasticity
Correct Answer: Conditional Independence
Explanation:
The Naive Bayes assumption is that the probability of a feature x_i depends only on the class y, independent of the other features x_j.
44. What is Tokenization?
A. Splitting a string of text into individual words or terms
B. Translating text
C. Converting words to numbers
D. Removing special characters
Correct Answer: Splitting a string of text into individual words or terms
Explanation:
Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens.
45. Which value of probability corresponds to the logit (log-odds) value of 0?
A. 0
B. 0.5
C. 1
D. Infinity
Correct Answer: 0.5
Explanation:
Log-odds = log(p / (1-p)). If p = 0.5, then 0.5/0.5 = 1, and log(1) = 0.
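The logit is the inverse of the sigmoid, which makes this relationship easy to verify numerically:

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

print(logit(0.5))   # 0.0 -- even odds map to log-odds of zero
print(logit(0.9))   # about 2.197, i.e. log of 9:1 odds
```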
46. What is the interpretation of weights in Logistic Regression for Sentiment Analysis?
A. They represent the frequency of words
B. They represent the importance and direction (positive/negative) of a feature
C. They represent the index of the word
D. They are random numbers
Correct Answer: They represent the importance and direction (positive/negative) of a feature
Explanation:
A high positive weight implies the feature strongly correlates with the positive class; a high negative weight implies correlation with the negative class.
47. For a balanced dataset, which metric is most straightforward to evaluate performance?
A. Recall only
B. Accuracy
C. Precision only
D. Mean Squared Error
Correct Answer: Accuracy
Explanation:
Accuracy (Correct Predictions / Total Predictions) is a good metric when the classes in the dataset are balanced (e.g., 50% positive, 50% negative).
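Accuracy and the confusion-matrix counts it is built from can be computed together. A minimal sketch on a hypothetical four-example test set:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # miss
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(accuracy)  # 0.5 -- two of four predictions correct
```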
48. In vector space models, what is 'OOV'?
A. Over optimization value
B. Object oriented vector
C. Out of vector
D. Out of vocabulary
Correct Answer: Out of vocabulary
Explanation:
OOV stands for Out-Of-Vocabulary, referring to words encountered in testing that were not present in the training vocabulary.
49. The conditional probability P(Word|Positive) is conceptually similar to:
A. How often the word appears in the whole dataset
B. The probability that the sentiment is Positive given the word
C. How likely the word is to appear if the sentiment is known to be Positive
D. The probability of the word being a stop word
Correct Answer: How likely the word is to appear if the sentiment is known to be Positive
Explanation:
This is the likelihood: given the class is Positive, what is the probability of observing this specific word.
50. Which technique is essentially a 'probabilistic classifier' based on applying Bayes' theorem?
A. K-Nearest Neighbors
B. Decision Trees
C. K-Means
D. Naive Bayes
Correct Answer: Naive Bayes
Explanation:
Naive Bayes is explicitly founded on Bayes' theorem to calculate probabilities for classification.