1. In the context of sentiment analysis using Logistic Regression, what is the primary purpose of feature extraction?
A. To increase the length of the sentences
B. To transform raw text into numerical representations like vectors
C. To translate text from one language to another
D. To convert numerical vectors back into text
Correct Answer: To transform raw text into numerical representations like vectors
Explanation:
Machine learning models require numerical input. Feature extraction converts raw text data into numerical vectors that the model can process.
2. Which function is used in Logistic Regression to map the output to a probability value between 0 and 1?
A. Sigmoid Function
B. Linear Function
C. Tangent Function
D. ReLU Function
Correct Answer: Sigmoid Function
Explanation:
The sigmoid function maps any real-valued number into a value between 0 and 1, making it suitable for probability estimation in binary classification.
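The sigmoid can be sketched in a few lines of Python (a minimal illustration, not tied to any particular course implementation or library):

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))   # 0.5 -- the decision boundary
print(sigmoid(4.6))   # ~0.99, a confidently positive prediction
```

Large positive inputs saturate toward 1 and large negative inputs toward 0, which is exactly what makes the output usable as a probability.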
3. When extracting features for sentiment analysis, what does a frequency dictionary typically map?
A. Word length to Sentence length
B. Process ID to Memory usage
C. (Word, Sentiment Label) pairs to the count of occurrences
D. Each word to its synonym
Correct Answer: (Word, Sentiment Label) pairs to the count of occurrences
Explanation:
In simple feature extraction for sentiment analysis, a frequency dictionary maps a pair (word, label) to the number of times that word appears in the corpus associated with that specific label.
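Building such a dictionary is straightforward. The sketch below assumes a toy corpus of whitespace-tokenized tweets with labels 1 (positive) and 0 (negative); real pipelines would preprocess the text first:

```python
from collections import defaultdict

def build_freqs(tweets, labels):
    """Map each (word, label) pair to its count across the corpus."""
    freqs = defaultdict(int)
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            freqs[(word, label)] += 1
    return freqs

freqs = build_freqs(["happy great day", "sad bad day"], [1, 0])
print(freqs[("day", 1)])    # 1 -- 'day' appeared once in a positive tweet
print(freqs[("happy", 0)])  # 0 -- 'happy' never appeared in a negative tweet
```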
4. In a binary logistic regression classifier for sentiment analysis, if the sigmoid output h(x) >= 0.5, how is the sentiment classified?
A. Undefined
B. Negative
C. Neutral
D. Positive
Correct Answer: Positive
Explanation:
By convention in binary classification, a probability threshold of 0.5 is used. If the output probability is 0.5 or greater, the class is predicted as Positive (1).
5. What is the formula for the sigmoid function, σ(z)?
A. 1 / (1 - e^-z)
B. log(z)
C. 1 / (1 + e^-z)
D. e^z / (1 + e^z)
Correct Answer: 1 / (1 + e^-z)
Explanation:
The standard formula for the sigmoid logistic function is 1 / (1 + e^-z).
6. In the feature vector representation X = [1, sum_pos, sum_neg], what does the '1' usually represent?
A. The classification threshold
B. The learning rate
C. The bias unit (intercept term)
D. The count of the first word
Correct Answer: The bias unit (intercept term)
Explanation:
The '1' is added to the feature vector to correspond to the bias weight (theta_0) in the dot product calculation.
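Combining the frequency dictionary with this three-slot representation gives a compact feature extractor. The sketch below uses hypothetical corpus counts (the dictionary values are made up for illustration):

```python
def extract_features(tweet, freqs):
    """Return X = [1, sum_pos, sum_neg] for a tweet.

    The leading 1 is the bias unit; the other slots aggregate the
    positive and negative corpus frequencies of the tweet's words.
    """
    words = tweet.lower().split()
    sum_pos = sum(freqs.get((w, 1), 0) for w in words)
    sum_neg = sum(freqs.get((w, 0), 0) for w in words)
    return [1, sum_pos, sum_neg]

# Hypothetical pre-computed corpus frequencies
freqs = {("happy", 1): 100, ("happy", 0): 5, ("sad", 0): 80}
print(extract_features("happy happy sad", freqs))  # [1, 200, 90]
```

Note that each occurrence of a word contributes its corpus frequency, so 'happy' appearing twice adds 100 twice to the positive slot.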
7. Which preprocessing step is commonly performed before feature extraction to reduce the vocabulary size without losing semantic meaning?
A. Stemming and removing stop words
B. Duplicating sentences
C. Removing all vowels
D. Capitalizing all letters
Correct Answer: Stemming and removing stop words
Explanation:
Stemming reduces words to their root form (e.g., 'tuning' to 'tun'), and removing stop words eliminates common non-informative words, both reducing vocabulary size.
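A toy version of this pipeline is sketched below. The stop-word list and the suffix-stripping "stemmer" are deliberately crude stand-ins (real pipelines would use something like NLTK's Porter stemmer and stopwords corpus):

```python
STOP_WORDS = {"the", "a", "an", "is", "and"}  # tiny illustrative list

def crude_stem(word):
    """Very rough stemmer: strip one common suffix. For illustration only."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, stem the rest."""
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The tuning is amazing"))  # ['tun', 'amaz']
```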
8. What is the 'Cost Function' used in Logistic Regression generally called?
A. Cross-Entropy Loss (Log Loss)
B. Mean Squared Error
C. Absolute Error
D. Hinge Loss
Correct Answer: Cross-Entropy Loss (Log Loss)
Explanation:
Logistic regression uses the Cross-Entropy (or Log Loss) function because it is convex, allowing gradient descent to find the global minimum.
9. What is the goal of Gradient Descent in training a Logistic Regression model?
A. To set all weights to zero
B. To maximize the cost function
C. To remove the bias term
D. To minimize the cost function by iteratively updating weights
Correct Answer: To minimize the cost function by iteratively updating weights
Explanation:
Gradient Descent is an optimization algorithm used to minimize the cost function by adjusting parameters (weights) in the opposite direction of the gradient.
10. In Bayes' Rule, what does P(A|B) represent?
A. The prior probability of A
B. The probability of B given A
C. The posterior probability of A given B
D. The joint probability of A and B
Correct Answer: The posterior probability of A given B
Explanation:
P(A|B) is the conditional probability of event A occurring given that event B is true, often called the posterior.
11. Why is the Naive Bayes classifier called 'Naive'?
A. It requires very little training data
B. It cannot handle complex text
C. It was developed by a naive mathematician
D. It assumes that features are independent of each other given the class
Correct Answer: It assumes that features are independent of each other given the class
Explanation:
The 'naive' assumption is that the occurrence of a particular word is independent of the occurrence of other words, given the sentiment label.
12. What is the formula for Bayes' Theorem?
A. P(A|B) = P(A) * P(B)
B. P(A|B) = P(B|A) * P(A) / P(B)
C. P(A|B) = P(A) / P(B)
D. P(A|B) = P(B|A) + P(A)
Correct Answer: P(A|B) = P(B|A) * P(A) / P(B)
Explanation:
Bayes' theorem states that the posterior probability is the likelihood times the prior divided by the evidence.
13. In Naive Bayes Sentiment Analysis, what is 'Laplacian Smoothing' used for?
A. To handle words with zero probability (words not seen in training)
B. To average the sentiment scores
C. To remove stop words
D. To smooth the decision boundary
Correct Answer: To handle words with zero probability (words not seen in training)
Explanation:
Laplacian smoothing adds a small count (usually 1) to all frequency counts to prevent the probability of an unseen word becoming zero, which would zero out the entire calculation.
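The smoothed probability is easy to compute directly. The sketch below uses made-up counts (a negative corpus of 100 word tokens and a vocabulary of 50 unique words) purely for illustration:

```python
def smoothed_prob(word_count, n_class, vocab_size, lam=1):
    """P(w|class) with additive (Laplacian) smoothing:
    (count + lambda) / (N_class + lambda * V)."""
    return (word_count + lam) / (n_class + lam * vocab_size)

# A word seen 0 times in the negative corpus still gets a nonzero probability
print(smoothed_prob(0, 100, 50))   # 1/150, roughly 0.0067
```

Without the `lam` term the same call would return exactly 0, which would zero out the whole product of likelihoods.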
14. If a word appears in the positive corpus but not the negative corpus, what happens to its probability P(W|Negative) without smoothing?
A. It becomes 0
B. It becomes 0.5
C. It becomes 1
D. It becomes infinity
Correct Answer: It becomes 0
Explanation:
Without smoothing, the frequency count is 0, so the probability calculation results in 0.
15. In Logistic Regression, if the dot product θ^T * x is 0, what is the output of the sigmoid function?
A. 0
B. 0.5
C. Undefined
D. 1
Correct Answer: 0.5
Explanation:
Since e^0 = 1, the sigmoid evaluates to 1/(1+1) = 0.5. This point is typically the decision boundary.
16. Which of the following describes the 'Prior Probability' P(Positive) in Naive Bayes?
A. Probability of a document being positive based on the training set distribution
B. Probability of a document being positive given a specific word
C. Total number of words in the dictionary
D. Probability of a word being positive
Correct Answer: Probability of a document being positive based on the training set distribution
Explanation:
The prior P(Positive) is the ratio of the number of positive documents to the total number of documents in the training set.
17. Why do we typically use Log Likelihood in Naive Bayes calculations instead of raw probabilities?
A. To prevent numerical underflow from multiplying many small probabilities
B. To convert negative numbers to positive
C. To make the calculation harder
D. Because logarithms are faster to compute than addition
Correct Answer: To prevent numerical underflow from multiplying many small probabilities
Explanation:
Multiplying many small probabilities (between 0 and 1) results in extremely small numbers that computers process as zero (underflow). Using logs converts multiplication to addition.
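The underflow problem is easy to demonstrate. Multiplying 100 likelihoods of 1e-5 would require representing 1e-500, which is below the smallest double-precision float, while the equivalent sum of logs is perfectly representable:

```python
import math

probs = [1e-5] * 100           # 100 tiny per-word likelihoods

product = 1.0
for p in probs:
    product *= p
print(product)                  # 0.0 -- the product underflows to zero

log_sum = sum(math.log(p) for p in probs)
print(log_sum)                  # about -1151.3 -- no underflow in log space
```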
18. In the Naive Bayes inference formula, if the sum of the Log Prior and Log Likelihoods is greater than 0, the sentiment is classified as:
A. Positive
B. Neutral
C. Ambiguous
D. Negative
Correct Answer: Positive
Explanation:
The Log Likelihood ratio is typically defined as log(P(W|Pos)/P(W|Neg)). If the total sum including the log prior is > 0, the positive probability outweighs the negative.
19. What is a 'sparse representation' in the context of NLP feature vectors?
A. A vector with a small dimension
B. A vector with mostly non-zero values
C. A vector where most elements are zero
D. A vector containing only negative numbers
Correct Answer: A vector where most elements are zero
Explanation:
When using One-Hot Encoding or large vocabulary counts, most words in the vocabulary do not appear in a single sentence, resulting in a vector mostly filled with zeros.
20. Which component of the Naive Bayes classifier represents the 'Evidence' in Bayes' rule?
A. P(Data)
B. P(Data | Class)
C. P(Class | Data)
D. P(Class)
Correct Answer: P(Data)
Explanation:
The denominator P(Data) represents the evidence or marginal likelihood. In classification, it is often ignored as it is constant for all classes.
21. In Logistic Regression, what is the dimension of the weight vector θ if the feature vector x has dimension V+1?
A. 1
B. V*2
C. V+1
D. V
Correct Answer: V+1
Explanation:
To perform the dot product θ^T * x, the weight vector must have the same dimensions as the feature vector.
22. Which algorithm is considered a 'Generative' model?
A. Naive Bayes
B. Perceptron
C. Logistic Regression
D. Support Vector Machine
Correct Answer: Naive Bayes
Explanation:
Naive Bayes is a generative model because it models the joint probability distribution P(x, y) (how the data is generated), whereas Logistic Regression is discriminative.
23. Which algorithm is considered a 'Discriminative' model?
A. Logistic Regression
B. Hidden Markov Model
C. Naive Bayes
D. Gaussian Mixture Model
Correct Answer: Logistic Regression
Explanation:
Logistic Regression is discriminative because it directly models the conditional probability P(y|x) (the boundary between classes).
24. What is the 'Lambda' (λ) term in the context of Naive Bayes ratio calculation?
A. The learning rate
B. The smoothing parameter
C. The bias unit
D. The number of classes
Correct Answer: The smoothing parameter
Explanation:
Lambda is the additive smoothing parameter (Laplacian smoothing) added to the numerator and denominator of probability calculations.
25. When extracting features for Logistic Regression, if the word 'happy' appears 3 times in a tweet, and 'happy' has a positive frequency of 100 and negative frequency of 5 in the corpus, how is this typically utilized?
A. The counts 100 and 5 contribute to the aggregate sums in the feature vector
B. The word is ignored
C. The number 3 is the only feature used
D. The ratio 100/5 is used as the weight
Correct Answer: The counts 100 and 5 contribute to the aggregate sums in the feature vector
Explanation:
In standard frequency-based feature extraction, we look up the pre-computed corpus frequencies (100, 5) and sum them into the feature vector slots for the tweet.
26. What is the main advantage of Logistic Regression over Naive Bayes?
A. It is a generative model
B. It does not require independent features
C. It handles missing data better
D. It is always faster to train
Correct Answer: It does not require independent features
Explanation:
Logistic Regression learns the weights of features based on their correlation with the output and does not strictly assume feature independence like Naive Bayes.
27. In the context of Naive Bayes, what is V?
A. The vector dimension
B. The vocabulary size (number of unique words)
C. The validation set size
D. The number of classes
Correct Answer: The vocabulary size (number of unique words)
Explanation:
V usually denotes the size of the vocabulary, which is used in the denominator of the smoothed probability formula.
28. If the learning rate in Logistic Regression is too large, what might happen?
A. The model will always find the global minimum
B. The model may overshoot the minimum and fail to converge
C. The cost function becomes 0 immediately
D. The model converges very slowly
Correct Answer: The model may overshoot the minimum and fail to converge
Explanation:
A large learning rate causes the gradient descent steps to be too big, potentially bouncing over the minimum cost and diverging.
29. Which of the following is a hyperparameter in Logistic Regression?
A. The weight vector θ
B. The feature vector x
C. The bias term
D. The learning rate α
Correct Answer: The learning rate α
Explanation:
The learning rate is set before training begins and controls the step size; weights and bias are parameters learned during training.
30. How is the 'Log Prior' calculated for the positive class?
A. log(N_neg / N_pos)
B. log(N_pos / N_neg)
C. log(N_pos / N_total)
D. log(V)
Correct Answer: log(N_pos / N_neg)
Explanation:
In the log-likelihood ratio formulation of Naive Bayes, the log prior is the log of the ratio of the number of positive documents to negative documents.
31. What is the range of values for the output of the standard Naive Bayes probability calculation P(y|x) before applying logs?
A. [0, 100]
B. [0, 1]
C. (-infinity, +infinity)
D. [-1, 1]
Correct Answer: [0, 1]
Explanation:
Probabilities are always between 0 and 1.
32. Which sentiment lexicon is purely generated from the training data in the approaches discussed?
A. SentiWordNet
B. The Lambda dictionary (Log Likelihood ratios of words)
C. Google Dictionary
D. WordNet
Correct Answer: The Lambda dictionary (Log Likelihood ratios of words)
Explanation:
In the Naive Bayes approach, we build a dictionary of lambda values (log likelihood ratios) for each word based specifically on the frequencies observed in the training corpus.
33. When predicting with Naive Bayes, if a word in the test sentence is not in the training vocabulary (V), what is the standard action?
A. Re-train the model
B. Discard it (ignore it)
C. Assign it a random probability
D. Halt the program
Correct Answer: Discard it (ignore it)
Explanation:
Words not in the vocabulary do not have a computed likelihood ratio and typically contribute 0 to the log-sum score (i.e., they are ignored).
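Putting the log prior, the lambda dictionary, and the OOV rule together gives a complete inference step. The lambda values below are hypothetical, standing in for ratios learned from a training corpus:

```python
def nb_predict(tweet, logprior, loglikelihood):
    """Sum the log prior and per-word lambdas; sign > 0 means Positive.

    Words missing from the dictionary (out of vocabulary) contribute 0,
    i.e. they are effectively ignored.
    """
    score = logprior
    for word in tweet.lower().split():
        score += loglikelihood.get(word, 0)
    return "Positive" if score > 0 else "Negative"

# Hypothetical lambdas: log(P(w|Pos) / P(w|Neg)) per word
lambdas = {"happy": 2.2, "sad": -1.8}
print(nb_predict("so happy today", 0.0, lambdas))  # Positive
print(nb_predict("so sad today", 0.0, lambdas))    # Negative
```

'so' and 'today' are OOV here, so only 'happy' (or 'sad') moves the score away from the log prior.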
34. What is 'Sentiment Analysis' primarily classifying?
A. The language of the text
B. The topic of the text
C. The grammatical structure
D. The emotional tone or opinion (e.g., Positive/Negative)
Correct Answer: The emotional tone or opinion (e.g., Positive/Negative)
Explanation:
Sentiment analysis is the computational study of opinions, sentiments, and emotions expressed in text.
35. In Logistic Regression, the decision boundary is:
A. Polynomial
B. Circular
C. Non-linear
D. Linear
Correct Answer: Linear
Explanation:
Logistic regression creates a linear decision boundary (a line or hyperplane) that separates the classes.
36. The denominator for calculating P(w|class) with Laplacian smoothing is:
A. V
B. Count(w in class) + 1
C. N_class (total words in class) + V (vocabulary size)
D. N_class
Correct Answer: N_class (total words in class) + V (vocabulary size)
Explanation:
To normalize the smoothed counts, we divide by the total number of words in the class plus the size of the vocabulary (since we added 1 for every unique word).
37. What does a negative value in a word's Log Likelihood (Lambda) score imply in Sentiment Analysis?
A. The word is indicative of Positive sentiment
B. The word is a stop word
C. The word is neutral
D. The word is indicative of Negative sentiment
Correct Answer: The word is indicative of Negative sentiment
Explanation:
Lambda is log(P(w|Pos)/P(w|Neg)). If P(w|Neg) > P(w|Pos), the ratio is < 1, and the log is negative.
38. Which of the following is a stop word?
A. The
B. Love
C. Amazing
D. Terrible
Correct Answer: The
Explanation:
'The' is a common article that carries little sentiment information, which is why it is treated as a stop word.
39. In a confusion matrix for binary classification, what is a False Positive?
A. Incorrectly predicting positive when the actual class is negative
B. Correctly predicting negative
C. Correctly predicting positive
D. Incorrectly predicting negative when the actual class is positive
Correct Answer: Incorrectly predicting positive when the actual class is negative
Explanation:
A False Positive is an error where the model predicts the positive class, but the ground truth was actually negative.
40. What is the primary reason for lowercasing text during preprocessing?
A. To remove punctuation
B. It looks better
C. To ensure 'Good' and 'good' are treated as the same feature
D. To detect sentence boundaries
Correct Answer: To ensure 'Good' and 'good' are treated as the same feature
Explanation:
Lowercasing normalizes the text so that capitalization variations do not create duplicate features in the vocabulary.
41. If a Logistic Regression model is overfitted, how will it perform?
A. Well on training data, well on test data
B. Poorly on training data, poorly on test data
C. Poorly on training data, well on test data
D. Well on training data, poorly on test data
Correct Answer: Well on training data, poorly on test data
Explanation:
Overfitting occurs when the model learns the noise in the training data, resulting in high accuracy on training but poor generalization to unseen test data.
42. Which formula represents the update rule for weight θ_j in Gradient Descent?
A. θ_j := θ_j / α
B. θ_j := θ_j - α * dJ/dθ_j
C. θ_j := θ_j + α * dJ/dθ_j
D. θ_j := dJ/dθ_j
Correct Answer: θ_j := θ_j - α * dJ/dθ_j
Explanation:
We subtract the product of the learning rate and the partial derivative of the cost function to move down the gradient.
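One gradient descent update for logistic regression with log loss can be sketched as follows. For log loss the partial derivative works out to (h(x) - y) * x_j per example, which is what this illustrative single-example step uses:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, x, y, alpha):
    """theta_j := theta_j - alpha * dJ/dtheta_j, where for log loss
    dJ/dtheta_j = (h(x) - y) * x_j on a single example."""
    h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return [t - alpha * (h - y) * xi for t, xi in zip(theta, x)]

theta = [0.0, 0.0, 0.0]
x, y = [1, 3.0, 1.0], 1        # [bias, sum_pos, sum_neg], positive label
theta = gradient_step(theta, x, y, alpha=0.1)
print(theta)                    # all weights move toward the positive class
```

With zero initial weights, h = 0.5, so the error term is -0.5 and every weight is nudged in the direction of its feature.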
43. In Naive Bayes, what assumption allows us to multiply probabilities of individual words?
A. Linear Separability
B. Conditional Independence
C. Normal Distribution
D. Homoscedasticity
Correct Answer: Conditional Independence
Explanation:
The Naive Bayes assumption is that the probability of a feature x_i depends only on the class y, independent of the other features x_j.
44. What is Tokenization?
A. Splitting a string of text into individual words or terms
B. Translating text
C. Converting words to numbers
D. Removing special characters
Correct Answer: Splitting a string of text into individual words or terms
Explanation:
Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens.
45. Which value of probability corresponds to the logit (log-odds) value of 0?
A. 0
B. 0.5
C. 1
D. Infinity
Correct Answer: 0.5
Explanation:
Log-odds = log(p / (1-p)). If p = 0.5, then 0.5/0.5 = 1, and log(1) = 0.
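The logit is the inverse of the sigmoid, which makes this relationship easy to verify numerically:

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

print(logit(0.5))   # 0.0 -- even odds map to log-odds of zero
print(logit(0.9))   # about 2.197, i.e. log of 9:1 odds
```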
46. What is the interpretation of weights in Logistic Regression for Sentiment Analysis?
A. They represent the frequency of words
B. They represent the importance and direction (positive/negative) of a feature
C. They represent the index of the word
D. They are random numbers
Correct Answer: They represent the importance and direction (positive/negative) of a feature
Explanation:
A high positive weight implies the feature strongly correlates with the positive class; a high negative weight implies correlation with the negative class.
47. For a balanced dataset, which metric is most straightforward to evaluate performance?
A. Recall only
B. Accuracy
C. Precision only
D. Mean Squared Error
Correct Answer: Accuracy
Explanation:
Accuracy (Correct Predictions / Total Predictions) is a good metric when the classes in the dataset are balanced (e.g., 50% positive, 50% negative).
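Accuracy and the confusion-matrix counts it is built from can be computed together. A minimal sketch on a hypothetical four-example test set:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # miss
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(accuracy)  # 0.5 -- two of four predictions correct
```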
48. In vector space models, what is 'OOV'?
A. Over optimization value
B. Object oriented vector
C. Out of vector
D. Out of vocabulary
Correct Answer: Out of vocabulary
Explanation:
OOV stands for Out-Of-Vocabulary, referring to words encountered in testing that were not present in the training vocabulary.
49. The conditional probability P(Word|Positive) is conceptually similar to:
A. How often the word appears in the whole dataset
B. The probability that the sentiment is Positive given the word
C. How likely the word is to appear if the sentiment is known to be Positive
D. The probability of the word being a stop word
Correct Answer: How likely the word is to appear if the sentiment is known to be Positive
Explanation:
This is the likelihood: given the class is Positive, what is the probability of observing this specific word.
50. Which technique is essentially a 'probabilistic classifier' based on applying Bayes' theorem?
A. K-Nearest Neighbors
B. Decision Trees
C. K-Means
D. Naive Bayes
Correct Answer: Naive Bayes
Explanation:
Naive Bayes is explicitly founded on Bayes' theorem to calculate probabilities for classification.