1. In the context of sentiment analysis using Logistic Regression, what is the primary purpose of feature extraction?
A. To convert numerical vectors back into text
B. To transform raw text into numerical representations like vectors
C. To increase the length of the sentences
D. To translate text from one language to another
Correct Answer: To transform raw text into numerical representations like vectors
Explanation: Machine learning models require numerical input. Feature extraction converts raw text data into numerical vectors that the model can process.
2. Which function is used in Logistic Regression to map the output to a probability value between 0 and 1?
A. ReLU Function
B. Sigmoid Function
C. Tangent Function
D. Linear Function
Correct Answer: Sigmoid Function
Explanation: The sigmoid function maps any real-valued number into a value between 0 and 1, making it suitable for probability estimation in binary classification.
3. When extracting features for sentiment analysis, what does a frequency dictionary typically map?
A. Process ID to Memory usage
B. (Word, Sentiment Label) pairs to the count of occurrences
C. Word length to Sentence length
D. Each word to its synonym
Correct Answer: (Word, Sentiment Label) pairs to the count of occurrences
Explanation: In simple feature extraction for sentiment analysis, a frequency dictionary maps a pair (word, label) to the number of times that word appears in the corpus associated with that specific label.
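For concreteness, a minimal Python sketch of such a frequency dictionary (the name build_freqs and the whitespace tokenization are illustrative assumptions, not a prescribed API):

    from collections import defaultdict

    def build_freqs(texts, labels):
        # Map each (word, sentiment label) pair to its count across the corpus.
        freqs = defaultdict(int)
        for text, label in zip(texts, labels):
            for word in text.lower().split():  # naive whitespace tokenization
                freqs[(word, label)] += 1
        return freqs

    freqs = build_freqs(["happy happy day", "sad day"], [1, 0])
    # freqs[("happy", 1)] == 2 and freqs[("day", 0)] == 1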
4. In a binary logistic regression classifier for sentiment analysis, if the sigmoid output h(x) >= 0.5, how is the sentiment classified?
A. Negative
B. Positive
C. Neutral
D. Undefined
Correct Answer: Positive
Explanation: By convention in binary classification, a probability threshold of 0.5 is used. If the output probability is 0.5 or greater, the class is predicted as Positive (1).
5. What is the formula for the sigmoid function, σ(z)?
A. 1 / (1 + e^-z)
B. 1 / (1 - e^-z)
C. e^z / (1 + e^z)
D. log(z)
Correct Answer: 1 / (1 + e^-z)
Explanation: The standard formula for the sigmoid logistic function is 1 / (1 + e^-z).
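A minimal sketch of this formula in Python, using the standard-library exponential:

    import math

    def sigmoid(z):
        # σ(z) = 1 / (1 + e^-z)
        return 1.0 / (1.0 + math.exp(-z))

    print(sigmoid(0))  # 0.5, the decision boundary (see question 15)
    print(sigmoid(5))  # ~0.993, confidently positive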
6. In the feature vector representation X = [1, sum_pos, sum_neg], what does the '1' usually represent?
A. The count of the first word
B. The bias unit (intercept term)
C. The learning rate
D. The classification threshold
Correct Answer: The bias unit (intercept term)
Explanation: The '1' is added to the feature vector to correspond to the bias weight θ_0 in the dot product calculation.
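A sketch of how such a vector might be assembled, assuming a freqs dictionary keyed by (word, label) pairs as in question 3 (extract_features is a hypothetical helper name):

    def extract_features(text, freqs):
        # X = [1, sum_pos, sum_neg]; the leading 1 pairs with the bias weight θ_0.
        x = [1.0, 0.0, 0.0]
        for word in text.lower().split():
            x[1] += freqs.get((word, 1), 0)  # summed positive-label counts
            x[2] += freqs.get((word, 0), 0)  # summed negative-label counts
        return x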
7. Which preprocessing step is commonly performed before feature extraction to reduce the vocabulary size without losing semantic meaning?
A. Duplicating sentences
B. Stemming and removing stop words
C. Capitalizing all letters
D. Removing all vowels
Correct Answer: Stemming and removing stop words
Explanation: Stemming reduces words to their root form (e.g., 'tuning' to 'tun'), and removing stop words eliminates common non-informative words, both reducing vocabulary size.
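A self-contained sketch of this step; the stop-word set and the crude suffix stripping below are toy stand-ins for a real stop-word list and a proper stemmer such as Porter's:

    STOP_WORDS = {"the", "a", "an", "i", "am", "is", "and"}  # tiny illustrative list

    def crude_stem(word):
        # Toy suffix stripping; real systems use e.g. the Porter stemmer.
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def preprocess(text):
        tokens = text.lower().split()
        return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

    print(preprocess("I am tuning the model"))  # ['tun', 'model']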
8. What is the 'Cost Function' used in Logistic Regression generally called?
A. Mean Squared Error
B. Cross-Entropy Loss (Log Loss)
C. Hinge Loss
D. Absolute Error
Correct Answer: Cross-Entropy Loss (Log Loss)
Explanation: Logistic regression uses the Cross-Entropy (or Log Loss) function because it is convex, allowing gradient descent to find the global minimum.
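A sketch of the binary cross-entropy computation; the clipping constant eps is an implementation detail added here just to keep log() finite:

    import math

    def cross_entropy(y_true, y_prob, eps=1e-12):
        # J = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
        total = 0.0
        for y, h in zip(y_true, y_prob):
            h = min(max(h, eps), 1.0 - eps)  # avoid log(0)
            total -= y * math.log(h) + (1 - y) * math.log(1 - h)
        return total / len(y_true)

    print(cross_entropy([1, 0], [0.9, 0.2]))  # ~0.164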
9. What is the goal of Gradient Descent in training a Logistic Regression model?
A. To maximize the cost function
B. To minimize the cost function by iteratively updating weights
C. To set all weights to zero
D. To remove the bias term
Correct Answer: To minimize the cost function by iteratively updating weights
Explanation: Gradient Descent is an optimization algorithm used to minimize the cost function by adjusting parameters (weights) in the opposite direction of the gradient.
10. In Bayes' Rule, what does P(A|B) represent?
A. The joint probability of A and B
B. The probability of B given A
C. The posterior probability of A given B
D. The prior probability of A
Correct Answer: The posterior probability of A given B
Explanation: P(A|B) is the conditional probability of event A occurring given that event B is true, often called the posterior.
11. Why is the Naive Bayes classifier called 'Naive'?
A. It requires very little training data
B. It assumes that features are independent of each other given the class
C. It was developed by a naive mathematician
D. It cannot handle complex text
Correct Answer: It assumes that features are independent of each other given the class
Explanation: The 'naive' assumption is that the occurrence of a particular word is independent of the occurrence of other words, given the sentiment label.
12. What is the formula for Bayes' Theorem?
A. P(A|B) = P(B|A) * P(A) / P(B)
B. P(A|B) = P(A) * P(B)
C. P(A|B) = P(B|A) + P(A)
D. P(A|B) = P(A) / P(B)
Correct Answer: P(A|B) = P(B|A) * P(A) / P(B)
Explanation: Bayes' theorem states that the posterior probability is the likelihood times the prior divided by the evidence.
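A worked numeric example with invented probabilities:

    # Suppose P(B|A) = 0.10, P(A) = 0.40, and P(B) = 0.05 (made-up numbers).
    p_b_given_a = 0.10  # likelihood
    p_a = 0.40          # prior
    p_b = 0.05          # evidence
    print(p_b_given_a * p_a / p_b)  # 0.8 (up to float rounding), the posterior P(A|B)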
13. In Naive Bayes Sentiment Analysis, what is 'Laplacian Smoothing' used for?
A. To remove stop words
B. To handle words with zero probability (words not seen in training)
C. To smooth the decision boundary
D. To average the sentiment scores
Correct Answer: To handle words with zero probability (words not seen in training)
Explanation: Laplacian smoothing adds a small count (usually 1) to all frequency counts to prevent the probability of an unseen word becoming zero, which would zero out the entire calculation.
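A sketch of the smoothed likelihood, using the formula from question 36 with a general smoothing parameter lam; freqs follows the (word, label) -> count layout of question 3:

    def smoothed_prob(word, label, freqs, n_class, vocab_size, lam=1):
        # P(w|class) = (count(w, class) + lam) / (N_class + lam * V)
        count = freqs.get((word, label), 0)
        return (count + lam) / (n_class + lam * vocab_size)

    # An unseen word now gets a small non-zero probability instead of 0:
    print(smoothed_prob("unseen", 0, {}, n_class=100, vocab_size=50))  # 1/150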
14. If a word appears in the positive corpus but not the negative corpus, what happens to its probability P(W|Negative) without smoothing?
A. It becomes 1
B. It becomes 0.5
C. It becomes 0
D. It becomes infinity
Correct Answer: It becomes 0
Explanation: Without smoothing, the frequency count is 0, so the probability calculation results in 0.
15. In Logistic Regression, if the dot product θ^T * x is 0, what is the output of the sigmoid function?
A. 0
B. 1
C. 0.5
D. Undefined
Correct Answer: 0.5
Explanation: e^0 is 1, so the sigmoid function gives 1/(1+1) = 0.5. This is typically the decision boundary.
16. Which of the following describes the 'Prior Probability' P(Positive) in Naive Bayes?
A. Probability of a word being positive
B. Probability of a document being positive based on the training set distribution
C. Probability of a document being positive given a specific word
D. Total number of words in the dictionary
Correct Answer: Probability of a document being positive based on the training set distribution
Explanation: The prior P(Positive) is the ratio of the number of positive documents to the total number of documents in the training set.
17. Why do we typically use Log Likelihood in Naive Bayes calculations instead of raw probabilities?
A. To convert negative numbers to positive
B. To prevent numerical underflow from multiplying many small probabilities
C. To make the calculation harder
D. Because logarithms are faster to compute than addition
Correct Answer: To prevent numerical underflow from multiplying many small probabilities
Explanation: Multiplying many small probabilities (between 0 and 1) results in extremely small numbers that floating-point arithmetic rounds to zero (underflow). Using logs converts multiplication to addition.
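The underflow problem is easy to demonstrate:

    import math

    probs = [1e-5] * 100      # 100 small word likelihoods
    product = 1.0
    for p in probs:
        product *= p          # 1e-500 underflows to 0.0 in a double
    log_sum = sum(math.log(p) for p in probs)  # about -1151.3, perfectly representable
    print(product, log_sum)   # 0.0 -1151.29...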
18. In the Naive Bayes inference formula, if the sum of the Log Prior and Log Likelihoods is greater than 0, the sentiment is classified as:
A. Negative
B. Neutral
C. Positive
D. Ambiguous
Correct Answer: Positive
Explanation: The Log Likelihood ratio is typically defined as log(P(W|Pos)/P(W|Neg)). If the total sum including the log prior is > 0, the positive probability outweighs the negative.
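A sketch of this inference rule; lambdas is assumed to be a dictionary of per-word log likelihood ratios as in question 32, and out-of-vocabulary words contribute 0 (question 33):

    def nb_predict(text, log_prior, lambdas):
        # score = log prior + sum of per-word log likelihood ratios
        score = log_prior + sum(lambdas.get(w, 0.0) for w in text.lower().split())
        return "Positive" if score > 0 else "Negative"

    print(nb_predict("happy day", 0.0, {"happy": 2.2, "day": -0.1}))  # Positive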
19. What is a 'sparse representation' in the context of NLP feature vectors?
A. A vector with mostly non-zero values
B. A vector where most elements are zero
C. A vector with a small dimension
D. A vector containing only negative numbers
Correct Answer: A vector where most elements are zero
Explanation: When using One-Hot Encoding or large vocabulary counts, most words in the vocabulary do not appear in a single sentence, resulting in a vector mostly filled with zeros.
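For example, a count vector over even a tiny vocabulary is mostly zeros:

    vocab = ["happy", "sad", "day", "model", "tune"]
    tokens = "happy day".split()
    print([tokens.count(w) for w in vocab])  # [1, 0, 1, 0, 0]: most entries are zero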
20. Which component of the Naive Bayes classifier represents the 'Evidence' in Bayes' rule?
A. P(Class)
B. P(Data | Class)
C. P(Data)
D. P(Class | Data)
Correct Answer: P(Data)
Explanation: The denominator P(Data) represents the evidence or marginal likelihood. In classification, it is often ignored as it is constant for all classes.
21. In Logistic Regression, what is the dimension of the weight vector θ if the feature vector x has dimension V+1?
A. V
B. 1
C. V+1
D. V*2
Correct Answer: V+1
Explanation: To perform the dot product θ^T * x, the weight vector must have the same dimension as the feature vector.
22. Which algorithm is considered a 'Generative' model?
A. Logistic Regression
B. Naive Bayes
C. Support Vector Machine
D. Perceptron
Correct Answer: Naive Bayes
Explanation: Naive Bayes is a generative model because it models the joint probability distribution P(x, y) (how the data is generated), whereas Logistic Regression is discriminative.
23. Which algorithm is considered a 'Discriminative' model?
A. Naive Bayes
B. Logistic Regression
C. Hidden Markov Model
D. Gaussian Mixture Model
Correct Answer: Logistic Regression
Explanation: Logistic Regression is discriminative because it directly models the conditional probability P(y|x) (the boundary between classes).
24. What is the 'Lambda' (λ) term in the context of the Naive Bayes ratio calculation?
A. The learning rate
B. The smoothing parameter
C. The number of classes
D. The bias unit
Correct Answer: The smoothing parameter
Explanation: Lambda is the additive smoothing parameter (Laplacian smoothing) added to the numerator and denominator of probability calculations.
25. When extracting features for Logistic Regression, if the word 'happy' appears 3 times in a tweet, and 'happy' has a positive frequency of 100 and a negative frequency of 5 in the corpus, how is this typically utilized?
A. The word is ignored
B. The counts 100 and 5 contribute to the aggregate sums in the feature vector
C. The number 3 is the only feature used
D. The ratio 100/5 is used as the weight
Correct Answer: The counts 100 and 5 contribute to the aggregate sums in the feature vector
Explanation: In standard frequency-based feature extraction, we look up the pre-computed corpus frequencies (100, 5) and sum them into the feature vector slots for the tweet.
26. What is the main advantage of Logistic Regression over Naive Bayes?
A. It is always faster to train
B. It does not require independent features
C. It handles missing data better
D. It is a generative model
Correct Answer: It does not require independent features
Explanation: Logistic Regression learns the weights of features based on their correlation with the output and does not strictly assume feature independence like Naive Bayes.
27. In the context of Naive Bayes, what is V?
A. The number of classes
B. The vocabulary size (number of unique words)
C. The validation set size
D. The vector dimension
Correct Answer: The vocabulary size (number of unique words)
Explanation: V usually denotes the size of the vocabulary, which is used in the denominator of the smoothed probability formula.
28. If the learning rate in Logistic Regression is too large, what might happen?
A. The model converges very slowly
B. The model may overshoot the minimum and fail to converge
C. The model will always find the global minimum
D. The cost function becomes 0 immediately
Correct Answer: The model may overshoot the minimum and fail to converge
Explanation: A large learning rate causes the gradient descent steps to be too big, potentially bouncing over the minimum cost and diverging.
29. Which of the following is a hyperparameter in Logistic Regression?
A. The weight vector θ
B. The bias term
C. The learning rate α
D. The feature vector x
Correct Answer: The learning rate α
Explanation: The learning rate is set before training begins and controls the step size; weights and bias are parameters learned during training.
30. How is the 'Log Prior' calculated for the positive class?
A. log(N_pos / N_neg)
B. log(N_pos / N_total)
C. log(N_neg / N_pos)
D. log(V)
Correct Answer: log(N_pos / N_neg)
Explanation: In the log-likelihood ratio formulation of Naive Bayes, the log prior is the log of the ratio of the number of positive documents to negative documents.
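In code, assuming n_pos_docs and n_neg_docs hold the per-class document counts:

    import math

    n_pos_docs, n_neg_docs = 500, 500
    log_prior = math.log(n_pos_docs / n_neg_docs)  # 0.0 for a balanced training set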
31. What is the range of values for the output of the standard Naive Bayes probability calculation P(y|x) before applying logs?
A. [-1, 1]
B. [0, 1]
C. (-infinity, +infinity)
D. [0, 100]
Correct Answer: [0, 1]
Explanation: Probabilities are always between 0 and 1.
32. Which sentiment lexicon is purely generated from the training data in the approaches discussed?
A. WordNet
B. The Lambda dictionary (Log Likelihood ratios of words)
C. SentiWordNet
D. Google Dictionary
Correct Answer: The Lambda dictionary (Log Likelihood ratios of words)
Explanation: In the Naive Bayes approach, we build a dictionary of lambda values (log likelihood ratios) for each word based specifically on the frequencies observed in the training corpus.
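A sketch of building that dictionary entry by entry from the smoothed likelihoods of question 36 (word_lambda is a hypothetical helper; freqs, n_pos, n_neg, and vocab_size come from the training corpus):

    import math

    def word_lambda(word, freqs, n_pos, n_neg, vocab_size):
        # lambda(w) = log( P(w|Pos) / P(w|Neg) ), with Laplacian smoothing (lam = 1)
        p_pos = (freqs.get((word, 1), 0) + 1) / (n_pos + vocab_size)
        p_neg = (freqs.get((word, 0), 0) + 1) / (n_neg + vocab_size)
        return math.log(p_pos / p_neg)

    # Negative lambda values indicate negative sentiment (see question 37).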
33. When predicting with Naive Bayes, if a word in the test sentence is not in the training vocabulary (V), what is the standard action?
A. Assign it a random probability
B. Discard it (ignore it)
C. Halt the program
D. Re-train the model
Correct Answer: Discard it (ignore it)
Explanation: Words not in the vocabulary do not have a computed likelihood ratio and typically contribute 0 to the log-sum score (i.e., they are ignored).
34. What is 'Sentiment Analysis' primarily classifying?
A. The topic of the text
B. The grammatical structure
C. The emotional tone or opinion (e.g., Positive/Negative)
D. The language of the text
Correct Answer: The emotional tone or opinion (e.g., Positive/Negative)
Explanation: Sentiment analysis is the computational study of opinions, sentiments, and emotions expressed in text.
35. In Logistic Regression, the decision boundary is:
A. Non-linear
B. Linear
C. Circular
D. Polynomial
Correct Answer: Linear
Explanation: Logistic regression creates a linear decision boundary (a line or hyperplane) that separates the classes.
36. The denominator for calculating P(w|class) with Laplacian smoothing is:
A. Count(w in class) + 1
B. N_class (total words in class) + V (vocabulary size)
C. N_class
D. V
Correct Answer: N_class (total words in class) + V (vocabulary size)
Explanation: To normalize the smoothed counts, we divide by the total number of words in the class plus the size of the vocabulary (since we added 1 for every unique word).
37. What does a negative value in a word's Log Likelihood (Lambda) score imply in Sentiment Analysis?
A. The word is indicative of Positive sentiment
B. The word is indicative of Negative sentiment
C. The word is neutral
D. The word is a stop word
Correct Answer: The word is indicative of Negative sentiment
Explanation: Lambda is log(P(w|Pos)/P(w|Neg)). If P(w|Neg) > P(w|Pos), the ratio is < 1, and the log is negative.
38. Which of the following is a stop word?
A. Terrible
B. The
C. Amazing
D. Love
Correct Answer: The
Explanation: 'The' is a common article with little semantic or sentiment value, so it is treated as a stop word.
39. In a confusion matrix for binary classification, what is a False Positive?
A. Correctly predicting positive
B. Incorrectly predicting positive when the actual class is negative
C. Incorrectly predicting negative when the actual class is positive
D. Correctly predicting negative
Correct Answer: Incorrectly predicting positive when the actual class is negative
Explanation: A False Positive is an error where the model predicts the positive class, but the ground truth was actually negative.
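A sketch that tallies the four confusion-matrix cells for binary labels (1 = positive); the accuracy metric of question 47 falls out of the same counts:

    def confusion_counts(y_true, y_pred):
        tp = fp = tn = fn = 0
        for y, p in zip(y_true, y_pred):
            if p == 1 and y == 1:
                tp += 1
            elif p == 1 and y == 0:
                fp += 1  # false positive: predicted positive, actually negative
            elif p == 0 and y == 0:
                tn += 1
            else:
                fn += 1  # false negative: predicted negative, actually positive
        return tp, fp, tn, fn

    tp, fp, tn, fn = confusion_counts([1, 0, 1, 0], [1, 1, 1, 0])
    print((tp + tn) / (tp + fp + tn + fn))  # accuracy = 0.75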
40. What is the primary reason for lowercasing text during preprocessing?
A. It looks better
B. To ensure 'Good' and 'good' are treated as the same feature
C. To remove punctuation
D. To detect sentence boundaries
Correct Answer: To ensure 'Good' and 'good' are treated as the same feature
Explanation: Lowercasing normalizes the text so that capitalization variations do not create duplicate features in the vocabulary.
41. If a Logistic Regression model is overfitted, how will it perform?
A. Poorly on training data, poorly on test data
B. Well on training data, well on test data
C. Well on training data, poorly on test data
D. Poorly on training data, well on test data
Correct Answer: Well on training data, poorly on test data
Explanation: Overfitting occurs when the model learns the noise in the training data, resulting in high accuracy on training but poor generalization to unseen test data.
42. Which formula represents the update rule for weight θ_j in Gradient Descent?
A. θ_j := θ_j - α * ∂J/∂θ_j
B. θ_j := θ_j + α * ∂J/∂θ_j
C. θ_j := θ_j / α
D. θ_j := ∂J/∂θ_j
Correct Answer: θ_j := θ_j - α * ∂J/∂θ_j
Explanation: We subtract the product of the learning rate and the partial derivative of the cost function to move down the gradient.
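A sketch of this update inside a full batch gradient-descent loop for logistic regression, reusing the sigmoid sketch from question 5; alpha and iters are hyperparameters (question 29):

    def train(X, y, alpha=0.1, iters=1000):
        # X: feature vectors like [1, sum_pos, sum_neg]; y: binary labels.
        theta = [0.0] * len(X[0])
        m = len(X)
        for _ in range(iters):
            grads = [0.0] * len(theta)  # ∂J/∂θ_j = (1/m) * sum( (h - y) * x_j )
            for xi, yi in zip(X, y):
                h = sigmoid(sum(t * x for t, x in zip(theta, xi)))
                for j, xj in enumerate(xi):
                    grads[j] += (h - yi) * xj
            theta = [t - alpha * g / m for t, g in zip(theta, grads)]
        return theta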
43. In Naive Bayes, what assumption allows us to multiply probabilities of individual words?
A. Conditional Independence
B. Linear Separability
C. Normal Distribution
D. Homoscedasticity
Correct Answer: Conditional Independence
Explanation: The Naive Bayes assumption is that the probability of feature x_i depends only on the class y, independently of the other features x_j.
44. What is Tokenization?
A. Splitting a string of text into individual words or terms
B. Converting words to numbers
C. Removing special characters
D. Translating text
Correct Answer: Splitting a string of text into individual words or terms
Explanation: Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens.
45. Which probability value corresponds to a logit (log-odds) value of 0?
A. 0
B. 0.5
C. 1
D. Infinity
Correct Answer: 0.5
Explanation: Log-odds = log(p / (1-p)). If p = 0.5, then 0.5/0.5 = 1, and log(1) = 0.
46. What is the interpretation of weights in Logistic Regression for Sentiment Analysis?
A. They are random numbers
B. They represent the importance and direction (positive/negative) of a feature
C. They represent the frequency of words
D. They represent the index of the word
Correct Answer: They represent the importance and direction (positive/negative) of a feature
Explanation: A high positive weight implies the feature strongly correlates with the positive class; a high negative weight implies correlation with the negative class.
47. For a balanced dataset, which metric is the most straightforward way to evaluate performance?
A. Accuracy
B. Recall only
C. Precision only
D. Mean Squared Error
Correct Answer: Accuracy
Explanation: Accuracy (Correct Predictions / Total Predictions) is a good metric when the classes in the dataset are balanced (e.g., 50% positive, 50% negative).
48. In vector space models, what is 'OOV'?
A. Out of vocabulary
B. Out of vector
C. Over optimization value
D. Object oriented vector
Correct Answer: Out of vocabulary
Explanation: OOV stands for Out-Of-Vocabulary, referring to words encountered at test time that were not present in the training vocabulary.
49. The conditional probability P(Word|Positive) is conceptually similar to:
A. How often the word appears in the whole dataset
B. How likely the word is to appear if the sentiment is known to be Positive
C. The probability that the sentiment is Positive given the word
D. The probability of the word being a stop word
Correct Answer: How likely the word is to appear if the sentiment is known to be Positive
Explanation: This is the likelihood: given the class is Positive, what is the probability of observing this specific word.
50. Which technique is essentially a 'probabilistic classifier' based on applying Bayes' theorem?
A. K-Nearest Neighbors
B. Naive Bayes
C. Decision Trees
D. K-Means
Correct Answer: Naive Bayes
Explanation: Naive Bayes is explicitly founded on Bayes' theorem to calculate probabilities for classification.