1. What is the primary goal of Predictive Analytics?
A. To describe what happened in the past
B. To predict future outcomes based on historical data
C. To prescribe the best course of action
D. To store large amounts of data
Correct Answer: To predict future outcomes based on historical data
Explanation: Predictive analytics uses historical data and statistical algorithms to identify the likelihood of future outcomes.
2. Which of the following is NOT a phase in the standard predictive analytics lifecycle?
A. Data Preparation
B. Model Building
C. Hardware Manufacturing
D. Deployment
Correct Answer: Hardware Manufacturing
Explanation: Hardware manufacturing belongs to production, not to the analytics lifecycle. The lifecycle typically includes problem definition, data preparation, modeling, and deployment.
3. Descriptive Analytics differs from Predictive Analytics because Descriptive Analytics focuses on:
A. Forecasting future trends
B. Optimizing decision making
C. Summarizing past events
D. Creating machine learning models
Correct Answer: Summarizing past events
Explanation: Descriptive analytics answers 'What happened?' by summarizing past data, whereas predictive analytics answers 'What could happen?'.
4. Which of the following is a common application of Predictive Analytics?
A. Credit scoring
B. Generating an annual report
C. Real-time operating system scheduling
D. Creating a database schema
Correct Answer: Credit scoring
Explanation: Credit scoring predicts the likelihood of a borrower defaulting on a loan based on their history.
5. In the context of Machine Learning, what is 'Training Data'?
A. Data used to evaluate the final model
B. Data used to teach the algorithm patterns
C. Data that has been corrupted
D. Future data that has not occurred yet
Correct Answer: Data used to teach the algorithm patterns
Explanation: Training data is the dataset used to fit the model parameters, allowing the algorithm to learn relationships.
6. Which type of analytics answers the question 'What should we do about it?'
A. Descriptive Analytics
B. Diagnostic Analytics
C. Predictive Analytics
D. Prescriptive Analytics
Correct Answer: Prescriptive Analytics
Explanation: Prescriptive analytics suggests decision options to take advantage of a future opportunity or mitigate a future risk.
7. Machine Learning is best described as:
A. Hard-coding rules for every possible scenario
B. A subset of AI where computers learn from data without explicit programming
C. Strictly using statistical regression only
D. Data storage optimization
Correct Answer: A subset of AI where computers learn from data without explicit programming
Explanation: ML focuses on algorithms that improve automatically through experience and data usage.
8. What is the 'Target Variable' in a predictive model?
A. The variable being predicted
B. The variable used to predict
C. The noise in the data
D. The index of the dataset
Correct Answer: The variable being predicted
Explanation: The target variable (or dependent variable) is the outcome the model aims to predict.
9. Which of the following is a type of Supervised Learning?
A. Clustering
B. Regression
C. Dimensionality Reduction
D. Association Rule Learning
Correct Answer: Regression
Explanation: Regression is a supervised learning task where the output variable is continuous.
10. In Supervised Learning, the dataset must contain:
A. Only input features
B. Labeled data (Input features and Output labels)
C. Unlabeled data
D. Only images
Correct Answer: Labeled data (Input features and Output labels)
Explanation: Supervised learning requires ground truth (labels) to train the model to map inputs to outputs.
11. Unsupervised Learning differs from Supervised Learning because it:
A. Uses labeled data
B. Predicts a specific target
C. Finds hidden patterns in unlabeled data
D. Is only used for text data
Correct Answer: Finds hidden patterns in unlabeled data
Explanation: Unsupervised learning deals with data that has no historical labels, focusing on discovering structure or clusters.
12. Predicting the price of a house based on its square footage is an example of:
A. Classification
B. Regression
C. Clustering
D. Reinforcement Learning
Correct Answer: Regression
Explanation: Since price is a continuous numerical value, this is a regression problem.
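A minimal sketch of the house-price idea, assuming made-up square-footage and price figures. The helper name `fit_line` and the data are illustrative and not part of the quiz; the method shown is ordinary least squares with a single feature.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical training data: square footage (feature) and price (target).
square_feet = [1000, 1500, 2000, 2500]
prices = [200_000, 300_000, 400_000, 500_000]

slope, intercept = fit_line(square_feet, prices)

# The prediction is a continuous number, which is what makes this regression.
predicted_price = slope * 1800 + intercept
```

The target here is a number on a continuous scale; if the target were a label such as 'cheap' or 'expensive', the same data would pose a classification problem instead.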
13. Predicting whether an email is 'Spam' or 'Not Spam' is an example of:
A. Regression
B. Clustering
C. Classification
D. Principal Component Analysis
Correct Answer: Classification
Explanation: The output is a categorical label (Spam/Not Spam), making it a classification problem.
14. Which of the following is an Unsupervised Learning algorithm?
A. Linear Regression
B. K-Means Clustering
C. Logistic Regression
D. Decision Trees
Correct Answer: K-Means Clustering
Explanation: K-Means is used to group data points into clusters based on similarity, without pre-existing labels.
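To make the grouping step concrete, here is a rough sketch of K-Means on one-dimensional data. The function name `k_means_1d`, the sample points, and the fixed starting centroids are assumptions chosen for illustration; real implementations usually pick starting centroids randomly and work in many dimensions.

```python
from statistics import mean

def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```

No labels are supplied anywhere: the two groups emerge purely from the distances between points.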
15. What is the main objective of Clustering?
A. To predict a continuous value
B. To group similar data points together
C. To classify data into known categories
D. To reduce the number of rows in a table
Correct Answer: To group similar data points together
Explanation: Clustering identifies natural groupings in data such that items in the same group are more similar to each other than to those in other groups.
16. Reinforcement Learning involves an agent that learns by:
A. Mimicking a teacher
B. Interacting with an environment and receiving rewards or penalties
C. Analyzing static clusters
D. Cleaning database records
Correct Answer: Interacting with an environment and receiving rewards or penalties
Explanation: Reinforcement learning is based on a reward-penalty system where an agent learns to maximize cumulative reward.
17. Which of the following is a common issue where a model performs well on training data but poorly on new data?
A. Underfitting
B. Overfitting
C. Normalization
D. Clustering
Correct Answer: Overfitting
Explanation: Overfitting occurs when a model learns the noise and incidental details of the training data so closely that its performance on new data suffers.
18. What is the first step in Data Preprocessing?
A. Feature Scaling
B. Data Cleaning
C. Model Training
D. Hyperparameter Tuning
Correct Answer: Data Cleaning
Explanation: Before scaling or training, data must be cleaned (handling missing values, noise, and inconsistencies).
19. Garbage In, Garbage Out (GIGO) implies that:
A. We should delete all data
B. Poor quality input data leads to poor quality model output
C. More data always results in better models
D. Computer hardware needs regular cleaning
Correct Answer: Poor quality input data leads to poor quality model output
Explanation: The quality of the predictive model is directly constrained by the quality of the data used to train it.
20. Which technique is used to handle missing values in a dataset?
A. Imputation
B. Overfitting
C. Clustering
D. Regression
Correct Answer: Imputation
Explanation: Imputation involves replacing missing data with substituted values (e.g., mean, median, or mode).
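As a small illustration of mean imputation, the sketch below fills missing entries with the average of the observed ones. It assumes missing values are represented as `None`; the helper name `impute_mean` and the sample ages are made up for this example.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [25, None, 35, 30, None]
filled = impute_mean(ages)
```

Swapping `mean` for `statistics.median` or `statistics.mode` gives the other two strategies named in the explanation.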
21. What is the purpose of 'Feature Scaling'?
A. To increase the number of features
B. To bring all features to a similar scale or range
C. To remove missing values
D. To convert text to numbers
Correct Answer: To bring all features to a similar scale or range
Explanation: Feature scaling ensures that no single feature dominates the model due to having a larger magnitude (e.g., salary vs age).
22. Standardization (Z-score normalization) transforms data to have:
A. A mean of 0 and standard deviation of 1
B. A range between 0 and 1
C. A mean of 100
D. No negative numbers
Correct Answer: A mean of 0 and standard deviation of 1
Explanation: Standardization subtracts the mean and divides by the standard deviation, so the rescaled feature has mean 0 and standard deviation 1. It does not change the shape of the distribution: skewed data remains skewed and is not made normal.
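The z-score formula, z = (x - mean) / standard deviation, can be sketched as follows. The helper name `standardize` and the salary figures are illustrative assumptions, and the population standard deviation (`pstdev`) is used here as one common choice.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score each value: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

# Hypothetical salary column.
salaries = [30_000, 40_000, 50_000, 60_000, 70_000]
z_scores = standardize(salaries)

# After standardization the column has mean 0 and standard deviation 1.
z_mean = mean(z_scores)
z_std = pstdev(z_scores)
```

A column whose values are all identical has a standard deviation of 0, so this sketch would divide by zero; such a column would need separate handling.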
23. Which preprocessing technique is used to convert categorical variables into numerical format?
A. Scaling
B. Encoding
C. Imputation
D. Sampling
Correct Answer: Encoding
Explanation: Encoding (like One-Hot Encoding or Label Encoding) converts text categories into numbers so algorithms can process them.
24. One-Hot Encoding helps in handling:
A. Missing values
B. Nominal categorical data
C. Outliers
D. Continuous variables
Correct Answer: Nominal categorical data
Explanation: One-Hot Encoding creates binary columns for each category, which is ideal for nominal data where no order exists.
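A minimal sketch of one-hot encoding, assuming a made-up color column. The helper name `one_hot` is hypothetical, and the columns are ordered alphabetically only so that the output is predictable.

```python
def one_hot(values):
    """Return the category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical nominal feature: no color is 'greater' than another.
colors = ["red", "blue", "green", "blue"]
categories, encoded = one_hot(colors)
```

Each row contains exactly one 1, so no ordering between categories is implied.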
25. An outlier is defined as:
A. A value that is exactly the mean
B. A missing value
C. A data point that differs significantly from other observations
D. A categorical variable
Correct Answer: A data point that differs significantly from other observations
Explanation: Outliers are extreme values that deviate markedly from the rest of the dataset.
26. Which method is commonly used to detect outliers?
A. Box Plot
B. Pie Chart
C. Confusion Matrix
D. Gradient Descent
Correct Answer: Box Plot
Explanation: Box plots visually display the distribution of data and show outliers as points beyond the whiskers, which conventionally extend up to 1.5 times the interquartile range (IQR) below the first quartile and above the third quartile.
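The 1.5 × IQR rule that box plots conventionally use can be sketched numerically. The helper name `iqr_outliers` and the sample data are assumptions for illustration; note that quartile conventions differ slightly between tools, so the exact fences may vary.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspicious value.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(data)
```

Whether a flagged point is an error or a genuine extreme observation still needs a judgment call; the rule only identifies candidates.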
27. Dimensionality Reduction aims to:
A. Increase the number of variables
B. Reduce the number of input variables while retaining important information
C. Remove all categorical variables
D. Create more rows in the dataset
Correct Answer: Reduce the number of input variables while retaining important information
Explanation: Dimensionality reduction simplifies models, reduces computation time, and helps avoid the curse of dimensionality.
28. PCA (Principal Component Analysis) is a technique used for:
A. Supervised Classification
B. Dimensionality Reduction
C. Reinforcement Learning
D. Data Imputation
Correct Answer: Dimensionality Reduction
Explanation: PCA transforms a large set of variables into a smaller set of uncorrelated components that still retains most of the variance in the data.
29. Why do we split data into Training and Testing sets?
A. To make the dataset smaller
B. To evaluate the model's performance on unseen data
C. To train two different models
D. To remove outliers
Correct Answer: To evaluate the model's performance on unseen data
Explanation: The test set acts as a proxy for new, real-world data to check if the model generalizes well.
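A rough sketch of a random train/test split. The helper name `split_train_test`, the 20% test fraction, and the fixed seed are illustrative assumptions, not a prescribed setup.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of ten row identifiers.
rows = list(range(10))
train_rows, test_rows = split_train_test(rows)
```

The model is fitted only on `train_rows`; `test_rows` stays untouched until evaluation so that it can stand in for unseen data.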
30. Underfitting occurs when:
A. The model is too complex
B. The model captures noise
C. The model is too simple to capture the underlying structure of the data
D. The training data is perfect
Correct Answer: The model is too simple to capture the underlying structure of the data
Explanation: Underfitting happens when a model cannot learn the patterns in the training data, resulting in poor performance on both training and test data.
31. Which of the following is NOT a data preprocessing step?
A. Feature Selection
B. Data Cleaning
C. Data Transformation
D. Hypothesis Testing
Correct Answer: Hypothesis Testing
Explanation: Hypothesis testing is a statistical method for inference, not typically considered a preprocessing step for preparing data for ML.
32. Min-Max Scaling transforms data into which range?
A. [-1, 1]
B. [0, 1]
C. [-infinity, +infinity]
D. [0, 100]
Correct Answer: [0, 1]
Explanation: Min-Max scaling rescales a feature to a fixed range, typically 0 to 1, by subtracting the minimum and dividing by the difference between the maximum and the minimum.
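The min-max formula, x' = (x - min) / (max - min), in a minimal sketch. The helper name `min_max_scale` and the age values are assumed for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical age column.
ages = [18, 30, 42, 66]
scaled = min_max_scale(ages)
```

Because the minimum and maximum define the range, a single extreme value compresses all other points toward one end; this sketch also assumes the column is not constant, since that would make the denominator zero.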
33. In a dataset, a row usually represents:
A. A feature
B. An observation or instance
C. A label
D. A statistical summary
Correct Answer: An observation or instance
Explanation: In tabular data, rows represent individual records (instances), and columns represent attributes (features).
34. Market Basket Analysis is an application of which learning type?
A. Regression
B. Association Rule Learning
C. Classification
D. Supervised Learning
Correct Answer: Association Rule Learning
Explanation: Association rule learning is an unsupervised technique that finds associations between items, such as 'people who buy bread also buy milk'.
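Association rules are usually scored with support (how often the items appear together) and confidence (how often the consequent appears given the antecedent). The sketch below computes both for the bread-and-milk rule on a made-up set of baskets; the function names and the data are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

bread_milk_support = support(baskets, {"bread", "milk"})
bread_to_milk_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here bread and milk appear together in two of four baskets, and two of the three baskets containing bread also contain milk.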
35. Which of the following describes 'Feature Selection'?
A. Creating new features from existing ones
B. Selecting the most relevant features to improve model performance
C. Scaling features
D. Handling missing values
Correct Answer: Selecting the most relevant features to improve model performance
Explanation: Feature selection involves picking a subset of relevant features to reduce complexity and improve accuracy.
36. Noise in data refers to:
A. Missing values
B. Meaningless or random variance in the data
C. Categorical labels
D. Duplicate rows
Correct Answer: Meaningless or random variance in the data
Explanation: Noise represents unwanted variation or random errors that obscure the true signal in the data.
37. Which of the following is a quantitative variable?
A. Gender (Male/Female)
B. Zip Code
C. Age
D. Color (Red/Blue)
Correct Answer: Age
Explanation: Age is a numerical value that measures a quantity, whereas the others are categorical. A zip code is written with digits, but it is only a label: arithmetic on zip codes has no meaning.
38. Label Encoding is best used when:
A. The categorical feature has no order
B. The categorical feature is ordinal (has an inherent order)
C. There are missing values
D. The data is continuous
Correct Answer: The categorical feature is ordinal (has an inherent order)
Explanation: Label encoding assigns an integer to categories. If order matters (e.g., Low, Medium, High), this is appropriate. For nominal data, it may introduce false relationships.
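A minimal sketch of label encoding for the Low/Medium/High example, where the integer codes follow an explicitly stated order. The helper name `ordinal_encode` and the sample column are hypothetical.

```python
# The order is supplied explicitly so the codes reflect the real ranking.
ORDER = ["Low", "Medium", "High"]

def ordinal_encode(values, order):
    """Map each category to its position in the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

# Hypothetical ordinal feature.
risk = ["Medium", "Low", "High", "Low"]
codes = ordinal_encode(risk, ORDER)
```

Applying the same mapping to a nominal feature such as color would imply, for example, that one color is 'greater' than another, which is the false relationship the explanation warns about.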
39. The process of converting raw data into a clean dataset is often called:
A. Data Mining
B. Data Wrangling/Munging
C. Data Visualization
D. Data Architecture
Correct Answer: Data Wrangling/Munging
Explanation: Data wrangling is the process of transforming and mapping data from its raw form into another format that is suitable for analytics.
40. Which of these is a supervised learning algorithm used for classification?
A. K-Means
B. Logistic Regression
C. Apriori
D. PCA
Correct Answer: Logistic Regression
Explanation: Despite its name, Logistic Regression is a classification algorithm. It is most commonly used for binary classification and can be extended to multiple classes.
41. If a dataset has duplicate records, the preprocessing step required is:
A. Imputation
B. Deduplication
C. Normalization
D. Encoding
Correct Answer: Deduplication
Explanation: Deduplication removes identical rows to prevent bias in the model.
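Deduplication of exact duplicate rows can be sketched as below. It assumes each record is a hashable tuple; the helper name `deduplicate` and the sample records are made up for illustration.

```python
def deduplicate(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

# Hypothetical records as (name, age) tuples; two rows are repeated.
records = [("alice", 30), ("bob", 25), ("alice", 30), ("carol", 41), ("bob", 25)]
unique_records = deduplicate(records)
```

This only catches rows that match exactly; near-duplicates, such as the same name with different capitalization, would need to be normalized first.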
42. The 'Curse of Dimensionality' refers to problems caused by:
A. Too many missing values
B. Too many features (variables) relative to the number of observations
C. Too much processing power
D. Inaccurate labels
Correct Answer: Too many features (variables) relative to the number of observations
Explanation: As the number of features increases, the data becomes sparse, making it difficult for models to find patterns without massive amounts of data.
43. Which variable type requires the creation of dummy variables during preprocessing?
A. Numerical
B. Categorical
C. Binary
D. Ordinal
Correct Answer: Categorical
Explanation: Categorical variables (specifically nominal ones) often need to be converted into dummy variables (0/1 columns) to be used in mathematical models.
44. Semi-supervised learning uses:
A. Only labeled data
B. Only unlabeled data
C. A small amount of labeled data and a large amount of unlabeled data
D. Reinforcement signals
Correct Answer: A small amount of labeled data and a large amount of unlabeled data
Explanation: This approach leverages the large volume of unlabeled data to improve learning accuracy when labeling data is expensive.
45. What is the result of 'Data Transformation'?
A. Data is deleted
B. Data is converted into a format suitable for modeling
C. Data is visualized
D. Data is collected
Correct Answer: Data is converted into a format suitable for modeling
Explanation: Transformation includes smoothing, aggregation, generalization, normalization, etc., to make data model-ready.
46. Predicting the temperature for tomorrow is a:
A. Classification task
B. Regression task
C. Clustering task
D. Preprocessing task
Correct Answer: Regression task
Explanation: Temperature is a continuous numerical variable, making this a regression problem.
47. Customer segmentation usually relies on which type of learning?