1 $What is the primary goal of Machine Learning?$

Introduction to Machine Learning Easy

A.

To create visually appealing user interfaces

B.

To enable computers to learn from data without being explicitly programmed

C.

To write complex algorithms that can solve any problem

D.

To store large amounts of data efficiently

2 $Which type of Machine Learning involves training a model on a dataset with labeled outputs?$

Types of Machine Learning Easy

A.

Supervised Learning

B.

Unsupervised Learning

C.

Reinforcement Learning

D.

Semi-supervised Learning

3 $Grouping customers into different clusters based on their purchasing behavior is an example of what type of machine learning?$

Types of Machine Learning Easy

A.

Classification

B.

Reinforcement Learning

C.

Regression

D.

Clustering (Unsupervised Learning)

4 $Which of the following best describes the relationship between Artificial Intelligence (AI) and Machine Learning (ML)?$

Difference between ML, AI and Data Science Easy

A.

AI is a subset of ML

B.

AI and ML are the same thing

C.

ML is a subset of AI

D.

AI and ML are completely separate fields

5 $A key characteristic of a parametric model is that it:$

Parametric vs Non-parametric models Easy

A.

Has a number of parameters that grows with the data

B.

Does not make any assumptions about the data

C.

Is always less accurate than a non-parametric model

D.

Has a fixed number of parameters and makes strong assumptions about the data's form

6 $Which type of model learns the decision boundary between different classes directly?$

Generative vs Discriminative Models Easy

A.

Regression Model

B.

Generative Model

C.

Clustering Model

D.

Discriminative Model

7 $What is typically the first step in a standard Machine Learning workflow?$

Machine Learning Workflow Easy

A.

Data Collection and Preparation

B.

Performance Evaluation

C.

Model Deployment

D.

Model Training

8 $What is the primary purpose of a test set in machine learning?$

Train-Test data split Easy

A.

To provide an unbiased evaluation of the final model's performance on unseen data

B.

To validate and tune hyperparameters

C.

To reduce the size of the training dataset

D.

To train the model more effectively

9 $For a classification problem, what does the Accuracy metric measure?$

Overview of Evaluation Metrics Easy

A.

The average squared difference between the predicted and actual values

B.

The ratio of true positives to the sum of true positives and false positives

C.

The model's confidence in its predictions

D.

The proportion of correct predictions out of the total number of predictions

10 $When a machine learning model performs very well on the training data but poorly on the test data, what is this phenomenon called?$

Overfitting and Underfitting Easy

A.

Underfitting

B.

A good fit

C.

Overfitting

D.

Data leakage

11 $A model that is too simple and fails to capture the underlying patterns in the data, resulting in high error on both training and test sets, is said to be...$

Overfitting and Underfitting Easy

A.

Overfitting

B.

Underfitting

C.

Robust

D.

Generalizing

12 $In the context of the Bias-Variance Trade-off, a model with high bias and low variance typically...$

Bias-Variance Trade-off Easy

A.

Is a very complex model

B.

Perfectly fits the data

C.

Overfits the data

D.

Underfits the data

13 $Which of the following is a common application of Machine Learning?$

Applications and Use-cases of ML Easy

A.

Designing a computer processor

B.

Writing an operating system kernel

C.

Creating a basic text editor

D.

Spam email filtering

14 $What is the primary data structure used in the NumPy library for numerical computing?$

Numpy for Numerical Computing Easy

A.

List

B.

DataFrame

C.

Dictionary

D.

ndarray (N-dimensional array)

15 $Which NumPy function is used to create an array of a specified shape, filled with the number 1?$

Creating Arrays Easy

A.

numpy.empty()

B.

numpy.zeros()

C.

numpy.arange()

D.

numpy.ones()

16 $What will the following NumPy code produce? import numpy as np; arr = np.arange(5)$

Creating Arrays Easy

A.

An array [0, 1, 2, 3, 4, 5]

B.

An array [0, 1, 2, 3, 4]

C.

An array [1, 2, 3, 4, 5]

D.

A single number 5

17 $Given two NumPy arrays, a = np.array([1, 2, 3]) and b = np.array([4, 5, 6]), what is the result of a + b?$

Operations on Arrays Easy

A.

An array [5, 7, 9]

B.

An error because you cannot add arrays

C.

An array [1, 2, 3, 4, 5, 6]

D.

An array [[1, 4], [2, 5], [3, 6]]

18 $If arr = np.array([10, 20, 30]), what is the result of arr / 10?$

Operations on Arrays Easy

A.

A single number 2

B.

An array [1, 2, 3]

C.

An error, division is not supported

D.

An array [10, 20, 30, 10]
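The element-wise behavior behind questions 17 and 18 can be checked directly. This minimal sketch uses only the arrays from those two questions:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
arr = np.array([10, 20, 30])

# Arithmetic operators act element by element on arrays of the same shape.
elementwise_sum = a + b   # array([5, 7, 9])

# Dividing by a scalar applies to every element; "/" always returns floats.
scaled = arr / 10         # array([1., 2., 3.])

print(elementwise_sum, scaled)  # [5 7 9] [1. 2. 3.]
```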

19 $What is NumPy's 'Broadcasting'?$

Broadcasting Rule Easy

A.

A way to convert arrays to different data types.

B.

A method for sending arrays over a network.

C.

A function that makes arrays larger by repeating elements.

D.

The ability to perform arithmetic operations on arrays of different but compatible shapes.

20 $Which function from NumPy's random module would you use to generate an array of random integers within a specific range?$

Random Module Easy

A.

np.random.rand()

B.

np.random.choice()

C.

np.random.randn()

D.

np.random.randint()

21 $A machine learning model demonstrates high variance and low bias. Which of the following strategies is most likely to improve the model's performance on a new, unseen test set?$

Bias-Variance Trade-off Medium

A.

Use a smaller set of features for training.

B.

Decrease the amount of regularization applied to the model.

C.

Increase the complexity of the model (e.g., add more polynomial features).

D.

Gather more training data for the model.

22 $A data scientist trains a decision tree and observes that the training accuracy is 99%, while the validation accuracy is only 72%. This is a clear case of overfitting. Which of the following is the most appropriate next step to mitigate this issue?$

Overfitting and Underfitting Medium

A.

Use a larger portion of the data for training and a smaller portion for validation.

B.

Prune the tree by setting a reasonable max_depth or increasing min_samples_leaf.

C.

Decrease the min_samples_split parameter to allow splits on smaller nodes.

D.

Increase the max_depth parameter of the tree to allow it to learn more complex patterns.

23 $You are working with a very large dataset where you suspect the underlying data distribution is highly complex and non-linear. Which type of model is generally more suitable in this scenario and why?$

Parametric vs Non-parametric models Medium

A.

A non-parametric model, because it is more flexible and can capture complex patterns without being constrained by a predefined functional form.

B.

A parametric model, because it has fewer parameters and is less likely to overfit a large dataset.

C.

A non-parametric model, because it is always faster to train than a parametric model.

D.

A parametric model, because it makes strong assumptions, which simplify the problem.

24 $A team is building a system to generate realistic-looking images of human faces that do not exist in the real world. Which type of model is fundamentally designed for this task?$

Generative vs Discriminative Models Medium

A.

A generative model, because it learns the underlying probability distribution of the data, P(X), which allows it to sample new data points.

B.

A discriminative model, because it directly models the conditional probability P(Y | X).

C.

A discriminative model, because it learns the boundary between different classes of images.

D.

A generative model, because it is always computationally less expensive than a discriminative model.

25 $An e-commerce company wants to build a system that groups its customers into distinct segments based on their purchasing behavior (e.g., high-spenders, occasional shoppers, brand-loyal). The company does not have pre-defined labels for these segments. This problem falls under which category of machine learning?$

Types of Machine Learning Medium

A.

Unsupervised Learning (Clustering)

B.

Reinforcement Learning

C.

Supervised Learning (Classification)

D.

Supervised Learning (Regression)

26 $An AI agent is being trained to play a complex video game. The agent interacts with the game environment, takes actions (like moving left or jumping), and receives a score (reward) based on its performance. The agent's goal is to learn a policy that maximizes its total score over time. This learning paradigm is best described as:$

Types of Machine Learning Medium

A.

Unsupervised Learning

B.

Semi-supervised Learning

C.

Reinforcement Learning

D.

Supervised Learning

27 $Which statement best articulates the relationship between Artificial Intelligence (AI), Machine Learning (ML), and Data Science?$

Difference between ML, AI and Data Science Medium

A.

Data Science is a subset of ML, which in turn is a subset of AI.

B.

ML is a core subfield of AI that gives computers the ability to learn without being explicitly programmed. Both are tools and concepts used within the broader, interdisciplinary field of Data Science.

C.

AI is a specific implementation of ML, which is a tool used in Data Science.

D.

ML and Data Science are synonymous terms for the statistical branch of AI.

28 $In a standard machine learning workflow, why is the 'Feature Engineering' step considered critically important?$

Machine Learning Workflow Medium

A.

It is the only step where the model's hyperparameters are tuned.

B.

It directly influences the model's ability to learn by transforming raw data into a format that better represents the underlying problem, often improving performance more than model selection itself.

C.

It is the final step before deploying the model to production.

D.

It is primarily concerned with splitting the data into training and testing sets.

29 $Why is it a critical mistake to perform feature scaling (like Min-Max scaling or Standardization) on the entire dataset before splitting it into training and testing sets?$

Train-Test data split Medium

A.

It prevents the use of certain models like Decision Trees, which are not sensitive to feature scaling.

B.

It makes the training process computationally much slower.

C.

It causes data leakage, where information from the test set (e.g., its minimum and maximum values) is used to transform the training set, leading to an overly optimistic and unrealistic performance evaluation.

D.

It is not a mistake; scaling before splitting is a standard and recommended practice for convenience.
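The leak-free order for question 29 is: split first, fit the scaling statistics on the training portion only, then reuse those statistics on the test portion. The sketch below does min-max scaling by hand with NumPy; the feature matrix and the 80/20 split are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))  # hypothetical feature matrix

# 1. Split first (simple 80/20 split on shuffled row indices).
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:160], idx[160:]
X_train, X_test = X[train_idx], X[test_idx]

# 2. Fit min-max statistics on the training split only.
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)

# 3. Apply the training statistics to both splits.
X_train_scaled = (X_train - train_min) / (train_max - train_min)
X_test_scaled = (X_test - train_min) / (train_max - train_min)

# Test values may fall slightly outside [0, 1]; that is expected and honest.
print(X_test_scaled.min(), X_test_scaled.max())
```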

30 $In a binary classification problem to detect fraudulent transactions, the cost of a False Negative (failing to detect a fraudulent transaction) is extremely high, while the cost of a False Positive (flagging a legitimate transaction as fraud) is relatively low. Which metric is most important to maximize in this scenario?$

Overview of Evaluation Metrics Medium

A.

Precision

B.

Recall (Sensitivity)

C.

Accuracy

D.

Specificity

31 $A real estate company uses a linear regression model to predict house prices. The model's performance is evaluated using Mean Squared Error (MSE), which yields a value of 900. What is the correct interpretation of the Root Mean Squared Error (RMSE) for this model?$

Overview of Evaluation Metrics Medium

A.

The model is, on average, off by $900 in its predictions.

B.

The model's accuracy is 90.0%.

C.

The model explains 900% of the variance in house prices.

D.

The RMSE is √900 = 30, meaning the typical magnitude of the prediction errors is about 30 in the units of the price (e.g., $30,000 if prices are recorded in thousands of dollars).
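RMSE is the square root of MSE, which puts the error back into the target's own units. A small sketch with hypothetical prices (in thousands) where every prediction misses by exactly 30:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared residuals (units: target units squared)."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the same units as the target."""
    return float(np.sqrt(mse(y_true, y_pred)))

# Hypothetical prices (in thousands): each prediction is off by exactly 30.
y_true = [200.0, 310.0, 150.0]
y_pred = [230.0, 280.0, 180.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred))  # 900.0 30.0
```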

32 $A streaming service wants to build a feature that suggests movies to users based on their viewing history and the ratings they have given. Which machine learning technique is most directly applicable to this problem?$

Applications and Use-cases of ML Medium

A.

Linear Regression

B.

Anomaly Detection

C.

Clustering

D.

Collaborative Filtering / Recommender Systems

33 $Consider two NumPy arrays, A with shape (6, 1, 5) and B with shape (7, 1). If we compute C = A + B, what will be the shape of the resulting array C according to NumPy's broadcasting rules?$

Broadcasting Rule Medium

A.

(6, 7, 1)

B.

(6, 7, 5)

C.

The operation will fail with a ValueError.

D.

(1, 7, 5)
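Broadcasting compares shapes from the rightmost dimension leftward, padding the shorter shape with leading 1s; each pair must be equal or contain a 1. A quick check, assuming NumPy 1.20 or newer for `np.broadcast_shapes`:

```python
import numpy as np

A = np.zeros((6, 1, 5))
B = np.zeros((7, 1))

# B is treated as (1, 7, 1); pairs (6, 1), (1, 7), (5, 1) are all compatible.
print(np.broadcast_shapes(A.shape, B.shape))  # (6, 7, 5)

C = A + B
print(C.shape)  # (6, 7, 5)
```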

34 $Given a NumPy array X with shape (100, 10) representing 100 data samples with 10 features each, which operation correctly computes the dot product of the transpose of X with X, and what does the resulting matrix represent?$

Matrix operations Medium

A.

Z = X * X.T, which results in the covariance matrix.

B.

Z = np.matmul(X.T, X), which results in a (10, 10) matrix often related to the feature covariance matrix.

C.

Z = np.matmul(X, X.T), which results in a (10, 10) matrix representing sample similarity.

D.

Z = X + X.T, which fails because the shapes are incompatible.

35 $What is the key difference between the operators * and @ when used with two 2D NumPy arrays A and B of shape (3, 3)?$

Matrix operations Medium

A.

The * operator is used for matrix inversion, while the @ operator is for matrix multiplication.

B.

The * operator performs element-wise multiplication (Hadamard product), while the @ operator performs standard matrix multiplication.

C.

The @ operator performs element-wise multiplication, while the * operator performs standard matrix multiplication.

D.

They are aliases for the same operation: matrix multiplication.
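The difference between `*` and `@` is easy to see with a diagonal matrix. In this sketch, B is an arbitrary choice (2 on the diagonal) picked so the two results look clearly different:

```python
import numpy as np

A = np.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
B = np.eye(3, dtype=int) * 2     # 2 on the diagonal, 0 elsewhere

hadamard = A * B   # element-wise product: keeps only A's diagonal, doubled
matmul = A @ B     # matrix product: every entry of A doubled

print(hadamard)
print(matmul)
```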

36 $You need to create a NumPy array that starts at 10, stops before 50, and has a step of 5 between consecutive elements. Which of the following function calls will NOT produce the desired array [10, 15, 20, 25, 30, 35, 40, 45]?$

Creating Arrays Medium

A.

np.arange(10, 50, 5)

B.

np.arange(10, 51, 5)

C.

np.linspace(10, 45, 8)

D.

np.arange(10, 46, 5)
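`np.arange` excludes its stop value, while `np.linspace` includes its endpoint by default and returns floats. Running each candidate from question 36 makes the difference visible:

```python
import numpy as np

target = np.array([10, 15, 20, 25, 30, 35, 40, 45])

candidates = {
    "np.arange(10, 50, 5)": np.arange(10, 50, 5),
    "np.arange(10, 51, 5)": np.arange(10, 51, 5),
    "np.linspace(10, 45, 8)": np.linspace(10, 45, 8),
    "np.arange(10, 46, 5)": np.arange(10, 46, 5),
}

for name, values in candidates.items():
    # Compare shapes first so mismatched lengths never reach np.allclose.
    matches = values.shape == target.shape and np.allclose(values, target)
    print(f"{name:24} -> {values.tolist()} matches: {matches}")
```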

37 $Given the NumPy array x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), what is the result of the expression x[1:, :2]?$

Operations on Arrays Medium

A.

array([[4, 5], [7, 8]])

B.

array([[2, 3], [5, 6]])

C.

array([[1, 2], [4, 5], [7, 8]])

D.

array([4, 5])

38 $You have a NumPy array data with shape (500, 20). You want to find the maximum value within each of the 20 columns. Which command will achieve this and what will be the shape of the result?$

Operations on Arrays Medium

A.

np.max(data, axis=1), with shape (500,)

B.

np.max(data), which returns a single scalar value

C.

data.argmax(axis=1), with shape (500,)

D.

np.max(data, axis=0), with shape (20,)
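The `axis` argument names the dimension that is reduced away: `axis=0` collapses the rows and leaves one value per column. A sketch with hypothetical random integer data of the same shape as in question 38:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.integers(0, 100, size=(500, 20))  # hypothetical data: 500 rows x 20 columns

col_max = np.max(data, axis=0)    # reduce over rows -> one value per column
row_max = np.max(data, axis=1)    # reduce over columns -> one value per row
col_argmax = data.argmax(axis=0)  # row index of each column's maximum (positions, not values)

print(col_max.shape, row_max.shape, col_argmax.shape)  # (20,) (500,) (20,)
```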

39 $What is the primary difference in the distribution of numbers generated by numpy.random.rand(1000) versus numpy.random.randn(1000)?$

Random Module Medium

A.

rand generates integers, while randn generates floats.

B.

rand generates numbers from a uniform distribution over [0, 1), while randn generates numbers from a standard normal distribution (mean=0, std=1).

C.

There is no difference; they are aliases for the same random number generator.

D.

rand generates numbers from a standard normal distribution (mean=0, std=1), while randn generates from a uniform distribution between 0 and 1.
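Drawing a large sample from each legacy `np.random` function shows the different distributions, and `randint` (question 20) covers the integer case. Exact printed values depend on the seed, so only the bounds check has a stated output:

```python
import numpy as np

np.random.seed(0)
u = np.random.rand(100_000)           # uniform on [0, 1)
z = np.random.randn(100_000)          # standard normal: mean 0, std 1
k = np.random.randint(1, 7, size=10)  # integers in [1, 7), i.e. dice rolls 1..6

print(u.min() >= 0.0, u.max() < 1.0)  # True True
print(round(u.mean(), 2), round(z.mean(), 2), round(z.std(), 2))  # close to 0.5, 0.0, 1.0
print(k)
```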

40 $You are running a machine learning experiment that involves random initialization of weights. To ensure that your results are reproducible, which function should you call before any random number generation, and why?$

Random Module Medium

A.

np.random.seed(some_integer), to initialize the random number generator to a fixed state.

B.

np.random.randn(), to warm up the random number generator.

C.

np.random.shuffle(), to ensure the data is properly randomized.

D.

np.random.clear(), to reset the random state to be truly random.
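Fixing the seed makes the generator's sequence repeatable. In this sketch, `init_weights` is a hypothetical helper standing in for a model's weight initialization:

```python
import numpy as np

def init_weights(seed, shape=(3, 2)):
    """Hypothetical helper: draw small initial weights after fixing the global seed."""
    np.random.seed(seed)
    return np.random.randn(*shape) * 0.01

w_run1 = init_weights(123)
w_run2 = init_weights(123)
w_other = init_weights(7)

print(np.array_equal(w_run1, w_run2))   # True: same seed, same weights
print(np.array_equal(w_run1, w_other))  # False: a different seed gives different draws
```

Newer NumPy code often prefers a local generator, `np.random.default_rng(seed)`, over the global `np.random.seed`; the reproducibility idea is the same.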

41 $A machine learning model exhibits high variance but low bias. Which of the following strategies is least likely to improve the model's performance on unseen data?$

Bias-Variance Trade-off Hard

A.

Using a simpler model architecture with fewer parameters.

B.

Implementing L2 regularization by increasing the lambda (λ) parameter.

C.

Adding more diverse training examples.

D.

Performing feature engineering to add more complex, polynomial features.

42 $Consider a classification task where you have a very small, high-dimensional training dataset (e.g., 100 samples, 10,000 features). You need to build a classifier. Which model type is likely to perform better and why?$

Generative vs Discriminative Models Hard

A.

A discriminative model like Logistic Regression, because it directly models the decision boundary without wasting resources on the data distribution.

B.

A generative model like Naive Bayes, because it makes strong assumptions about data distribution, thus having fewer parameters to estimate.

C.

A generative model like a Gaussian Mixture Model, because it can perfectly model the underlying clusters in the data.

D.

A discriminative model like a Support Vector Machine with a complex kernel, because it can find a non-linear boundary in the high-dimensional space.

43 $You are choosing between a Linear Regression model and a k-Nearest Neighbors (k-NN) regression model. As the size of the training dataset approaches infinity, what is the fundamental difference in how the 'complexity' of the fitted model is determined?$

Parametric vs Non-parametric models Hard

A.

Linear Regression's complexity is determined by the learning rate, while k-NN's complexity is determined by the distance metric used.

B.

Linear Regression's complexity is fixed by the number of features, while k-NN's effective complexity grows with the training-set size n, as it stores all data points.

C.

Both models have complexity that is independent of the training-set size n; it is only determined by the number of features.

D.

Linear Regression's complexity grows with the training-set size n because the coefficients become more precise, while k-NN's complexity is fixed by the choice of k.

44 $Given two NumPy arrays A = np.ones((4, 1, 3)) and B = np.ones((5, 1, 1)), what is the shape of the resulting array from the operation C = A + B?$

Numpy Broadcasting Rule Hard

A.

The operation will fail with a ValueError.

B.

(5, 4, 3)

C.

(1, 5, 3)

D.

(4, 5, 3)
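Comparing question 44's shapes right to left gives 3 vs 1 (compatible), 1 vs 1 (compatible), then 4 vs 5, where neither is 1. The sketch below catches the resulting error and then shows one reshape (an illustrative choice, not part of the question) that would make the addition valid:

```python
import numpy as np

A = np.ones((4, 1, 3))
B = np.ones((5, 1, 1))

try:
    C = A + B
    result = C.shape
except ValueError as err:
    # NumPy reports that the operands could not be broadcast together.
    result = f"ValueError: {err}"

print(result)

# Moving the 5 into the middle axis makes every dimension pair compatible.
C_ok = A + B.reshape(1, 5, 1)
print(C_ok.shape)  # (4, 5, 3)
```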

45 $You plot a learning curve for your model which shows the training error is very low and has plateaued, while the validation error is significantly higher and has also plateaued. There is a large, persistent gap between the two curves. What is the most effective first step to address this issue?$

Overfitting and Underfitting Hard

A.

Decrease the number of features through feature selection.

B.

Use a more complex model (e.g., add more layers to a neural network).

C.

Increase the regularization strength.

D.

Gather more training data.

46 $In a binary classification problem for detecting a rare disease (1% prevalence), model A has an AUROC of 0.85 and an AUPRC of 0.60. Model B has an AUROC of 0.82 and an AUPRC of 0.70. Which model is likely more useful in a practical clinical setting and why?$

Evaluation Metrics Hard

A.

Model B, because a higher AUPRC indicates it makes fewer false positive predictions.

B.

Model A, because AUROC (Area Under Receiver Operating Characteristic Curve) is a more comprehensive measure of a classifier's performance across all thresholds.

C.

Model A, because it has a higher AUROC, indicating better overall separability of the classes.

D.

Model B, because AUPRC (Area Under Precision-Recall Curve) is more informative than AUROC for highly imbalanced datasets.

47 $Given NumPy arrays W with shape (5, 10), X with shape (100, 10), and b with shape (5,), which of the following expressions correctly applies the weights W to X, adds the bias b, and results in an array of shape (100, 5)?$

Numpy Matrix operations Hard

A.

W.T @ X + b

B.

X @ W.T + b

C.

X.T @ W + b

D.

np.dot(W, X) + b
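The general rule is (m, k) @ (k, n) → (m, n), after which a bias of shape (n,) broadcasts across the rows. Libraries differ in how they store weight matrices, so this sketch shows two assumed layouts, (in_features, out_features) and (out_features, in_features), with random hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 samples, 10 features
b = rng.normal(size=(5,))       # one bias per output unit

# Assumed layout 1: weights stored as (in_features, out_features).
W_in_out = rng.normal(size=(10, 5))
Y1 = X @ W_in_out + b           # (100, 10) @ (10, 5) -> (100, 5); b broadcasts over rows

# Assumed layout 2: weights stored as (out_features, in_features), so transpose first.
W_out_in = W_in_out.T           # shape (5, 10)
Y2 = X @ W_out_in.T + b         # same computation, same (100, 5) result

print(Y1.shape, Y2.shape, np.allclose(Y1, Y2))  # (100, 5) (100, 5) True
```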

48 $You are building a model to predict customer churn. The dataset contains multiple transactions for each customer. Why would a simple random train_test_split on the transaction level be a critical mistake?$

Train-Test data split Hard

A.

It will not work because train_test_split requires a 1D array of labels, not grouped data.

B.

It leads to data leakage, as transactions from the same customer could end up in both the training and testing sets, violating the independence assumption.

C.

It would create an imbalanced dataset, making the model biased towards customers with more transactions.

D.

It is computationally inefficient compared to splitting at the customer level.
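Splitting at the customer level means choosing which *customers* go to the test set and then taking all of their rows. A NumPy sketch with a hypothetical transaction log:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transaction log: each row belongs to one of 50 customers.
customer_ids = rng.integers(0, 50, size=1_000)

# Shuffle the unique customers and hold out 20% of them.
unique_customers = np.unique(customer_ids)
shuffled = rng.permutation(unique_customers)
n_test = int(0.2 * len(shuffled))
test_customers = shuffled[:n_test]

# Every row of a held-out customer goes to the test set.
test_mask = np.isin(customer_ids, test_customers)
train_rows = np.flatnonzero(~test_mask)
test_rows = np.flatnonzero(test_mask)

# No customer appears on both sides of the split.
overlap = np.intersect1d(customer_ids[train_rows], customer_ids[test_rows])
print(len(train_rows), len(test_rows), overlap.size)
```

scikit-learn's `GroupShuffleSplit` and `GroupKFold` implement the same idea when given the customer IDs as `groups`.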

49 $A team is tasked with creating a system that can understand natural language queries, reason about the user's intent, and automatically compose a complex SQL query to retrieve data from a database. The core of this system, which involves understanding and reasoning, falls most squarely under the definition of:$

Difference between ML, AI and Data Science Hard

A.

Machine Learning (ML)

B.

Data Science

C.

Artificial Intelligence (AI)

D.

Statistical Modeling

50 $When implementing a machine learning pipeline that includes feature scaling (e.g., Standardization) and k-fold cross-validation, what is the correct procedure to prevent data leakage?$

Machine Learning Workflow Hard

A.

The scaling transformer should be fit separately on the training fold and the validation fold to best represent the statistics of each fold.

B.

The scaling transformer should be fit on the entire dataset (training + test) before any splitting to ensure all data is on the same scale.

C.

The scaling transformer should be fit on the entire training dataset (before splitting into folds) to get a stable estimate of mean and standard deviation.

D.

The scaling transformer should be fit on the training fold's data only and then used to transform both the training fold and the validation fold.
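Inside cross-validation, the scaler has to be refit in every fold using that fold's training rows only. This hand-rolled sketch uses a hypothetical `kfold_indices` helper and random features to show the order of operations:

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Hypothetical helper: yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 4))  # hypothetical feature matrix

fold_stats = []
for train_idx, val_idx in kfold_indices(len(X), k=5):
    X_tr, X_val = X[train_idx], X[val_idx]
    # Fit the standardization statistics on the training fold only...
    mean, std = X_tr.mean(axis=0), X_tr.std(axis=0)
    # ...then apply the same statistics to both the training and validation folds.
    X_tr_s = (X_tr - mean) / std
    X_val_s = (X_val - mean) / std
    # A model would be trained on X_tr_s and scored on X_val_s at this point.
    fold_stats.append((X_tr_s.mean(axis=0), X_val_s.mean(axis=0)))

print(len(fold_stats))  # 5
```

In scikit-learn, wrapping the scaler and the model in a `Pipeline` and passing it to `cross_val_score` refits the scaler inside each fold automatically.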

51 $Consider the following NumPy code snippet:

```python
import numpy as np
np.random.seed(42)
arr1 = np.random.randint(0, 10, 3)
np.random.seed(42)
arr2 = np.random.randint(0, 10, 3)
arr3 = np.random.randint(0, 10, 3)
```

Which of the following statements about the arrays arr1, arr2, and arr3 is true?$

Numpy Random Module Hard

A.

All three arrays are different from each other.

B.

arr2 and arr3 are identical, but arr1 is different.

C.

All three arrays (arr1, arr2, arr3) are identical.

D.

arr1 and arr2 are identical, but arr3 is different.

52 $Logistic Regression (LR) models the posterior probability P(Y | X) directly, while Gaussian Naive Bayes (GNB) models the class-conditional probability P(X | Y) and the prior P(Y). Which statement correctly identifies a key theoretical difference in their resulting decision boundaries?$

Generative vs Discriminative Models Hard

A.

LR produces a probabilistic boundary, while GNB produces a deterministic boundary based on Bayes' rule.

B.

Both models are guaranteed to produce linear decision boundaries in the original feature space.

C.

LR always produces a linear decision boundary, whereas GNB can produce a quadratic decision boundary if the class conditional covariances are different.

D.

GNB always produces a linear decision boundary, whereas LR can produce a non-linear decision boundary using polynomial features.

53 $In Ridge Regression, the cost function is J(θ) = MSE(θ) + λ Σⱼ θⱼ². How does increasing the regularization parameter λ from zero affect the bias and variance of the model?$

Bias-Variance Trade-off Hard

A.

Both bias and variance increase.

B.

Bias decreases and variance increases.

C.

Bias increases and variance decreases.

D.

Both bias and variance decrease.
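The effect of λ on the coefficients can be seen with the closed-form ridge solution w = (XᵀX + λI)⁻¹ Xᵀy. This sketch assumes no intercept term and uses synthetic data; it only shows that the coefficient vector shrinks as λ grows (a less flexible model), not a direct measurement of bias or variance:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept): solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])     # synthetic ground truth
y = X @ true_w + rng.normal(scale=0.5, size=30)   # noisy targets

for lam in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_fit(X, y, lam)
    # Larger lambda pulls the coefficient vector toward zero.
    print(f"lambda={lam:>6}: ||w|| = {np.linalg.norm(w):.3f}")
```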

54 $You have a 3D array of RGB pixel values pixels with shape (1920, 1080, 3) and a 1D array weights with shape (3,) representing weights for the R, G, and B channels. You want to compute a weighted sum for each pixel to convert the image to grayscale, resulting in a 2D array of shape (1920, 1080). Which of the following is the most efficient and correct NumPy expression?$

Numpy Broadcasting Rule Hard

A.

np.dot(pixels, weights)

B.

np.sum(pixels * weights, axis=2)

C.

A Python loop over the channels: result = np.zeros((1920, 1080)), then for i in range(3): result += pixels[:, :, i] * weights[i]

D.

pixels @ weights
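The vectorized options in question 54 can be compared on a small stand-in image. The (4, 6, 3) size and the luma-style channel weights below are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(4, 6, 3)).astype(np.float64)  # small stand-in for (1920, 1080, 3)
weights = np.array([0.299, 0.587, 0.114])                          # assumed luma-style weights

gray_sum = np.sum(pixels * weights, axis=2)  # broadcast (4, 6, 3) * (3,), then sum over channels
gray_dot = np.dot(pixels, weights)           # contracts the last axis of pixels with weights
gray_matmul = pixels @ weights               # matmul treats the 1D weights as a vector

print(gray_sum.shape)  # (4, 6)
print(np.allclose(gray_sum, gray_dot), np.allclose(gray_sum, gray_matmul))  # True True
```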

55 $Why is the F1-score considered a more balanced measure for imbalanced classification tasks compared to accuracy?$

Overview of Evaluation Metrics Hard

A.

F1-score is an arithmetic mean of precision and recall, which gives equal importance to both false positives and false negatives.

B.

F1-score is the harmonic mean of precision and recall, which is more sensitive to low values in either metric than the arithmetic mean, thus penalizing models that excel at one at the expense of the other.

C.

F1-score incorporates the True Negative rate, which is ignored by accuracy, making it more robust to class imbalance.

D.

F1-score is calculated from the area under the ROC curve, which is inherently balanced across all classification thresholds.
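The harmonic mean is dragged down by whichever of precision and recall is smaller, while the arithmetic mean is not. A sketch with a hypothetical lopsided classifier:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def arithmetic_mean(precision, recall):
    return (precision + recall) / 2

# Hypothetical lopsided classifier: excellent precision, very poor recall.
p, r = 0.95, 0.05
print(round(arithmetic_mean(p, r), 3))  # 0.5
print(round(f1_score(p, r), 3))         # 0.095
```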

56 $You have trained two logistic regression models on a high-dimensional dataset that is prone to overfitting. Model A uses L1 regularization (Lasso) and Model B uses L2 regularization (Ridge). After tuning, both achieve similar cross-validation scores. What is the most likely difference in the learned coefficient vectors of the two models?$

Overfitting and Underfitting Hard

A.

Model B's coefficient vector will likely be sparse, while Model A's will have small, non-zero values.

B.

Both models will have sparse coefficient vectors, but Model A's non-zero coefficients will be larger.

C.

Both models will have dense coefficient vectors with small values, but Model B's coefficients will be smaller on average.

D.

Model A's coefficient vector will likely be sparse (many zeros), while Model B's will have many small, non-zero values.

57 $What is the output of the following NumPy code snippet?

```python
import numpy as np
arr = np.arange(12).reshape(4, 3)
idx = np.array([[0, 2], [1, 1]])
result = arr[np.arange(2)[:, np.newaxis], idx]
print(result.shape)
```
$

Creating Arrays Hard

A.

(2, 2)

B.

(2, 1, 2)

C.

The code will raise an IndexError.

D.

(2,)
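With advanced (fancy) indexing, the index arrays for each axis are broadcast against each other, and the result takes the broadcast shape. Printing the intermediate pieces of question 57 shows where the shape comes from:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)    # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
idx = np.array([[0, 2], [1, 1]])     # column indices, shape (2, 2)
rows = np.arange(2)[:, np.newaxis]   # row indices, shape (2, 1)

# Row and column index arrays broadcast together: (2, 1) with (2, 2) -> (2, 2).
result = arr[rows, idx]              # picks arr[0, 0], arr[0, 2], arr[1, 1], arr[1, 1]
print(result)
print(result.shape)  # (2, 2)
```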

58 $A financial company wants to develop a system that groups its clients into distinct segments based on their transaction history, income level, and investment portfolio without any pre-existing labels for these segments. Once segmented, the company's analysts will examine these groups to create targeted marketing strategies. This task is a clear example of:$

Types of Machine Learning Hard

A.

Semi-supervised Learning, as analysts will label the data later.

B.

Supervised Learning, specifically multi-class classification.

C.

Reinforcement Learning, where each segment is a state.

D.

Unsupervised Learning, specifically clustering.

59 $Anomaly detection systems, such as those used for identifying fraudulent credit card transactions, often face the 'concept drift' problem. What is concept drift in this context, and why does it necessitate a specific approach to model maintenance?$

Applications and Use-cases of ML Hard

A.

Concept drift describes the natural degradation of the software and hardware infrastructure hosting the model.

B.

Concept drift refers to the increasing computational cost of re-training the model as more data is collected.

C.

Concept drift is the tendency for models to become biased towards the majority class (non-fraudulent transactions) over time.

D.

Concept drift is when the statistical properties of the target variable (fraud) change over time, requiring the model to be continuously retrained or updated on new data.

60 $What is the primary risk of performing hyperparameter tuning (e.g., using GridSearchCV) on the entire dataset before creating a final train-test split?$

Machine Learning Workflow Hard

A.

It will always lead to a model that underfits the data because the hyperparameters are not optimized for the specific training data.

B.

It is computationally infeasible as hyperparameter tuning requires a separate validation set which is not available.

C.

It causes data leakage from the eventual test set into the hyperparameter selection process, leading to an overly optimistic evaluation of the final model's performance.

D.

It violates the assumption of IID (independent and identically distributed) data required by most machine learning algorithms.

Unit 1 - Practice Quiz