1. What is the primary characteristic of Supervised Learning?
A.The model learns from unlabeled data to find hidden patterns.
B.The model learns from a labeled dataset containing input-output pairs.
C.The model interacts with an environment and learns via a reward system.
D.The model groups data points based on inherent similarities without predefined categories.
Correct Answer: The model learns from a labeled dataset containing input-output pairs.
Explanation:In supervised learning, the algorithm is trained on a labeled dataset, meaning the data includes both the input features and the corresponding correct output (target).
2. Which of the following scenarios is a Regression problem?
A.Predicting whether an email is spam or not.
B.Predicting the price of a house based on its square footage.
C.Classifying an image as a cat or a dog.
D.Grouping customers based on purchasing behavior.
Correct Answer: Predicting the price of a house based on its square footage.
Explanation:Regression deals with predicting continuous numerical values (like price), whereas the other options represent classification (categorical output) or clustering.
3. Which library in Python is the standard for implementing classic machine learning algorithms like Decision Trees and SVMs?
A.Pandas
B.Scikit-learn
C.Matplotlib
D.NumPy
Correct Answer: Scikit-learn
Explanation:Scikit-learn (sklearn) is the most widely used Python library for implementing standard machine learning algorithms, preprocessing, and model evaluation.
4. In a dataset, Ordinal Data refers to:
A.Categorical data with no intrinsic order (e.g., Red, Blue, Green).
B.Categorical data with a clear ordering or ranking (e.g., Low, Medium, High).
C.Continuous numerical data (e.g., Height, Weight).
D.Binary data (e.g., True/False).
Correct Answer: Categorical data with a clear ordering or ranking (e.g., Low, Medium, High).
Explanation:Ordinal data is a type of categorical data where the values have a meaningful order or rank, but the intervals between the ranks may not be equal.
5. Which Pandas function is primarily used to load data from a Comma Separated Values file?
A.pd.load_csv()
B.pd.read_excel()
C.pd.read_csv()
D.pd.import_data()
Correct Answer: pd.read_csv()
Explanation:The function pd.read_csv() is the standard Pandas method for loading data from CSV files into a DataFrame.
6. What is the purpose of the df.describe() method in Pandas?
A.To visualize the correlation matrix.
B.To show the data types and non-null counts of columns.
C.To provide summary statistics (mean, std, min, max) for numerical columns.
D.To drop missing values from the dataframe.
Correct Answer: To provide summary statistics (mean, std, min, max) for numerical columns.
Explanation:df.describe() generates descriptive statistics including those that summarize the central tendency, dispersion, and shape of a dataset’s distribution.
7. When handling missing data, what is Imputation?
A.Removing the rows containing missing values.
B.Replacing missing values with substituted values (e.g., mean, median, mode).
C.Ignoring the column containing missing values.
D.Converting the missing values to a specific category like "Unknown".
Correct Answer: Replacing missing values with substituted values (e.g., mean, median, mode).
Explanation:Imputation refers to the process of replacing missing data with substituted values to retain the data point for analysis.
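A minimal sketch of imputation with Pandas, assuming a small hypothetical DataFrame (the column names `age` and `city` are illustrative, not from the quiz). The numerical column is filled with its median and the categorical column with its mode:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "city": ["Paris", "Rome", None, "Rome"],
})

# Numerical column: fill with the median of the observed values.
age_median = df["age"].median()
df["age"] = df["age"].fillna(age_median)

# Categorical column: fill with the most frequent value (mode).
city_mode = df["city"].mode()[0]
df["city"] = df["city"].fillna(city_mode)
```

In a real workflow the fill values would be computed on the training split only and then reused on the test split.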
8. Which visualization is most effective for identifying Outliers in a numerical feature?
A.Scatter Plot
B.Box Plot
C.Pie Chart
D.Bar Chart
Correct Answer: Box Plot
Explanation:A Box Plot visually depicts groups of numerical data through their quartiles and explicitly indicates outliers as individual points beyond the 'whiskers'.
9. In the context of outlier detection, what does the IQR (Interquartile Range) represent?
A.The difference between the maximum and minimum values.
B.The difference between the 75th percentile ($Q_3$) and the 25th percentile ($Q_1$).
C.The standard deviation of the dataset.
D.The distance between the mean and the median.
Correct Answer: The difference between the 75th percentile ($Q_3$) and the 25th percentile ($Q_1$).
Explanation:$\mathrm{IQR} = Q_3 - Q_1$. It measures the statistical spread of the middle 50% of the data and is used to define the bounds for outliers.
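As a rough illustration, the IQR can be computed with NumPy and turned into outlier bounds. The 1.5 × IQR multiplier used here is the common Tukey convention and is an assumption, since the quiz does not state a specific rule; the sample values are made up:

```python
import numpy as np

# Hypothetical sample with one obviously extreme value.
values = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])

# 25th and 75th percentiles, then the interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
```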
10. What is the formula for Min-Max Scaling (Normalization)?
A.
B.
C.
D.
Correct Answer: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
Explanation:Min-Max scaling transforms features by scaling each feature to a given range, usually [0, 1], using the minimum and maximum values of that feature.
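A short sketch of Min-Max scaling to the range [0, 1] with NumPy, using made-up values. Each value has the feature minimum subtracted and is then divided by the feature range:

```python
import numpy as np

# Hypothetical feature values.
x = np.array([20.0, 30.0, 50.0, 100.0])

# Subtract the minimum, then divide by the range (max - min).
x_scaled = (x - x.min()) / (x.max() - x.min())
```

This sketch assumes the feature is not constant; if the maximum equals the minimum, the denominator is zero and the case needs separate handling.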
11. Which scaling technique transforms data to have a mean of 0 and a standard deviation of 1?
Explanation:Standardization (using StandardScaler in sklearn) centers the distribution around 0 and scales it to unit variance.
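A brief sketch of standardization with scikit-learn's StandardScaler on a made-up single-feature array, checking that the result has mean 0 and unit standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature dataset (one column, five samples).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# After scaling, the column mean is ~0 and the standard deviation is ~1.
col_mean = X_std.mean()
col_std = X_std.std()
```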
12. Why is One-Hot Encoding preferred over Label Encoding for nominal categorical variables?
A.It requires less memory.
B.It prevents the model from assuming a mathematical order or rank between categories.
C.It handles missing values automatically.
D.It is faster to compute.
Correct Answer: It prevents the model from assuming a mathematical order or rank between categories.
Explanation:Label encoding assigns integers (0, 1, 2...) to categories, which some algorithms might misinterpret as an ordinal relationship (2 > 1). One-Hot encoding avoids this.
13. What is the Dummy Variable Trap?
A.When categorical variables are not encoded.
B.When independent variables are highly correlated (multicollinearity) due to including all dummy variables.
C.When the target variable is imbalanced.
D.When missing values are replaced by zeros.
Correct Answer: When independent variables are highly correlated (multicollinearity) due to including all dummy variables.
Explanation:The Dummy Variable Trap occurs when one variable can be predicted from the others (e.g., Female = 1 - Male). This multicollinearity can break some models like linear regression. It is solved by dropping one dummy column.
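The trap and its fix can be sketched with `pd.get_dummies` on a hypothetical `color` column. With all dummy columns kept, each row sums to exactly 1, so any one column is determined by the others; `drop_first=True` removes that redundancy:

```python
import pandas as pd

# Hypothetical nominal feature.
df = pd.DataFrame({"color": ["Red", "Blue", "Green", "Blue"]})

# All dummy columns: every row sums to 1 (perfect multicollinearity).
full = pd.get_dummies(df, columns=["color"])

# Dropping the first level leaves k - 1 columns and avoids the trap.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True)
```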
14. Which technique is commonly used to handle Class Imbalance by generating synthetic samples for the minority class?
Explanation:SMOTE (Synthetic Minority Over-sampling Technique) creates synthetic examples for the minority class by interpolating between existing minority samples, rather than just duplicating them.
15. What is the primary goal of Feature Selection?
A.To create new features from existing ones.
B.To select a subset of relevant features to improve model performance and reduce complexity.
C.To scale features to the same range.
D.To fill missing values in the features.
Correct Answer: To select a subset of relevant features to improve model performance and reduce complexity.
Explanation:Feature selection removes redundant or irrelevant data, which helps reduce overfitting, improves accuracy, and speeds up training.
16. Which of the following is an example of a Wrapper Method for feature selection?
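Explanation:Wrapper methods, like Recursive Feature Elimination (RFE), select features by repeatedly training the model on subsets of features and evaluating the result. RFE removes the least important features at each round until the desired number remains.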
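A minimal sketch of RFE in scikit-learn, assuming a synthetic dataset from `make_classification` and a logistic regression as the wrapped estimator (both are illustrative choices, not prescribed by the quiz):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 8 features, of which 3 are informative.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3,
    n_redundant=0, random_state=0,
)

# Repeatedly fit the model and drop the weakest feature until 3 remain.
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

selected_mask = selector.support_   # True for the features that were kept
ranking = selector.ranking_         # 1 = kept; larger = eliminated earlier
X_selected = selector.transform(X)
```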
17. What is the purpose of train_test_split in machine learning?
A.To separate numerical and categorical columns.
B.To split the dataset into training and validation/test sets to evaluate generalization.
C.To split the dataset into features (X) and target (y).
D.To remove outliers from the data.
Correct Answer: To split the dataset into training and validation/test sets to evaluate generalization.
Explanation:Splitting data ensures the model is evaluated on unseen data, so that performance is not estimated on examples the model has already seen (and possibly memorized) during training.
18. What is Data Leakage?
A.When data is lost during file transfer.
B.When information from outside the training dataset (like the test set) is used to create the model.
C.When the model leaks sensitive user information.
D.When the variance of the data is too high.
Correct Answer: When information from outside the training dataset (like the test set) is used to create the model.
Explanation:Data leakage occurs when the model unknowingly has access to the target or test distribution during training (e.g., scaling before splitting), leading to overly optimistic performance estimates.
19. Which plot is best for visualizing the relationship between two continuous variables?
A.Histogram
B.Scatter Plot
C.Bar Chart
D.Box Plot
Correct Answer: Scatter Plot
Explanation:Scatter plots map individual data points on an X-Y plane, making them ideal for observing correlations between two continuous variables.
20. In the context of Pandas, what does df.isnull().sum() return?
A.The total number of rows in the dataframe.
B.The sum of all values in the dataframe.
C.The count of missing values in each column.
D.The count of unique values in each column.
Correct Answer: The count of missing values in each column.
Explanation:isnull() returns a boolean mask, and sum() counts the True values (which represent missing data) per column.
21. When performing a train-test split on an imbalanced dataset, which parameter ensures the class distribution is preserved in both sets?
A.shuffle=True
B.random_state=42
C.stratify=y
D.test_size=0.2
Correct Answer: stratify=y
Explanation:The stratify parameter ensures that the training and test sets each keep approximately the same class proportions as the target array y passed to it.
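A small sketch of a stratified split, assuming a made-up imbalanced target with 90 samples of class 0 and 10 of class 1. With `stratify=y`, both splits keep the 9:1 ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 90 negatives, 10 positives.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Number of minority-class samples in each split.
train_positives = int(y_train.sum())
test_positives = int(y_test.sum())
```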
22. Which of the following is a technique for Dimensionality Reduction?
A.Linear Regression
B.Principal Component Analysis (PCA)
C.K-Nearest Neighbors
D.Logistic Regression
Correct Answer: Principal Component Analysis (PCA)
Explanation:PCA is a technique used to reduce the dimensionality of datasets while minimizing information loss. It does so by creating new uncorrelated variables (principal components) that capture as much of the variance as possible.
23. The Curse of Dimensionality refers to:
A.The difficulty of visualizing 3D data.
B.Issues that arise when analyzing data in high-dimensional spaces (sparse data, increased computation).
C.The inability to add more features to a model.
D.The error caused by using incorrect units of measurement.
Correct Answer: Issues that arise when analyzing data in high-dimensional spaces (sparse data, increased computation).
Explanation:As the number of features increases, the volume of the space increases exponentially, making the data sparse and distance metrics less meaningful.
24. What is Feature Engineering?
A.Selecting the best hardware for training.
B.The process of using domain knowledge to extract or create new features from raw data.
C.Removing all categorical variables.
D.Downloading datasets from the internet.
Correct Answer: The process of using domain knowledge to extract or create new features from raw data.
Explanation:Feature engineering involves creating new input features from the existing ones (e.g., extracting 'Year' from a 'Date' column) to improve model performance.
25. Which Scikit-learn module contains StandardScaler and MinMaxScaler?
A.sklearn.linear_model
B.sklearn.preprocessing
C.sklearn.metrics
D.sklearn.ensemble
Correct Answer: sklearn.preprocessing
Explanation:The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation suitable for downstream estimators.
26. If a feature has a Variance of 0, what does it imply?
A.The feature has a high correlation with the target.
B.The feature contains only one unique value for all samples.
C.The feature is normally distributed.
D.The feature has missing values.
Correct Answer: The feature contains only one unique value for all samples.
Explanation:Variance measures spread. If variance is 0, the values do not spread at all, meaning all values are identical. Such a feature cannot help distinguish between samples and can usually be removed.
27. Which of the following is considered Unstructured Data?
A.A SQL database table.
B.An Excel spreadsheet.
C.Images and Audio files.
D.A CSV file with labeled columns.
Correct Answer: Images and Audio files.
Explanation:Unstructured data has no predefined data model and is not organized in a predefined manner (such as tables). Examples include text, images, and audio.
28. What does a correlation coefficient of -0.9 indicate between two features?
A.No relationship.
B.A strong positive linear relationship.
C.A strong negative linear relationship.
D.A weak negative linear relationship.
Correct Answer: A strong negative linear relationship.
Explanation:Correlation coefficients range from -1 to 1. A value close to -1 indicates a strong negative linear relationship: as one variable increases, the other tends to decrease.
29. When using LabelEncoder, how is the data transformed?
A.It converts text labels into binary columns.
B.It converts text labels into integers (0, 1, 2, ...).
C.It scales the data between 0 and 1.
D.It removes the column.
Correct Answer: It converts text labels into integers (0, 1, 2, ...).
Explanation:LabelEncoder replaces unique categories with integer codes.
30. Which algorithm is generally NOT sensitive to the scale of features?
A.K-Nearest Neighbors (KNN)
B.Support Vector Machines (SVM)
C.Decision Trees
D.K-Means Clustering
Correct Answer: Decision Trees
Explanation:Decision Trees and Random Forests split nodes based on thresholds of single features, so the absolute scale of the feature does not affect the structure of the tree. Models that rely on distances or margins between points (KNN, SVM, K-Means) are highly sensitive to feature scale.
31. In Scikit-Learn, what is the role of the fit() method?
A.To make predictions on new data.
B.To calculate the accuracy of the model.
C.To learn parameters (e.g., mean, coefficients) from the training data.
D.To split the data.
Correct Answer: To learn parameters (e.g., mean, coefficients) from the training data.
Explanation:fit() triggers the training process where the algorithm learns the internal parameters from the provided data.
32. What is the difference between fit_transform() and transform()?
A.fit_transform is used on the training set to learn parameters and apply them; transform is used on the test set using learned parameters.
B.fit_transform is used on the test set; transform is used on the training set.
C.They are identical and can be used interchangeably.
D.transform is only used for image data.
Correct Answer: fit_transform is used on the training set to learn parameters and apply them; transform is used on the test set using learned parameters.
Explanation:We use fit_transform on training data to calculate means/SDs and scale the data. We use transform on test data to scale it using the training means/SDs to prevent data leakage.
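The distinction can be sketched with StandardScaler on a tiny made-up split. The scaler learns its mean and standard deviation from the training data only, and the test data is scaled with those same training statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test split of a single feature.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()

# Learn mean/std from the training set and scale it in one step.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training mean/std on the test set; nothing is re-learned here.
X_test_scaled = scaler.transform(X_test)
```

Calling `fit_transform` on the test set instead would recompute the statistics from test data, which is the leakage the explanation warns about.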
33. Which Seaborn plot is used to visualize the Distribution of a single numerical variable?
A.sns.heatmap()
B.sns.scatterplot()
C.sns.histplot() (or distplot)
D.sns.countplot()
Correct Answer: sns.histplot() (or distplot)
Explanation:Histograms (histplot or the deprecated distplot) are designed to show the frequency distribution of a single continuous variable.
34. How do you handle Duplicate Rows in Pandas?
A.df.drop_duplicates()
B.df.remove_copies()
C.df.delete_repeats()
D.df.unique()
Correct Answer: df.drop_duplicates()
Explanation:drop_duplicates() is the Pandas method to remove duplicate rows from a DataFrame.
35. In PCA, what represents the direction of maximum variance in the data?
A.The Eigenvalues
B.The Principal Components (Eigenvectors)
C.The Mean vector
D.The Covariance matrix
Correct Answer: The Principal Components (Eigenvectors)
Explanation:The first Principal Component is the eigenvector associated with the largest eigenvalue, representing the direction of maximum variance.
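A rough sketch with scikit-learn's PCA on synthetic two-dimensional data whose two columns are almost identical, so most of the variance lies along the diagonal. The data generation is an assumption made purely for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: the second column is the first plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

pca = PCA(n_components=2).fit(X)

# Rows of components_ are the eigenvectors of the covariance matrix,
# ordered by decreasing eigenvalue (explained_variance_).
first_direction = pca.components_[0]
variance_share = pca.explained_variance_ratio_[0]

# Cross-check against a direct eigen-decomposition of the covariance matrix.
eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
```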
36. What is Target Encoding (or Mean Encoding)?
A.Encoding categorical variables based on the mean of the target variable for that category.
B.Encoding the target variable into a One-Hot vector.
C.Replacing the target with the mean of the features.
D.Assigning random numbers to the target.
Correct Answer: Encoding categorical variables based on the mean of the target variable for that category.
Explanation:Target encoding replaces a categorical feature value with the average value of the target variable for that specific category. It is powerful but risks overfitting.
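A minimal sketch of target encoding with Pandas, assuming a hypothetical training frame with a `city` feature and a binary `bought` target (both names are made up):

```python
import pandas as pd

# Hypothetical training data.
train = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "bought": [1, 0, 1, 1, 0, 1],
})

# Mean of the target per category, learned from the training data.
category_means = train.groupby("city")["bought"].mean()

# Replace each category with its target mean.
train["city_encoded"] = train["city"].map(category_means)
```

Category C appears only once, so its encoding is exactly that row's target value. This illustrates the overfitting risk mentioned above; practical implementations typically add smoothing or compute the means out-of-fold.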
37. Which of the following indicates a skewed distribution?
A.Mean = Median = Mode
B.The distribution is symmetrical.
C.The tail of the distribution is longer on one side than the other.
D.The standard deviation is 0.
Correct Answer: The tail of the distribution is longer on one side than the other.
Explanation:Skewness refers to asymmetry in the distribution. A long tail to the right is positive skew; a long tail to the left is negative skew.
38. What is the result of executing df.info()?
A.A summary of statistical metrics.
B.A concise summary of the DataFrame including index dtype, columns, non-null values, and memory usage.
C.The first 5 rows of the DataFrame.
D.A correlation heatmap.
Correct Answer: A concise summary of the DataFrame including index dtype, columns, non-null values, and memory usage.
Explanation:df.info() is essential for initial exploration to check data types and identify missing values structurally.
39. Before feeding text data into a supervised learning model, it must be converted into numerical vectors. This process is called:
A.Vectorization (e.g., TF-IDF, Bag of Words)
B.Normalization
C.Imputation
D.Classification
Correct Answer: Vectorization (e.g., TF-IDF, Bag of Words)
Explanation:Machine learning models require numerical input. Text vectorization transforms text strings into numerical vectors.
40. Which method helps in identifying Multicollinearity among features?
A.Confusion Matrix
B.ROC Curve
C.Heatmap of the Correlation Matrix
D.Scatter plot of Feature vs Target
Correct Answer: Heatmap of the Correlation Matrix
Explanation:A correlation heatmap visualizes the correlation coefficients between all pairs of features. High values between two independent features indicate multicollinearity.
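The correlation matrix behind such a heatmap can be sketched with `DataFrame.corr()`. The columns here are hypothetical: the same area expressed in two units (perfectly collinear) plus an unrelated `age` column. The plotting call is left out so the example runs without a display:

```python
import numpy as np
import pandas as pd

# Hypothetical features: two are the same quantity in different units.
rng = np.random.default_rng(0)
area = rng.uniform(50, 200, size=100)
df = pd.DataFrame({
    "area_m2": area,
    "area_ft2": area * 10.7639,
    "age": rng.uniform(0, 50, size=100),
})

# Pairwise Pearson correlations between all features.
corr = df.corr()

# Feature pairs with very high absolute correlation signal multicollinearity.
redundant_pair_corr = corr.loc["area_m2", "area_ft2"]
```

Passing `corr` to Seaborn's `sns.heatmap` would render it as the heatmap described above.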
41. If a dataset has missing values that are MCAR (Missing Completely At Random), which handling method is generally safe if the dataset is large?
A.Dropping the rows with missing values.
B.Replacing with a constant like -1.
C.Using a complex prediction model.
D.Leaving them as NaN.
Correct Answer: Dropping the rows with missing values.
Explanation:If data is MCAR, the missingness is unrelated to any observed or unobserved values, so the rows with missing entries are effectively a random subset. If the dataset is large enough, dropping these rows does not introduce bias, though it reduces sample size.
42. What is the advantage of using a Pipeline in Scikit-Learn?
A.It allows for parallel processing on GPUs.
B.It chains together multiple processing steps (scaling, encoding, modeling) into a single object, preventing data leakage.
C.It automatically selects the best algorithm.
D.It creates a graphical user interface.
Correct Answer: It chains together multiple processing steps (scaling, encoding, modeling) into a single object, preventing data leakage.
Explanation:Pipelines ensure that preprocessing steps (like scaling) are applied correctly during cross-validation (fitting only on train folds), preventing leakage.
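A minimal sketch of a Pipeline used inside cross-validation, assuming synthetic data and a logistic regression model chosen for illustration. In each fold the scaler is fitted on the training portion only, then applied to the held-out portion:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scaling and modeling chained into a single estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# The whole pipeline is re-fitted on the training folds of each split.
scores = cross_val_score(pipe, X, y, cv=5)
```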
43. Which feature selection method uses a model's coef_ or feature_importances_ attribute to select features?
A.Filter Method
B.Embedded Method
C.Wrapper Method
D.Unsupervised Method
Correct Answer: Embedded Method
Explanation:Embedded methods (like Lasso or Random Forest) perform feature selection during the model training process, assigning weights or importance scores to features.
44. What does df.shape return in Pandas?
A.(Number of Columns, Number of Rows)
B.(Number of Rows, Number of Columns)
C.(Total Elements,)
D.(Number of Unique Values,)
Correct Answer: (Number of Rows, Number of Columns)
Explanation:df.shape returns a tuple representing the dimensionality of the DataFrame in the format (rows, columns).
45. Which of the following is a Classification algorithm?
A.Linear Regression
B.Logistic Regression
C.Polynomial Regression
D.Ridge Regression
Correct Answer: Logistic Regression
Explanation:Despite the name, Logistic Regression is a classification algorithm. It models the probability of class membership and is most commonly used to predict binary outcomes.
46. When detecting outliers using the Z-score method, a common threshold to identify an outlier is a Z-score absolute value greater than:
A.1
B.1.5
C.3
D.10
Correct Answer: 3
Explanation:In a normal distribution, about 99.7% of data points lie within 3 standard deviations of the mean. Points beyond that are typically considered outliers.
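A small sketch of the Z-score rule with NumPy, assuming synthetic normally distributed values with one extreme point appended by hand:

```python
import numpy as np

# Synthetic data: 200 values around 50, plus one injected extreme value.
rng = np.random.default_rng(42)
values = np.append(rng.normal(loc=50, scale=5, size=200), 120.0)

# Z-score: distance from the mean in units of standard deviation.
z_scores = (values - values.mean()) / values.std()

# Flag points whose absolute Z-score exceeds the common threshold of 3.
outliers = values[np.abs(z_scores) > 3]
```

Because the mean and standard deviation are themselves pulled by extreme values, this rule can miss outliers in small or heavily contaminated samples.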
47. What is the correct syntax to drop a column named 'ID' from a Pandas DataFrame df?
A.df.drop('ID', axis=0)
B.df.drop('ID', axis=1)
C.df.remove('ID')
D.df.delete('ID')
Correct Answer: df.drop('ID', axis=1)
Explanation:axis=1 refers to columns; axis=0 refers to rows. By default, df.drop() returns a new DataFrame and leaves df unchanged.
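A quick sketch on a made-up two-column DataFrame, showing that the column is removed from the returned object while the original is left untouched:

```python
import pandas as pd

# Hypothetical DataFrame with an identifier column.
df = pd.DataFrame({"ID": [1, 2], "score": [0.5, 0.9]})

# axis=1 targets columns; the result is a new DataFrame.
dropped = df.drop("ID", axis=1)

# Equivalent, more explicit spelling using the columns keyword.
dropped_alt = df.drop(columns="ID")
```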
48. Why is Data Exploration (EDA) a critical first step?
A.It is required by the Python interpreter.
B.To understand data structure, detect anomalies, test assumptions, and determine preprocessing needs.
C.It automatically trains the model.
D.It increases the size of the dataset.
Correct Answer: To understand data structure, detect anomalies, test assumptions, and determine preprocessing needs.
Explanation:EDA allows the data scientist to understand the nature of the data, relationships between variables, and quality issues before attempting to model.
49. Which encoding technique creates a binary column for every category level?
A.Label Encoding
B.Target Encoding
C.One-Hot Encoding
D.Ordinal Encoding
Correct Answer: One-Hot Encoding
Explanation:One-Hot Encoding expands a categorical column into multiple binary columns (0 or 1), one for each unique category.
50. What is the main drawback of PCA?
A.It increases the dimensionality of the data.
B.It is computationally very expensive for small datasets.
C.The resulting Principal Components are often difficult to interpret in terms of original features.
D.It only works on categorical data.
Correct Answer: The resulting Principal Components are often difficult to interpret in terms of original features.
Explanation:PCA transforms original features into linear combinations (Principal Components). While this reduces dimensions, the physical meaning of the original features is lost in the transformation.