Unit 1 - Practice Quiz

INT234

1 What is the primary goal of Predictive Analytics?

A. To describe what happened in the past
B. To predict future outcomes based on historical data
C. To prescribe the best course of action
D. To store large amounts of data

2 Which of the following is NOT a phase in the standard predictive analytics lifecycle?

A. Data Preparation
B. Model Building
C. Hardware Manufacturing
D. Deployment

3 Descriptive Analytics differs from Predictive Analytics because Descriptive Analytics focuses on:

A. Forecasting future trends
B. Optimizing decision making
C. Summarizing past events
D. Creating machine learning models

4 Which of the following is a common application of Predictive Analytics?

A. Credit scoring
B. Generating an annual report
C. Real-time operating system scheduling
D. Creating a database schema

5 In the context of Machine Learning, what is 'Training Data'?

A. Data used to evaluate the final model
B. Data used to teach the algorithm patterns
C. Data that has been corrupted
D. Future data that has not occurred yet

6 Which type of analytics answers the question 'What should we do about it'?

A. Descriptive Analytics
B. Diagnostic Analytics
C. Predictive Analytics
D. Prescriptive Analytics

7 Machine Learning is best described as:

A. Hard-coding rules for every possible scenario
B. A subset of AI where computers learn from data without being explicitly programmed
C. Using statistical regression only
D. Data storage optimization

8 What is the 'Target Variable' in a predictive model?

A. The variable being predicted
B. The variable used to predict
C. The noise in the data
D. The index of the dataset

9 Which of the following is a type of Supervised Learning?

A. Clustering
B. Regression
C. Dimensionality Reduction
D. Association Rule Learning

10 In Supervised Learning, the dataset must contain:

A. Only input features
B. Labeled data (Input features and Output labels)
C. Unlabeled data
D. Only images

11 Unsupervised Learning differs from Supervised Learning because it:

A. Uses labeled data
B. Predicts a specific target
C. Finds hidden patterns in unlabeled data
D. Is only used for text data

12 Predicting the price of a house based on its square footage is an example of:

A. Classification
B. Regression
C. Clustering
D. Reinforcement Learning
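
For reference, here is a minimal sketch of fitting a straight line to one feature by ordinary least squares in plain Python. The square-footage and price figures are made up for illustration, and `fit_simple_linear_regression` is a hypothetical helper name rather than a library function.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing the squared error of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: square footage vs. price (in thousands).
sqft = [1000, 1500, 2000, 2500]
price = [200, 275, 350, 425]
slope, intercept = fit_simple_linear_regression(sqft, price)
print(slope, intercept)  # 0.15 50.0
```

The output is a continuous number (a price), which is what makes this a regression rather than a classification task.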

13 Predicting whether an email is 'Spam' or 'Not Spam' is an example of:

A. Regression
B. Clustering
C. Classification
D. Principal Component Analysis

14 Which of the following is an Unsupervised Learning algorithm?

A. Linear Regression
B. K-Means Clustering
C. Logistic Regression
D. Decision Trees

15 What is the main objective of Clustering?

A. To predict a continuous value
B. To group similar data points together
C. To classify data into known categories
D. To reduce the number of rows in a table
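
To illustrate how clustering groups similar points, here is a sketch of Lloyd's algorithm for k-means on one-dimensional data. It assumes a seeded random choice of starting centroids; `kmeans_1d` is a hypothetical name, and library implementations add multi-dimensional distances and better initialization such as k-means++.

```python
import random

def kmeans_1d(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm for k-means on one-dimensional data; returns sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        converged = new_centroids == centroids
        centroids = new_centroids
        if converged:
            break
    return sorted(centroids)

print(kmeans_1d([1, 2, 3, 10, 11, 12], k=2))  # [2.0, 11.0]
```

No labels are used anywhere: the groups come only from distances between points.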

16 Reinforcement Learning involves an agent that learns by:

A. Mimicking a teacher
B. Interacting with an environment and receiving rewards or penalties
C. Analyzing static clusters
D. Cleaning database records

17 Which common issue occurs when a model performs well on training data but poorly on new data?

A. Underfitting
B. Overfitting
C. Normalization
D. Clustering

18 What is the first step in Data Preprocessing?

A. Feature Scaling
B. Data Cleaning
C. Model Training
D. Hyperparameter Tuning

19 Garbage In, Garbage Out (GIGO) implies that:

A. We should delete all data
B. Poor quality input data leads to poor quality model output
C. More data always results in better models
D. Computer hardware needs regular cleaning

20 Which technique is used to handle missing values in a dataset?

A. Imputation
B. Overfitting
C. Clustering
D. Regression
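
A minimal sketch of mean imputation, assuming missing entries are represented as `None`. The helper name `impute_mean` is hypothetical; median or most-frequent-value imputation follows the same pattern with a different fill value.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]
print(impute_mean(ages))  # [20, 30, 30, 40, 30]
```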

21 What is the purpose of 'Feature Scaling'?

A. To increase the number of features
B. To bring all features to a similar scale or range
C. To remove missing values
D. To convert text to numbers

22 Standardization (Z-score normalization) transforms data to have:

A. A mean of 0 and standard deviation of 1
B. A range between 0 and 1
C. A mean of 100
D. No negative numbers
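
A plain-Python sketch of z-score standardization, z = (x - mean) / standard deviation. It assumes the population standard deviation (dividing by n); some tools divide by n - 1 instead, which changes the numbers slightly. The helper name `standardize` is our own.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score each value: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)                  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))  # 0.0 1.0
```

Unlike min-max scaling, the result is not confined to a fixed range; it can contain negative values and values above 1.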

23 Which preprocessing technique is used to convert categorical variables into numerical format?

A. Scaling
B. Encoding
C. Imputation
D. Sampling

24 One-Hot Encoding helps in handling:

A. Missing values
B. Nominal categorical data
C. Outliers
D. Continuous variables
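
A minimal sketch of one-hot encoding for a nominal feature: each category gets its own 0/1 indicator column, so no artificial order is implied. The helper name `one_hot_encode` and the colour values are illustrative assumptions.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

categories, rows = one_hot_encode(["red", "blue", "green", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```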

25 An outlier is defined as:

A. A value that is exactly the mean
B. A missing value
C. A data point that differs significantly from other observations
D. A categorical variable

26 Which method is commonly used to detect outliers?

A. Box Plot
B. Pie Chart
C. Confusion Matrix
D. Gradient Descent
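
Box plots commonly flag points that lie more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The sketch below applies that rule directly, assuming the "inclusive" quartile method of Python's `statistics.quantiles` (Python 3.8+); other quartile conventions can move the fences slightly. The helper name `iqr_outliers` and the readings are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(readings))  # [102]
```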

27 Dimensionality Reduction aims to:

A. Increase the number of variables
B. Reduce the number of input variables while retaining important information
C. Remove all categorical variables
D. Create more rows in the dataset

28 PCA (Principal Component Analysis) is a technique used for:

A. Supervised Classification
B. Dimensionality Reduction
C. Reinforcement Learning
D. Data Imputation
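
To show what PCA does, here is a sketch for the two-feature case: center the data, build the covariance matrix, take its largest eigenvector, and project onto it. It uses the closed-form eigenvalues of a symmetric 2x2 matrix, so it only handles two features and assumes the points are not all identical; `pca_2d` is a hypothetical helper, not a library API.

```python
import math
from statistics import mean

def pca_2d(points):
    """Project 2-D points onto their first principal component.

    Returns (projections, explained_variance_ratio). Assumes not all points are identical.
    """
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    cx, cy = [x - mx for x in xs], [y - my for y in ys]  # center each feature
    n = len(points)
    # Population covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x in cx) / n
    c = sum(y * y for y in cy) / n
    b = sum(x * y for x, y in zip(cx, cy)) / n
    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector for lam1; fall back to an axis when the features are uncorrelated.
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    projections = [x * vx + y * vy for x, y in zip(cx, cy)]
    return projections, lam1 / (lam1 + lam2)

proj, ratio = pca_2d([(1, 2), (2, 4), (3, 6)])
print([round(p, 3) for p in proj], round(ratio, 3))  # [-2.236, 0.0, 2.236] 1.0
```

Because these points lie exactly on a line, one component keeps essentially all of the variance, so two input variables are reduced to one with no information lost.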

29 Why do we split data into Training and Testing sets?

A. To make the dataset smaller
B. To evaluate the model's performance on unseen data
C. To train two different models
D. To remove outliers
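
A minimal hand-rolled sketch of a random train/test split, assuming an 80/20 split and a fixed seed so the split is reproducible. `split_train_test` is a hypothetical name; libraries offer equivalent utilities with more options, such as stratification.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(10))
print(len(train), len(test))  # 8 2
```

The model is fitted only on `train`; its score on `test` estimates how it will behave on data it has never seen.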

30 Underfitting occurs when:

A. The model is too complex
B. The model captures noise
C. The model is too simple to capture the underlying structure of the data
D. The training data is perfect

31 Which of the following is NOT a data preprocessing step?

A. Feature Selection
B. Data Cleaning
C. Data Transformation
D. Hypothesis Testing

32 Min-Max Scaling transforms data into which range?

A. [-1, 1]
B. [0, 1]
C. [-infinity, +infinity]
D. [0, 100]
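
A minimal sketch of min-max scaling, x' = (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1. The helper name is our own, and the constant-feature guard is an added assumption to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```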

33 In a dataset, a row usually represents:

A. A feature
B. An observation or instance
C. A label
D. A statistical summary

34 Market Basket Analysis is an application of which learning type?

A. Regression
B. Association Rule Learning
C. Classification
D. Supervised Learning

35 Which of the following describes 'Feature Selection'?

A. Creating new features from existing ones
B. Selecting the most relevant features to improve model performance
C. Scaling features
D. Handling missing values

36 Noise in data refers to:

A. Missing values
B. Meaningless or random variation in the data
C. Categorical labels
D. Duplicate rows

37 Which of the following is a quantitative variable?

A. Gender (Male/Female)
B. Zip Code
C. Age
D. Color (Red/Blue)

38 Label Encoding is best used when:

A. The categorical feature has no order
B. The categorical feature is ordinal (has an inherent order)
C. There are missing values
D. The data is continuous
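
A sketch of encoding an ordinal feature as integers. The order is passed in explicitly, because sorting the labels alphabetically would give large=0, medium=1, small=2 and break the real ordering. The `SIZE_ORDER` list and the `ordinal_encode` helper are illustrative assumptions.

```python
def ordinal_encode(values, order):
    """Map each category to its position in an explicitly supplied order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]

SIZE_ORDER = ["small", "medium", "large"]  # assumed ordering for illustration
print(ordinal_encode(["medium", "small", "large", "medium"], SIZE_ORDER))  # [1, 0, 2, 1]
```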

39 The process of converting raw data into a clean dataset is often called:

A. Data Mining
B. Data Wrangling/Munging
C. Data Visualization
D. Data Architecture

40 Which of these is a supervised learning algorithm used for classification?

A. K-Means
B. Logistic Regression
C. Apriori
D. PCA

41 If a dataset has duplicate records, the preprocessing step required is:

A. Imputation
B. Deduplication
C. Normalization
D. Encoding
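
A minimal sketch of deduplication on records stored as dictionaries, keeping the first occurrence and the original order. It assumes every field value is hashable; `deduplicate` and the sample records are hypothetical.

```python
def deduplicate(records):
    """Drop exact duplicate records, keeping the first occurrence in original order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

customers = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": "Oslo"},
    {"id": 1, "city": "Paris"},
]
print(deduplicate(customers))  # [{'id': 1, 'city': 'Paris'}, {'id': 2, 'city': 'Oslo'}]
```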

42 The 'Curse of Dimensionality' refers to problems caused by:

A. Too many missing values
B. Too many features (variables) relative to the number of observations
C. Too much processing power
D. Inaccurate labels

43 Which variable type requires the creation of dummy variables during preprocessing?

A. Numerical
B. Categorical
C. Binary
D. Ordinal

44 Semi-supervised learning uses:

A. Only labeled data
B. Only unlabeled data
C. A small amount of labeled data and a large amount of unlabeled data
D. Reinforcement signals

45 What is the result of 'Data Transformation'?

A. Data is deleted
B. Data is converted into a format suitable for modeling
C. Data is visualized
D. Data is collected

46 Predicting the temperature for tomorrow is a:

A. Classification task
B. Regression task
C. Clustering task
D. Preprocessing task

47 Customer segmentation usually relies on which type of learning?

A. Supervised Learning
B. Unsupervised Learning (Clustering)
C. Reinforcement Learning
D. Regression

48 Which of the following represents 'Structured Data'?

A. Emails
B. Video files
C. Relational database tables
D. Audio recordings

49 Balanced data refers to:

A. Data where all values are the same
B. Data where the target classes are represented approximately equally
C. Data with no missing values
D. Data that has been scaled
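
A quick way to check balance is to compute each class's share of the target labels. The sketch below does that with `collections.Counter`; the 90/10 labels are made-up data showing an imbalanced case, and `class_proportions` is a hypothetical helper.

```python
from collections import Counter

def class_proportions(labels):
    """Return each class's share of the labels, most common class first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.most_common()}

labels = ["no"] * 90 + ["yes"] * 10
print(class_proportions(labels))  # {'no': 0.9, 'yes': 0.1}
```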

50 Binning is a preprocessing technique used to:

A. Remove outliers
B. Convert continuous variables into categorical intervals
C. Fill missing values
D. Normalize data
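
A minimal sketch of equal-width binning, which splits the range [min, max] into a fixed number of equally sized intervals and replaces each value with its interval index. The ages and the helper name `equal_width_bins` are illustrative; equal-frequency (quantile) binning is a common alternative.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        index = int((v - lo) / width) if width else 0
        labels.append(min(index, n_bins - 1))  # the maximum value belongs to the last bin
    return labels

ages = [18, 25, 33, 47, 60, 72]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 1, 2, 2]
```

Here the three bins are [18, 36), [36, 54) and [54, 72], so a continuous age becomes a categorical age group.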