Unit 1 - Practice Quiz

INT234 50 Questions

1 What is the primary goal of Predictive Analytics?

A. To describe what happened in the past
B. To prescribe the best course of action
C. To predict future outcomes based on historical data
D. To store large amounts of data

2 Which of the following is NOT a phase in the standard predictive analytics lifecycle?

A. Deployment
B. Data Preparation
C. Hardware Manufacturing
D. Model Building

3 Descriptive Analytics differs from Predictive Analytics because Descriptive Analytics focuses on:

A. Forecasting future trends
B. Optimizing decision making
C. Summarizing past events
D. Creating machine learning models

4 Which of the following is a common application of Predictive Analytics?

A. Generating an annual report
B. Creating a database schema
C. Real-time operating system scheduling
D. Credit scoring

5 In the context of Machine Learning, what is 'Training Data'?

A. Data used to evaluate the final model
B. Data used to teach the algorithm patterns
C. Future data that has not occurred yet
D. Data that has been corrupted

6 Which type of analytics answers the question 'What should we do about it?'

A. Diagnostic Analytics
B. Predictive Analytics
C. Prescriptive Analytics
D. Descriptive Analytics

7 Machine Learning is best described as:

A. A subset of AI where computers learn from data without explicit programming
B. Using statistical regression only
C. Data storage optimization
D. Hard-coding rules for every possible scenario

8 What is the 'Target Variable' in a predictive model?

A. The noise in the data
B. The variable used to predict
C. The variable being predicted
D. The index of the dataset

9 Which of the following is a type of Supervised Learning?

A. Clustering
B. Association Rule Learning
C. Regression
D. Dimensionality Reduction

10 In Supervised Learning, the dataset must contain:

A. Only images
B. Unlabeled data
C. Only input features
D. Labeled data (Input features and Output labels)

11 Unsupervised Learning differs from Supervised Learning because it:

A. Predicts a specific target
B. Is only used for text data
C. Uses labeled data
D. Finds hidden patterns in unlabeled data

12 Predicting the price of a house based on its square footage is an example of:

A. Clustering
B. Reinforcement Learning
C. Regression
D. Classification

13 Predicting whether an email is 'Spam' or 'Not Spam' is an example of:

A. Principal Component Analysis
B. Regression
C. Clustering
D. Classification

14 Which of the following is an Unsupervised Learning algorithm?

A. Linear Regression
B. Decision Trees
C. K-Means Clustering
D. Logistic Regression

15 What is the main objective of Clustering?

A. To reduce the number of rows in a table
B. To classify data into known categories
C. To predict a continuous value
D. To group similar data points together

16 Reinforcement Learning involves an agent that learns by:

A. Cleaning database records
B. Analyzing static clusters
C. Mimicking a teacher
D. Interacting with an environment and receiving rewards or penalties

17 Which of the following is a common issue where a model performs well on training data but poorly on new data?

A. Normalization
B. Clustering
C. Overfitting
D. Underfitting

18 What is the first step in Data Preprocessing?

A. Hyperparameter Tuning
B. Model Training
C. Feature Scaling
D. Data Cleaning

19 Garbage In, Garbage Out (GIGO) implies that:

A. We should delete all data
B. Computer hardware needs regular cleaning
C. Poor quality input data leads to poor quality model output
D. More data always results in better models

20 Which technique is used to handle missing values in a dataset?

A. Regression
B. Overfitting
C. Clustering
D. Imputation
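
For review, one way of filling missing values can be sketched as below. This is a minimal illustration of mean imputation, assuming missing entries are stored as `None`; the function name `impute_mean` and the sample numbers are made up for this example, and other strategies (median, mode, model-based) are equally common.

```python
from statistics import fmean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    fill = fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
incomes = [30.0, None, 50.0, 40.0, None]
filled = impute_mean(incomes)
print(filled)  # [30.0, 40.0, 50.0, 40.0, 40.0]
```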

21 What is the purpose of 'Feature Scaling'?

A. To increase the number of features
B. To bring all features to a similar scale or range
C. To convert text to numbers
D. To remove missing values

22 Standardization (Z-score normalization) transforms data to have:

A. A mean of 0 and standard deviation of 1
B. A range between 0 and 1
C. No negative numbers
D. A mean of 100
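
For review, the z-score formula z = (x - mean) / standard deviation can be sketched as below. This is a minimal illustration that assumes the population standard deviation; the function name `standardize` and the sample heights are made up for this example.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mu) / sigma for x in values]


# Hypothetical sample, in centimetres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardize(heights_cm)

# Check the summary statistics of the transformed column.
print(round(fmean(z_scores), 6), round(pstdev(z_scores), 6))
```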

23 Which preprocessing technique is used to convert categorical variables into numerical format?

A. Imputation
B. Encoding
C. Scaling
D. Sampling

24 One-Hot Encoding helps in handling:

A. Nominal categorical data
B. Outliers
C. Missing values
D. Continuous variables
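
For review, One-Hot Encoding can be sketched as below: each distinct category becomes its own 0/1 column. This is a minimal illustration; the function name `one_hot`, the alphabetical column order, and the sample colors are assumptions made for this example.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))  # column order: alphabetical
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical feature whose categories have no inherent order.
colors = ["red", "blue", "green", "blue"]
categories, encoded = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```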

25 An outlier is defined as:

A. A categorical variable
B. A value that is exactly the mean
C. A missing value
D. A data point that differs significantly from other observations

26 Which method is commonly used to detect outliers?

A. Box Plot
B. Confusion Matrix
C. Pie Chart
D. Gradient Descent
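
For review, a frequently used numeric rule for flagging outliers is the 1.5 x IQR fence. The sketch below is a minimal illustration of that rule; the function name `iqr_outliers`, the multiplier default, and the sample readings are assumptions made for this example, and quartile conventions differ between tools.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))  # [95]
```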

27 Dimensionality Reduction aims to:

A. Create more rows in the dataset
B. Increase the number of variables
C. Remove all categorical variables
D. Reduce the number of input variables while retaining important information

28 PCA (Principal Component Analysis) is a technique used for:

A. Supervised Classification
B. Data Imputation
C. Reinforcement Learning
D. Dimensionality Reduction

29 Why do we split data into Training and Testing sets?

A. To train two different models
B. To make the dataset smaller
C. To evaluate the model's performance on unseen data
D. To remove outliers

30 Underfitting occurs when:

A. The model is too simple to capture the underlying structure of the data
B. The model captures noise
C. The model is too complex
D. The training data is perfect

31 Which of the following is NOT a data preprocessing step?

A. Data Cleaning
B. Data Transformation
C. Hypothesis Testing
D. Feature Selection

32 Min-Max Scaling transforms data into which range?

A. [0, 100]
B. [-infinity, +infinity]
C. [-1, 1]
D. [0, 1]
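
For review, the Min-Max formula x' = (x - min) / (max - min) can be sketched as below. This is a minimal illustration; the function name `min_max_scale` and the sample ages are made up for this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical sample of ages.
ages = [18, 25, 40, 60]
scaled = min_max_scale(ages)
print(min(scaled), max(scaled))  # 0.0 1.0
```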

33 In a dataset, a row usually represents:

A. A label
B. A statistical summary
C. An observation or instance
D. A feature

34 Market Basket Analysis is an application of which learning type?

A. Regression
B. Classification
C. Association Rule Learning
D. Supervised Learning

35 Which of the following describes 'Feature Selection'?

A. Scaling features
B. Creating new features from existing ones
C. Selecting the most relevant features to improve model performance
D. Handling missing values

36 Noise in data refers to:

A. Missing values
B. Categorical labels
C. Meaningless or random variance in the data
D. Duplicate rows

37 Which of the following is a quantitative variable?

A. Color (Red/Blue)
B. Age
C. Zip Code
D. Gender (Male/Female)

38 Label Encoding is best used when:

A. The categorical feature has no order
B. There are missing values
C. The categorical feature is ordinal (has an inherent order)
D. The data is continuous

39 The process of converting raw data into a clean dataset is often called:

A. Data Mining
B. Data Architecture
C. Data Wrangling/Munging
D. Data Visualization

40 Which of these is a supervised learning algorithm used for classification?

A. K-Means
B. Apriori
C. Logistic Regression
D. PCA

41 If a dataset has duplicate records, the preprocessing step required is:

A. Imputation
B. Deduplication
C. Encoding
D. Normalization

42 The 'Curse of Dimensionality' refers to problems caused by:

A. Too many features (variables) relative to the number of observations
B. Too much processing power
C. Too many missing values
D. Inaccurate labels

43 Which variable type requires the creation of dummy variables during preprocessing?

A. Categorical
B. Numerical
C. Binary
D. Ordinal

44 Semi-supervised learning uses:

A. Only labeled data
B. A small amount of labeled data and a large amount of unlabeled data
C. Only unlabeled data
D. Reinforcement signals

45 What is the result of 'Data Transformation'?

A. Data is visualized
B. Data is deleted
C. Data is collected
D. Data is converted into a format suitable for modeling

46 Predicting the temperature for tomorrow is a:

A. Clustering task
B. Regression task
C. Preprocessing task
D. Classification task

47 Customer segmentation usually relies on which type of learning?

A. Unsupervised Learning (Clustering)
B. Supervised Learning
C. Regression
D. Reinforcement Learning

48 Which of the following represents 'Structured Data'?

A. Video files
B. Audio recordings
C. Emails
D. Relational database tables

49 Balanced data refers to:

A. Data that has been scaled
B. Data where all values are the same
C. Data where the target classes are represented approximately equally
D. Data with no missing values

50 Binning is a preprocessing technique used to:

A. Normalize data
B. Convert continuous variables into categorical intervals
C. Fill missing values
D. Remove outliers
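
For review, binning can be sketched as below: each numeric value is mapped to the label of the interval it falls into. This is a minimal illustration; the function name `bin_values`, the edges, and the labels are assumptions made for this example.

```python
from bisect import bisect_right


def bin_values(values, edges, labels):
    """Map each value to the label of its interval.

    `edges` must be sorted; a value equal to an edge goes to the higher bin.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical ages split at 18 and 65.
ages = [5, 17, 18, 42, 70]
age_groups = bin_values(ages, edges=[18, 65], labels=["minor", "adult", "senior"])
print(age_groups)  # ['minor', 'minor', 'adult', 'adult', 'senior']
```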