1. What is the primary objective of dimensionality reduction in predictive analytics?
A. To increase the number of features to capture more data
B. To reduce the number of input variables while retaining essential information
C. To create artificial data points for training
D. To increase the computational complexity of the model
Correct Answer: To reduce the number of input variables while retaining essential information
Explanation: Dimensionality reduction aims to reduce the number of random variables under consideration by obtaining a set of principal variables, thereby simplifying models and reducing computational cost.
2. Which phenomenon refers to the problem where the amount of data needed to generalize accurately grows exponentially with the dimensionality?
A. The Law of Large Numbers
B. The Vanishing Gradient
C. The Curse of Dimensionality
D. The Overfitting Paradox
Correct Answer: The Curse of Dimensionality
Explanation: The Curse of Dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces, often causing sparsity and distance convergence.
3. Principal Component Analysis (PCA) is best described as which type of machine learning technique?
A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning
Correct Answer: Unsupervised Learning
Explanation: PCA is an unsupervised technique because it analyzes the structure of the input data (correlations and variance) without reference to target labels.
4. In PCA, what do the Principal Components represent?
A. The original features sorted by importance
B. New orthogonal variables that maximize variance
C. The cluster centroids of the data
D. The error terms of a regression model
Correct Answer: New orthogonal variables that maximize variance
Explanation: Principal components are linear combinations of the original variables, constructed to be orthogonal to each other and to capture the maximum remaining variance in the data.
5. Which mathematical concept is central to calculating Principal Components in PCA?
A. Eigenvalues and Eigenvectors
B. Sine and Cosine functions
C. Derivatives and Integrals
D. Fourier Transforms
Correct Answer: Eigenvalues and Eigenvectors
Explanation: PCA typically involves computing the eigenvalues and eigenvectors of the data's covariance matrix to determine the magnitude and direction of variance.
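To make this concrete, here is a minimal NumPy sketch of PCA via eigen-decomposition of the covariance matrix; the function name, toy data, and shapes are illustrative, not part of the quiz material:

```python
import numpy as np

def pca_eig(X, n_components=2):
    """Minimal PCA: eigen-decompose the covariance matrix, project onto top components."""
    X_centered = X - X.mean(axis=0)          # center each variable
    cov = np.cov(X_centered, rowvar=False)   # covariance matrix (features x features)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]        # eigenvalues = variance along each eigenvector
    return X_centered @ eigvecs[:, order[:n_components]]

X = np.random.default_rng(0).normal(size=(100, 6))
print(pca_eig(X, n_components=2).shape)     # (100, 2)
```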
6. Why is feature scaling (standardization) important before applying PCA?
A. Because PCA requires categorical data
B. Because PCA is sensitive to the scale of the variables
C. To convert negative numbers to positive
D. To increase the number of dimensions
Correct Answer: Because PCA is sensitive to the scale of the variables
Explanation: Since PCA maximizes variance, variables with large scales (e.g., salary vs. age) will dominate the principal components if the data is not standardized first.
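A short sketch of this workflow, assuming scikit-learn is available (the pipeline and toy data below are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # toy data: 100 rows, 6 features
X[:, 0] *= 1000                        # one feature on a much larger scale

# Standardizing first keeps the large-scale feature from dominating the components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pipeline.fit_transform(X)  # shape: (100, 2)
```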
7. What is the relationship between the first principal component and the second principal component?
A. They are parallel to each other
B. They are orthogonal (uncorrelated) to each other
C. They are identical
D. The second is the inverse of the first
Correct Answer: They are orthogonal (uncorrelated) to each other
Explanation: By definition, every principal component is orthogonal to the ones preceding it, ensuring that there is no redundant information (linear correlation) between them.
8. Which plot is commonly used to determine the optimal number of principal components to retain?
A. Scatter plot
B. Box plot
C. Scree plot
D. Histogram
Correct Answer: Scree plot
Explanation: A scree plot displays the eigenvalues (explained variance) for each principal component, helping analysts identify the 'elbow' point where adding more components yields diminishing returns.
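One way to draw such a plot, assuming scikit-learn and Matplotlib (the toy data is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.default_rng(1).normal(size=(200, 10))
pca = PCA().fit(X)

# Scree plot: explained variance ratio per component; look for the 'elbow'.
plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         pca.explained_variance_ratio_, marker='o')
plt.xlabel('Principal component')
plt.ylabel('Explained variance ratio')
plt.title('Scree plot')
plt.show()
```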
9. What is the main difference between Feature Selection and Feature Extraction?
A. Feature selection creates new variables; Feature extraction selects a subset
B. Feature selection selects a subset of original variables; Feature extraction creates new variables
C. They are exactly the same
D. Feature selection is for images; Feature extraction is for text
Correct Answer: Feature selection selects a subset of original variables; Feature extraction creates new variables
Explanation: Selection keeps a subset of the original features (e.g., Lasso), while extraction (like PCA) projects original features into a new space, creating entirely new variables.
10. If a dataset has 50 variables and you apply PCA to select the top 5 components, what happens to the dimensionality?
A. It increases to 55
B. It remains 50
C. It reduces to 5
D. It becomes 0
Correct Answer: It reduces to 5
Explanation: The purpose of keeping only the top components is to reduce the dimensionality of the dataset from the original count (50) to the selected count (5).
11. What is a Feedforward Neural Network?
A. A network where information moves in only one direction, forward, from the input nodes
B. A network where output is fed back into the input
C. A network that only processes images
D. A network that does not use weights
Correct Answer: A network where information moves in only one direction, forward, from the input nodes
Explanation: In a feedforward neural network, connections between nodes do not form a cycle; data flows from input to hidden layers to output.
12. What is the fundamental building block of a neural network?
A. Pixel
B. Neuron (Perceptron)
C. Decision Tree
D. Eigenvector
Correct Answer: Neuron (Perceptron)
Explanation: The artificial neuron (or perceptron) is the basic unit that takes inputs, applies weights and a bias, and passes the result through an activation function.
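A minimal NumPy sketch of a single neuron, also previewing the roles of weights and bias covered in the next questions (all values are illustrative):

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = np.dot(w, x) + b             # linear combination: w . x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation squashes z into (0, 1)

x = np.array([0.5, -1.2, 3.0])       # inputs
w = np.array([0.4, 0.1, -0.6])       # connection weights (strength of each input)
b = 0.2                              # bias shifts the activation
print(neuron(x, w, b))
```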
13. In a neural network, what is the role of the 'weight'?
A. To store the input data
B. To represent the strength or importance of a connection between neurons
C. To determine the number of layers
D. To calculate the accuracy
Correct Answer: To represent the strength or importance of a connection between neurons
Explanation: Weights are learnable parameters that adjust the influence of input signals as they pass from one neuron to the next.
14. What is the purpose of an activation function in a neural network?
A. To initialize the weights to zero
B. To introduce non-linearity into the network
C. To reduce the size of the dataset
D. To sort the input data
Correct Answer: To introduce non-linearity into the network
Explanation: Without activation functions, a neural network would simply be a linear regression model. Activation functions allow the network to learn complex, non-linear patterns.
15. Which of the following is a commonly used activation function that outputs values between 0 and 1?
A. ReLU (Rectified Linear Unit)
B. Tanh
C. Sigmoid
D. Linear
Correct Answer: Sigmoid
Explanation: The Sigmoid function maps any input value to a value between 0 and 1, making it useful for probability estimation in binary classification.
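A quick numeric check of the output ranges, as a NumPy sketch (inputs are illustrative; tanh is included for contrast with the distractor above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))   # values squashed between 0 and 1
print(np.tanh(z))   # tanh squashes to (-1, 1) instead
```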
16. What is a Multi-layer Perceptron (MLP)?
A. A perceptron with no weights
B. A neural network with at least one hidden layer between input and output
C. A network that only has an input and an output layer
D. A type of PCA
Correct Answer: A neural network with at least one hidden layer between input and output
Explanation: An MLP is a class of feedforward artificial neural network that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer.
17. What limitation of the single-layer perceptron did the MLP solve?
A. It was too slow
B. It could not solve the XOR problem (non-linearly separable data)
C. It required too much memory
D. It could not handle numbers
Correct Answer: It could not solve the XOR problem (non-linearly separable data)
Explanation: Single-layer perceptrons can only classify linearly separable data. MLPs, with hidden layers and non-linear activation functions, can solve non-linear problems like XOR.
18. What algorithm is commonly used to train Multi-layer Perceptrons?
A. K-Means Clustering
B. Apriori Algorithm
C. Backpropagation
D. Principal Component Analysis
Correct Answer: Backpropagation
Explanation: Backpropagation is the standard method for training neural networks. It computes the gradient of the loss function with respect to the weights.
19. In the context of Backpropagation, what is the role of the 'Loss Function'?
A. To increase the speed of the network
B. To measure the difference between the predicted output and the actual target
C. To randomly assign weights
D. To visualize the neural network
Correct Answer: To measure the difference between the predicted output and the actual target
Explanation: The loss function quantifies the error of the model. The goal of training is to minimize this loss.
20. What does 'Gradient Descent' do in a neural network?
A. It increases the number of neurons
B. It iteratively adjusts weights to minimize the loss function
C. It converts images to text
D. It removes outliers from data
Correct Answer: It iteratively adjusts weights to minimize the loss function
Explanation: Gradient descent is an optimization algorithm used to minimize the cost function by moving in the direction of the steepest descent (negative gradient).
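A toy illustration of the update rule on a one-parameter loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the loss, starting point, and learning rate are illustrative:

```python
# Gradient descent on a toy loss L(w) = (w - 3)^2.
w = 0.0              # initial weight
learning_rate = 0.1
for step in range(50):
    grad = 2 * (w - 3)          # dL/dw
    w -= learning_rate * grad   # step in the direction of the negative gradient
print(w)  # converges toward 3, the minimizer of the loss
```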
21. What is an 'Epoch' in neural network training?
A. The number of hidden layers
B. One forward pass and one backward pass of all the training examples
C. The initial learning rate
D. The time it takes to code the model
Correct Answer: One forward pass and one backward pass of all the training examples
Explanation: An epoch defines one cycle through the full training dataset. Training usually involves multiple epochs.
22. Which activation function is defined as f(x) = max(0, x)?
A. Sigmoid
B. Tanh
C. ReLU (Rectified Linear Unit)
D. Softmax
Correct Answer: ReLU (Rectified Linear Unit)
Explanation: ReLU outputs the input directly if it is positive; otherwise, it outputs zero. It is widely used because it reduces the vanishing gradient problem.
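As a quick NumPy sketch (the sample values are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)   # f(x) = max(0, x), applied element-wise

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [0.  0.  0.  0.5 2. ]
```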
23. What is the 'Bias' term in a neuron equation?
A. A prejudice in the data
B. A constant added to the product of inputs and weights to shift the activation function
C. The error rate of the model
D. The number of inputs
Correct Answer: A constant added to the product of inputs and weights to shift the activation function
Explanation: The bias allows the activation function to be shifted to the left or right, enabling the network to fit data that doesn't pass through the origin.
24. Convolutional Neural Networks (CNNs) are primarily used for which type of data?
A. Time-series financial data
B. Tabular sales data
C. Image and video data
D. Text sentiment analysis
Correct Answer: Image and video data
Explanation: CNNs are designed to process data with a grid-like topology, making them highly effective for image recognition and computer vision tasks.
25. What is the core operation in a CNN that allows it to detect features like edges?
A. Multiplication
B. Convolution
C. Recursion
D. Flattening
Correct Answer: Convolution
Explanation: Convolution involves sliding a filter (kernel) over the input image to create a feature map, highlighting specific patterns like edges or textures.
26. In a CNN, what is a 'Kernel' or 'Filter'?
A. A virus protection software
B. A small matrix of weights that slides over the input
C. The output layer of the network
D. The loss function
Correct Answer: A small matrix of weights that slides over the input
Explanation: Filters are the learnable parameters in a CNN. They are small matrices that convolve with the input to detect specific features.
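A naive NumPy sketch of the sliding-window operation covering the last two questions (strictly speaking this is cross-correlation, which is what deep-learning libraries compute; the edge-detector kernel and toy image are illustrative):

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' convolution: slide the kernel over the image with stride 1."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge_kernel = np.array([[-1, 0, 1],    # a simple vertical-edge detector
                        [-1, 0, 1],
                        [-1, 0, 1]])
image = np.zeros((6, 6))
image[:, 3:] = 1.0                     # left half dark, right half bright
print(convolve2d(image, edge_kernel))  # strong response where the edge sits
```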
27. What is the purpose of 'Pooling' layers (e.g., Max Pooling) in a CNN?
A. To increase the dimensions of the image
B. To add color to the image
C. To reduce the spatial dimensions and computation while retaining important features
D. To inverse the colors
Correct Answer: To reduce the spatial dimensions and computation while retaining important features
Explanation: Pooling downsamples the feature maps (e.g., taking the maximum value in a 2x2 grid), reducing parameters and making the model invariant to small translations.
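A minimal NumPy sketch of 2x2 max pooling with stride 2 (the feature map is illustrative):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep the max of each 2x2 block."""
    h, w = feature_map.shape
    return feature_map[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
print(max_pool_2x2(fm))   # [[4 2] [2 8]] -- halves each spatial dimension
```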
28. What does 'Stride' refer to in the context of CNNs?
A. The number of filters used
B. The step size the filter moves across the input image
C. The size of the pooling window
D. The learning rate
Correct Answer: The step size the filter moves across the input image
Explanation: Stride controls how many pixels the filter shifts at a time. A larger stride results in a smaller output feature map.
29. What is 'Padding' in a CNN?
A. Removing pixels from the image
B. Adding pixels (usually zeros) around the border of the input image
C. Compressing the image file
D. Coloring the image black and white
Correct Answer: Adding pixels (usually zeros) around the border of the input image
Explanation: Padding is used to control the spatial size of the output volume (usually to keep it the same as the input) and allows processing of border pixels.
30. Before passing the output of convolutional layers to a fully connected dense layer, what operation must be performed?
A. Flattening
B. Expanding
C. Rotating
D. Inverting
Correct Answer: Flattening
Explanation: Convolutional layers output 3D volumes (height, width, channels), but dense layers require 1D vectors. Flattening converts the volume into a vector.
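In NumPy terms, flattening is just a reshape (the shapes below are illustrative):

```python
import numpy as np

volume = np.random.default_rng(2).normal(size=(4, 4, 8))  # (height, width, channels)
vector = volume.reshape(-1)   # flatten to a 1-D vector of length 4*4*8 = 128
print(vector.shape)           # (128,) -- ready for a dense layer
```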
31. Recurrent Neural Networks (RNNs) are best suited for what type of data?
A. Static images
B. Independent tabular records
C. Sequential data (e.g., time series, text)
D. Unstructured noise
Correct Answer: Sequential data (e.g., time series, text)
Explanation: RNNs are designed to handle sequences where the current output depends on previous computations, making them ideal for time series and natural language.
32. What distinguishes an RNN from a standard Feedforward Neural Network?
A. RNNs have no activation functions
B. RNNs have a feedback loop allowing information to persist
C. RNNs only work on GPUs
D. RNNs cannot have hidden layers
Correct Answer: RNNs have a feedback loop allowing information to persist
Explanation: RNNs have loops in them, allowing information to be passed from one step of the network to the next, effectively creating a form of 'memory'.
33. In an RNN, what is the 'Hidden State'?
A. A layer that is never trained
B. The memory of the network capturing information about previous steps
C. Data that is deleted after processing
D. The final output prediction
Correct Answer: The memory of the network capturing information about previous steps
Explanation: The hidden state is a vector calculated at each time step based on the current input and the previous hidden state, acting as the network's memory.
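A minimal NumPy sketch of this recurrence, assuming a tanh activation (the weight shapes and toy sequence are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
W_xh = rng.normal(scale=0.1, size=(16, 5))    # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(16, 16))   # hidden-to-hidden (the feedback loop)
b_h = np.zeros(16)

h = np.zeros(16)                      # initial hidden state (the 'memory')
sequence = rng.normal(size=(10, 5))   # 10 time steps, 5 features each
for x_t in sequence:
    # The new hidden state depends on the current input AND the previous hidden state.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
print(h.shape)  # (16,) -- a summary of the sequence seen so far
```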
34. What is the 'Vanishing Gradient Problem' commonly faced by standard RNNs?
A. Gradients become too large and explode
B. Gradients become extremely small, preventing weights from updating effectively in earlier layers
C. The network runs out of memory
D. The loss function becomes zero immediately
Correct Answer: Gradients become extremely small, preventing weights from updating effectively in earlier layers
Explanation: During backpropagation through time, gradients can shrink exponentially, making it difficult for the RNN to learn long-term dependencies.
35. Which architecture was designed specifically to solve the Vanishing Gradient problem in RNNs?
A. CNN (Convolutional Neural Network)
B. LSTM (Long Short-Term Memory)
C. PCA (Principal Component Analysis)
D. Perceptron
Correct Answer: LSTM (Long Short-Term Memory)
Explanation: LSTMs introduce gates (input, output, forget) that regulate the flow of information, allowing the network to retain information over long sequences.
36. What is 'Dropout' in the context of Neural Networks?
A. Stopping the training early
B. A regularization technique that randomly ignores neurons during training to prevent overfitting
C. Removing bad data from the input
D. A type of activation function
Correct Answer: A regularization technique that randomly ignores neurons during training to prevent overfitting
Explanation: Dropout randomly sets a fraction of input units to 0 at each update during training time, which helps prevent neurons from co-adapting too much (overfitting).
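A sketch of 'inverted' dropout, the common variant that rescales the surviving activations so their expected magnitude is unchanged (the rate and array are illustrative):

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: randomly zero a fraction of units during training."""
    if not training:
        return activations  # dropout is disabled at inference time
    mask = np.random.default_rng().random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)  # rescale the survivors

a = np.ones(10)
print(dropout(a, rate=0.5))  # roughly half the units are zeroed, at random
```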
37. In a multi-class classification problem (e.g., classifying an image as a Cat, Dog, or Bird), which activation function is used in the output layer?
A. Linear
B. Sigmoid
C. Softmax
D. ReLU
Correct Answer: Softmax
Explanation: Softmax converts a vector of numbers into a vector of probabilities that sum to 1, making it suitable for mutually exclusive multi-class classification.
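A numerically stable NumPy sketch (the raw scores are illustrative):

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()            # probabilities that sum to 1

scores = np.array([2.0, 1.0, 0.1])  # e.g., raw scores for Cat, Dog, Bird
print(softmax(scores))              # ~[0.66, 0.24, 0.10], sums to 1
```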
38. Which of the following is a hyperparameter in a neural network?
A. Weights
B. Biases
C. Learning Rate
D. Output predictions
Correct Answer: Learning Rate
Explanation: Hyperparameters are set before training begins (e.g., learning rate, number of layers), whereas weights and biases are parameters learned during training.
39. What does a Learning Rate of 0.001 imply?
A. The model will learn very fast
B. The weights are updated by a small step size in the direction of the negative gradient
C. The accuracy will be 0.1%
D. The model has 1000 layers
Correct Answer: The weights are updated by a small step size in the direction of the negative gradient
Explanation: The learning rate controls the magnitude of the update to the model weights. A small rate ensures stable convergence but may take longer.
40. In the context of PCA, if the first two principal components explain 95% of the variance, what can be concluded?
A. The data is useless
B. The other components contain 95% of the information
C. The data can be effectively reduced to 2 dimensions with minimal information loss
D. You need at least 10 components
Correct Answer: The data can be effectively reduced to 2 dimensions with minimal information loss
Explanation: High cumulative explained variance in the first few components indicates that the dimensionality can be reduced significantly while preserving the data's structure.
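A sketch of how one might pick the component count from the cumulative explained variance, assuming scikit-learn (the 95% threshold and toy data are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(4).normal(size=(150, 8))
pca = PCA().fit(X)

cumulative = np.cumsum(pca.explained_variance_ratio_)
# Keep the smallest number of components that explains, say, 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(k, cumulative[:k])
```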
41. Which concept allows a CNN to recognize an object in an image regardless of where it is located in the image?
A. Translation Invariance
B. Vanishing Gradient
C. Overfitting
D. Linearity
Correct Answer: Translation Invariance
Explanation: Due to weight sharing in convolution and the abstraction of pooling, CNNs can detect features (like an eye or a wheel) regardless of their spatial position.
42. What is the 'Gate' mechanism in LSTM used for?
A. To connect to the internet
B. To control what information is added to or removed from the cell state
C. To calculate the convolution
D. To speed up the GPU
Correct Answer: To control what information is added to or removed from the cell state
Explanation: LSTMs use Forget, Input, and Output gates (typically sigmoid layers) to regulate the cell state (memory).
43. Which of the following is an example of a Many-to-One RNN architecture?
A. Machine Translation (Sequence to Sequence)
B. Sentiment Analysis (Text to Sentiment Score)
C. Image Captioning (Image to Text)
D. Video Frame Prediction
Correct Answer: Sentiment Analysis (Text to Sentiment Score)
Explanation: In sentiment analysis, a sequence of words (Many) is input to the network, and a single classification such as Positive/Negative (One) is output.
44. In a fully connected (dense) layer, how are neurons connected?
A. Only to the corresponding neuron in the previous layer
B. Each neuron is connected to every neuron in the previous layer
C. Neurons are not connected
D. Randomly connected
Correct Answer: Each neuron is connected to every neuron in the previous layer
Explanation: A dense layer is 'fully connected,' meaning every input node connects to every output node of that layer with a unique weight.
45. Why is Non-Linearity important in dimensionality reduction (e.g., t-SNE vs. PCA)?
A. It is not important
B. PCA captures linear relationships; non-linear methods capture complex manifolds
Correct Answer: PCA captures linear relationships; non-linear methods capture complex manifolds
Explanation: PCA projects data linearly. If the data lies on a curved surface (manifold), linear methods fail to unfold it, requiring non-linear techniques.
46. What is the primary disadvantage of using a Deep Neural Network compared to a Decision Tree?
A. Lower accuracy
B. Lack of interpretability (Black Box nature)
C. Cannot handle large data
D. Simplicity
Correct Answer: Lack of interpretability (Black Box nature)
Explanation: Neural networks, especially deep ones, act as black boxes where it is difficult to trace exactly how a specific input leads to a specific output, unlike the clear rules of a decision tree.
47. In PCA, the Covariance Matrix is usually:
A. Asymmetric
B. Symmetric
C. Diagonal with zeros
D. Undefined
Correct Answer: Symmetric
Explanation: The covariance matrix measures the covariance between every pair of variables. Since Cov(X,Y) = Cov(Y,X), the matrix is symmetric.
48. What is 'Weight Initialization'?
A. Setting all weights to 1
B. Setting all weights to 0
C. Setting initial values for weights (often small random numbers) before training
D. Calculating the final weights
Correct Answer: Setting initial values for weights (often small random numbers) before training
Explanation: Proper initialization is crucial: if all weights start at zero, every neuron receives identical gradients and learns the same features. Small random values break this symmetry.
49. Which of the following best describes 'Overfitting' in neural networks?
A. The model performs poorly on training data and test data
B. The model performs well on training data but poorly on unseen test data
C. The model performs well on test data but poorly on training data
D. The model has too few parameters
Correct Answer: The model performs well on training data but poorly on unseen test data
Explanation: Overfitting occurs when the network memorizes the noise and details of the training set rather than learning the generalizable patterns.
50. The Universal Approximation Theorem states that:
A. A neural network can never reach 100% accuracy
B. A feedforward network with a single hidden layer can approximate any continuous function
C. CNNs are better than RNNs
D. PCA is universal
Correct Answer: A feedforward network with a single hidden layer can approximate any continuous function
Explanation: This theorem provides the theoretical foundation for neural networks, stating that given enough neurons in a hidden layer, an MLP can approximate any continuous function.