1. What is the primary objective of dimensionality reduction in predictive analytics?
A. To increase the number of features to capture more data
B. To reduce the number of input variables while retaining essential information
C. To create artificial data points for training
D. To increase the computational complexity of the model
Correct Answer: To reduce the number of input variables while retaining essential information
Explanation: Dimensionality reduction aims to reduce the number of random variables under consideration by obtaining a set of principal variables, thereby simplifying models and reducing computational cost.
2. Which phenomenon refers to the problem where the amount of data needed to generalize accurately grows exponentially with the dimensionality?
A. The Law of Large Numbers
B. The Vanishing Gradient
C. The Curse of Dimensionality
D. The Overfitting Paradox
Correct Answer: The Curse of Dimensionality
Explanation: The Curse of Dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces, often causing sparsity and distance convergence.
3. Principal Component Analysis (PCA) is best described as which type of machine learning technique?
A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning
Correct Answer: Unsupervised Learning
Explanation: PCA is an unsupervised technique because it analyzes the structure of the input data (correlations and variance) without reference to target labels.
4. In PCA, what do the Principal Components represent?
A. The original features sorted by importance
B. New orthogonal variables that maximize variance
C. The cluster centroids of the data
D. The error terms of a regression model
Correct Answer: New orthogonal variables that maximize variance
Explanation: Principal components are linear combinations of the original variables, constructed to be orthogonal to each other and to capture the maximum remaining variance in the data.
5. Which mathematical concept is central to calculating Principal Components in PCA?
A. Eigenvalues and Eigenvectors
B. Sine and Cosine functions
C. Derivatives and Integrals
D. Fourier Transforms
Correct Answer: Eigenvalues and Eigenvectors
Explanation: PCA typically involves computing the eigenvalues and eigenvectors of the data's covariance matrix to determine the magnitude and direction of variance.
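To make this concrete, here is a minimal NumPy sketch of PCA via eigen-decomposition of the covariance matrix; the function name, toy data, and shapes are illustrative, not part of the quiz material:

```python
import numpy as np

def pca_eig(X, n_components=2):
    """Minimal PCA: eigen-decompose the covariance matrix, project onto top components."""
    X_centered = X - X.mean(axis=0)          # center each variable
    cov = np.cov(X_centered, rowvar=False)   # covariance matrix (features x features)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]        # eigenvalues = variance along each eigenvector
    return X_centered @ eigvecs[:, order[:n_components]]

X = np.random.default_rng(0).normal(size=(100, 6))
print(pca_eig(X, n_components=2).shape)     # (100, 2)
```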
6. Why is feature scaling (standardization) important before applying PCA?
A. Because PCA requires categorical data
B. Because PCA is sensitive to the scale of the variables
C. To convert negative numbers to positive
D. To increase the number of dimensions
Correct Answer: Because PCA is sensitive to the scale of the variables
Explanation: Since PCA maximizes variance, variables with large scales (e.g., salary vs. age) will dominate the principal components if the data is not standardized first.
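A short sketch of this workflow, assuming scikit-learn is available (the pipeline and toy data below are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))          # toy data: 100 rows, 6 features
X[:, 0] *= 1000                        # one feature on a much larger scale

# Standardizing first keeps the large-scale feature from dominating the components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pipeline.fit_transform(X)  # shape: (100, 2)
```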
7. What is the relationship between the first principal component and the second principal component?
A. They are parallel to each other
B. They are orthogonal (uncorrelated) to each other
C. They are identical
D. The second is the inverse of the first
Correct Answer: They are orthogonal (uncorrelated) to each other
Explanation: By definition, every principal component is orthogonal to the ones preceding it, ensuring that there is no redundant information (linear correlation) between them.
8. Which plot is commonly used to determine the optimal number of principal components to retain?
A. Scatter plot
B. Box plot
C. Scree plot
D. Histogram
Correct Answer: Scree plot
Explanation: A scree plot displays the eigenvalues (explained variance) for each principal component, helping analysts identify the 'elbow' point where adding more components yields diminishing returns.
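One way to draw such a plot, assuming scikit-learn and Matplotlib (the toy data is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.default_rng(1).normal(size=(200, 10))
pca = PCA().fit(X)

# Scree plot: explained variance ratio per component; look for the 'elbow'.
plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         pca.explained_variance_ratio_, marker='o')
plt.xlabel('Principal component')
plt.ylabel('Explained variance ratio')
plt.title('Scree plot')
plt.show()
```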
9. What is the main difference between Feature Selection and Feature Extraction?
A. Feature selection creates new variables; Feature extraction selects a subset
B. Feature selection selects a subset of original variables; Feature extraction creates new variables
C. They are exactly the same
D. Feature selection is for images; Feature extraction is for text
Correct Answer: Feature selection selects a subset of original variables; Feature extraction creates new variables
Explanation: Selection keeps a subset of the original features (e.g., Lasso), while extraction (like PCA) projects original features into a new space, creating entirely new variables.
10. If a dataset has 50 variables and you apply PCA to select the top 5 components, what happens to the dimensionality?
A. It increases to 55
B. It remains 50
C. It reduces to 5
D. It becomes 0
Correct Answer: It reduces to 5
Explanation: The purpose of keeping only the top components is to reduce the dimensionality of the dataset from the original count (50) to the selected count (5).
11. What is a Feedforward Neural Network?
A. A network where information moves in only one direction, forward, from the input nodes
B. A network where output is fed back into the input
C. A network that only processes images
D. A network that does not use weights
Correct Answer: A network where information moves in only one direction, forward, from the input nodes
Explanation: In a feedforward neural network, connections between nodes do not form a cycle; data flows from input to hidden layers to output.
12. What is the fundamental building block of a neural network?
A. Pixel
B. Neuron (Perceptron)
C. Decision Tree
D. Eigenvector
Correct Answer: Neuron (Perceptron)
Explanation: The artificial neuron (or perceptron) is the basic unit that takes inputs, applies weights and a bias, and passes the result through an activation function.
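A minimal NumPy sketch of a single neuron, also previewing the roles of weights and bias covered in the next questions (all values are illustrative):

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = np.dot(w, x) + b             # linear combination: w . x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation squashes z into (0, 1)

x = np.array([0.5, -1.2, 3.0])       # inputs
w = np.array([0.4, 0.1, -0.6])       # connection weights (strength of each input)
b = 0.2                              # bias shifts the activation
print(neuron(x, w, b))
```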
13. In a neural network, what is the role of the 'weight'?
A. To store the input data
B. To represent the strength or importance of a connection between neurons
C. To determine the number of layers
D. To calculate the accuracy
Correct Answer: To represent the strength or importance of a connection between neurons
Explanation: Weights are learnable parameters that adjust the influence of input signals as they pass from one neuron to the next.
14. What is the purpose of an activation function in a neural network?
A. To initialize the weights to zero
B. To introduce non-linearity into the network
C. To reduce the size of the dataset
D. To sort the input data
Correct Answer: To introduce non-linearity into the network
Explanation: Without activation functions, a neural network would simply be a linear regression model. Activation functions allow the network to learn complex, non-linear patterns.
15. Which of the following is a commonly used activation function that outputs values between 0 and 1?
A. ReLU (Rectified Linear Unit)
B. Tanh
C. Sigmoid
D. Linear
Correct Answer: Sigmoid
Explanation: The Sigmoid function maps any input value to a value between 0 and 1, making it useful for probability estimation in binary classification.
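A quick numeric check of the output ranges, as a NumPy sketch (inputs are illustrative; tanh is included for contrast with the distractor above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))   # values squashed between 0 and 1
print(np.tanh(z))   # tanh squashes to (-1, 1) instead
```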
16. What is a Multi-layer Perceptron (MLP)?
A. A perceptron with no weights
B. A neural network with at least one hidden layer between input and output
C. A network that only has an input and an output layer
D. A type of PCA
Correct Answer: A neural network with at least one hidden layer between input and output
Explanation: An MLP is a class of feedforward artificial neural network that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer.
17. What limitation of the single-layer perceptron did the MLP solve?
A. It was too slow
B. It could not solve the XOR problem (non-linearly separable data)
C. It required too much memory
D. It could not handle numbers
Correct Answer: It could not solve the XOR problem (non-linearly separable data)
Explanation: Single-layer perceptrons can only classify linearly separable data. MLPs, with hidden layers and non-linear activation functions, can solve non-linear problems like XOR.
18. What algorithm is commonly used to train Multi-layer Perceptrons?
A. K-Means Clustering
B. Apriori Algorithm
C. Backpropagation
D. Principal Component Analysis
Correct Answer: Backpropagation
Explanation: Backpropagation is the standard method for training neural networks. It computes the gradient of the loss function with respect to the weights.
19. In the context of Backpropagation, what is the role of the 'Loss Function'?
A. To increase the speed of the network
B. To measure the difference between the predicted output and the actual target
C. To randomly assign weights
D. To visualize the neural network
Correct Answer: To measure the difference between the predicted output and the actual target
Explanation: The loss function quantifies the error of the model. The goal of training is to minimize this loss.
20. What does 'Gradient Descent' do in a neural network?
A. It increases the number of neurons
B. It iteratively adjusts weights to minimize the loss function
C. It converts images to text
D. It removes outliers from data
Correct Answer: It iteratively adjusts weights to minimize the loss function
Explanation: Gradient descent is an optimization algorithm used to minimize the cost function by moving in the direction of the steepest descent (negative gradient).
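A toy illustration of the update rule on a one-parameter loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the loss, starting point, and learning rate are illustrative:

```python
# Gradient descent on a toy loss L(w) = (w - 3)^2.
w = 0.0              # initial weight
learning_rate = 0.1
for step in range(50):
    grad = 2 * (w - 3)          # dL/dw
    w -= learning_rate * grad   # step in the direction of the negative gradient
print(w)  # converges toward 3, the minimizer of the loss
```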
21. What is an 'Epoch' in neural network training?
A. The number of hidden layers
B. One forward pass and one backward pass of all the training examples
C. The initial learning rate
D. The time it takes to code the model
Correct Answer: One forward pass and one backward pass of all the training examples
Explanation: An epoch defines one cycle through the full training dataset. Training usually involves multiple epochs.
22. Which activation function is defined as f(x) = max(0, x)?
A. Sigmoid
B. Tanh
C. ReLU (Rectified Linear Unit)
D. Softmax
Correct Answer: ReLU (Rectified Linear Unit)
Explanation: ReLU outputs the input directly if it is positive; otherwise, it outputs zero. It is widely used because it reduces the vanishing gradient problem.
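As a quick NumPy sketch (the sample values are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)   # f(x) = max(0, x), applied element-wise

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [0.  0.  0.  0.5 2. ]
```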
23. What is the 'Bias' term in a neuron equation?
A. A prejudice in the data
B. A constant added to the product of inputs and weights to shift the activation function
C. The error rate of the model
D. The number of inputs
Correct Answer: A constant added to the product of inputs and weights to shift the activation function
Explanation: The bias allows the activation function to be shifted to the left or right, enabling the network to fit data that doesn't pass through the origin.
24. Convolutional Neural Networks (CNNs) are primarily used for which type of data?
A. Time-series financial data
B. Tabular sales data
C. Image and video data
D. Text sentiment analysis
Correct Answer: Image and video data
Explanation: CNNs are designed to process data with a grid-like topology, making them highly effective for image recognition and computer vision tasks.
25. What is the core operation in a CNN that allows it to detect features like edges?
A. Multiplication
B. Convolution
C. Recursion
D. Flattening
Correct Answer: Convolution
Explanation: Convolution involves sliding a filter (kernel) over the input image to create a feature map, highlighting specific patterns like edges or textures.
26. In a CNN, what is a 'Kernel' or 'Filter'?
A. A virus protection software
B. A small matrix of weights that slides over the input
C. The output layer of the network
D. The loss function
Correct Answer: A small matrix of weights that slides over the input
Explanation: Filters are the learnable parameters in a CNN. They are small matrices that convolve with the input to detect specific features.
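A naive NumPy sketch of the sliding-window operation covering the last two questions (strictly speaking this is cross-correlation, which is what deep-learning libraries compute; the edge-detector kernel and toy image are illustrative):

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' convolution: slide the kernel over the image with stride 1."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge_kernel = np.array([[-1, 0, 1],    # a simple vertical-edge detector
                        [-1, 0, 1],
                        [-1, 0, 1]])
image = np.zeros((6, 6))
image[:, 3:] = 1.0                     # left half dark, right half bright
print(convolve2d(image, edge_kernel))  # strong response where the edge sits
```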
27. What is the purpose of 'Pooling' layers (e.g., Max Pooling) in a CNN?
A. To increase the dimensions of the image
B. To add color to the image
C. To reduce the spatial dimensions and computation while retaining important features
D. To inverse the colors
Correct Answer: To reduce the spatial dimensions and computation while retaining important features
Explanation: Pooling downsamples the feature maps (e.g., taking the maximum value in a 2x2 grid), reducing parameters and making the model invariant to small translations.
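A minimal NumPy sketch of 2x2 max pooling with stride 2 (the feature map is illustrative):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep the max of each 2x2 block."""
    h, w = feature_map.shape
    return feature_map[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
print(max_pool_2x2(fm))   # [[4 2] [2 8]] -- halves each spatial dimension
```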
28. What does 'Stride' refer to in the context of CNNs?
A. The number of filters used
B. The step size the filter moves across the input image
C. The size of the pooling window
D. The learning rate
Correct Answer: The step size the filter moves across the input image
Explanation: Stride controls how many pixels the filter shifts at a time. A larger stride results in a smaller output feature map.
29. What is 'Padding' in a CNN?
A. Removing pixels from the image
B. Adding pixels (usually zeros) around the border of the input image
C. Compressing the image file
D. Coloring the image black and white
Correct Answer: Adding pixels (usually zeros) around the border of the input image
Explanation: Padding is used to control the spatial size of the output volume (usually to keep it the same as the input) and allows processing of border pixels.
30. Before passing the output of convolutional layers to a fully connected dense layer, what operation must be performed?
A. Flattening
B. Expanding
C. Rotating
D. Inverting
Correct Answer: Flattening
Explanation: Convolutional layers output 3D volumes (height, width, channels), but dense layers require 1D vectors. Flattening converts the volume into a vector.
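In NumPy terms, flattening is just a reshape (the shapes below are illustrative):

```python
import numpy as np

volume = np.random.default_rng(2).normal(size=(4, 4, 8))  # (height, width, channels)
vector = volume.reshape(-1)   # flatten to a 1-D vector of length 4*4*8 = 128
print(vector.shape)           # (128,) -- ready for a dense layer
```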
31. Recurrent Neural Networks (RNNs) are best suited for what type of data?
A. Static images
B. Independent tabular records
C. Sequential data (e.g., time series, text)
D. Unstructured noise
Correct Answer: Sequential data (e.g., time series, text)
Explanation: RNNs are designed to handle sequences where the current output depends on previous computations, making them ideal for time series and natural language.
32. What distinguishes an RNN from a standard Feedforward Neural Network?
A. RNNs have no activation functions
B. RNNs have a feedback loop allowing information to persist
C. RNNs only work on GPUs
D. RNNs cannot have hidden layers
Correct Answer: RNNs have a feedback loop allowing information to persist
Explanation: RNNs have loops in them, allowing information to be passed from one step of the network to the next, effectively creating a form of 'memory'.
33. In an RNN, what is the 'Hidden State'?
A. A layer that is never trained
B. The memory of the network capturing information about previous steps
C. Data that is deleted after processing
D. The final output prediction
Correct Answer: The memory of the network capturing information about previous steps
Explanation: The hidden state is a vector calculated at each time step based on the current input and the previous hidden state, acting as the network's memory.
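A minimal NumPy sketch of this recurrence, assuming a tanh activation (the weight shapes and toy sequence are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
W_xh = rng.normal(scale=0.1, size=(16, 5))    # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(16, 16))   # hidden-to-hidden (the feedback loop)
b_h = np.zeros(16)

h = np.zeros(16)                      # initial hidden state (the 'memory')
sequence = rng.normal(size=(10, 5))   # 10 time steps, 5 features each
for x_t in sequence:
    # The new hidden state depends on the current input AND the previous hidden state.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
print(h.shape)  # (16,) -- a summary of the sequence seen so far
```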
34. What is the 'Vanishing Gradient Problem' commonly faced by standard RNNs?
A. Gradients become too large and explode
B. Gradients become extremely small, preventing weights from updating effectively in earlier layers
C. The network runs out of memory
D. The loss function becomes zero immediately
Correct Answer: Gradients become extremely small, preventing weights from updating effectively in earlier layers
Explanation: During backpropagation through time, gradients can shrink exponentially, making it difficult for the RNN to learn long-term dependencies.
35. Which architecture was designed specifically to solve the Vanishing Gradient problem in RNNs?
A. CNN (Convolutional Neural Network)
B. LSTM (Long Short-Term Memory)
C. PCA (Principal Component Analysis)
D. Perceptron
Correct Answer: LSTM (Long Short-Term Memory)
Explanation: LSTMs introduce gates (input, output, forget) that regulate the flow of information, allowing the network to retain information over long sequences.
36. What is 'Dropout' in the context of Neural Networks?
A. Stopping the training early
B. A regularization technique that randomly ignores neurons during training to prevent overfitting
C. Removing bad data from the input
D. A type of activation function
Correct Answer: A regularization technique that randomly ignores neurons during training to prevent overfitting
Explanation: Dropout randomly sets a fraction of input units to 0 at each update during training time, which helps prevent neurons from co-adapting too much (overfitting).
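A sketch of 'inverted' dropout, the common variant that rescales the surviving activations so their expected magnitude is unchanged (the rate and array are illustrative):

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: randomly zero a fraction of units during training."""
    if not training:
        return activations  # dropout is disabled at inference time
    mask = np.random.default_rng().random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)  # rescale the survivors

a = np.ones(10)
print(dropout(a, rate=0.5))  # roughly half the units are zeroed, at random
```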
37. In a multi-class classification problem (e.g., classifying an image as a Cat, Dog, or Bird), which activation function is used in the output layer?
A. Linear
B. Sigmoid
C. Softmax
D. ReLU
Correct Answer: Softmax
Explanation: Softmax converts a vector of numbers into a vector of probabilities that sum to 1, making it suitable for mutually exclusive multi-class classification.
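A numerically stable NumPy sketch (the raw scores are illustrative):

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()            # probabilities that sum to 1

scores = np.array([2.0, 1.0, 0.1])  # e.g., raw scores for Cat, Dog, Bird
print(softmax(scores))              # ~[0.66, 0.24, 0.10], sums to 1
```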
38. Which of the following is a hyperparameter in a neural network?
A. Weights
B. Biases
C. Learning Rate
D. Output predictions
Correct Answer: Learning Rate
Explanation: Hyperparameters are set before training begins (e.g., learning rate, number of layers), whereas weights and biases are parameters learned during training.
39. What does a Learning Rate of 0.001 imply?
A. The model will learn very fast
B. The weights are updated by a small step size in the direction of the negative gradient
C. The accuracy will be 0.1%
D. The model has 1000 layers
Correct Answer: The weights are updated by a small step size in the direction of the negative gradient
Explanation: The learning rate controls the magnitude of the update to the model weights. A small rate ensures stable convergence but may take longer.
40. In the context of PCA, if the first two principal components explain 95% of the variance, what can be concluded?
A. The data is useless
B. The other components contain 95% of the information
C. The data can be effectively reduced to 2 dimensions with minimal information loss
D. You need at least 10 components
Correct Answer: The data can be effectively reduced to 2 dimensions with minimal information loss
Explanation: High cumulative explained variance in the first few components indicates that the dimensionality can be reduced significantly while preserving the data's structure.
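A sketch of how one might pick the component count from the cumulative explained variance, assuming scikit-learn (the 95% threshold and toy data are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(4).normal(size=(150, 8))
pca = PCA().fit(X)

cumulative = np.cumsum(pca.explained_variance_ratio_)
# Keep the smallest number of components that explains, say, 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(k, cumulative[:k])
```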
41. Which concept allows a CNN to recognize an object in an image regardless of where it is located in the image?
A. Translation Invariance
B. Vanishing Gradient
C. Overfitting
D. Linearity
Correct Answer: Translation Invariance
Explanation: Due to weight sharing in convolution and the abstraction of pooling, CNNs can detect features (like an eye or a wheel) regardless of their spatial position.
42. What is the 'Gate' mechanism in LSTM used for?
A. To connect to the internet
B. To control what information is added to or removed from the cell state
C. To calculate the convolution
D. To speed up the GPU
Correct Answer: To control what information is added to or removed from the cell state
Explanation: LSTMs use Forget, Input, and Output gates (typically sigmoid layers) to regulate the cell state (memory).
43. Which of the following is an example of a Many-to-One RNN architecture?
A. Machine Translation (Sequence to Sequence)
B. Sentiment Analysis (Text to Sentiment Score)
C. Image Captioning (Image to Text)
D. Video Frame Prediction
Correct Answer: Sentiment Analysis (Text to Sentiment Score)
Explanation: In sentiment analysis, a sequence of words (Many) is input to the network, and a single classification such as Positive/Negative (One) is output.
44. In a fully connected (dense) layer, how are neurons connected?
A. Only to the corresponding neuron in the previous layer
B. Each neuron is connected to every neuron in the previous layer
C. Neurons are not connected
D. Randomly connected
Correct Answer: Each neuron is connected to every neuron in the previous layer
Explanation: A dense layer is 'fully connected,' meaning every input node connects to every output node of that layer with a unique weight.
45. Why is Non-Linearity important in dimensionality reduction (e.g., t-SNE vs. PCA)?
A. It is not important
B. PCA captures linear relationships; non-linear methods capture complex manifolds
Correct Answer: PCA captures linear relationships; non-linear methods capture complex manifolds
Explanation: PCA projects data linearly. If the data lies on a curved surface (manifold), linear methods fail to unfold it, requiring non-linear techniques.
46. What is the primary disadvantage of using a Deep Neural Network compared to a Decision Tree?
A. Lower accuracy
B. Lack of interpretability (Black Box nature)
C. Cannot handle large data
D. Simplicity
Correct Answer: Lack of interpretability (Black Box nature)
Explanation: Neural networks, especially deep ones, act as black boxes where it is difficult to trace exactly how a specific input leads to a specific output, unlike the clear rules of a decision tree.
47. In PCA, the Covariance Matrix is usually:
A. Asymmetric
B. Symmetric
C. Diagonal with zeros
D. Undefined
Correct Answer: Symmetric
Explanation: The covariance matrix measures the covariance between every pair of variables. Since Cov(X,Y) = Cov(Y,X), the matrix is symmetric.
48. What is 'Weight Initialization'?
A. Setting all weights to 1
B. Setting all weights to 0
C. Setting initial values for weights (often small random numbers) before training
D. Calculating the final weights
Correct Answer: Setting initial values for weights (often small random numbers) before training
Explanation: Proper initialization is crucial: if all weights start at zero, every neuron receives identical gradients and learns the same features. Small random values break this symmetry.
49. Which of the following best describes 'Overfitting' in neural networks?
A. The model performs poorly on training data and test data
B. The model performs well on training data but poorly on unseen test data
C. The model performs well on test data but poorly on training data
D. The model has too few parameters
Correct Answer: The model performs well on training data but poorly on unseen test data
Explanation: Overfitting occurs when the network memorizes the noise and details of the training set rather than learning the generalizable patterns.
50. The Universal Approximation Theorem states that:
A. A neural network can never reach 100% accuracy
B. A feedforward network with a single hidden layer can approximate any continuous function
C. CNNs are better than RNNs
D. PCA is universal
Correct Answer: A feedforward network with a single hidden layer can approximate any continuous function
Explanation: This theorem provides the theoretical foundation for neural networks, stating that given enough neurons in a hidden layer, an MLP can approximate any continuous function.