1. What is the fundamental building block of a neural network, inspired by the human brain?
Introduction to Neural Networks
Easy
A.Pixel
B.Algorithm
C.Transistor
D.Neuron (or Node)
Correct Answer: Neuron (or Node)
Explanation:
A neural network is composed of interconnected units called neurons or nodes, which process and transmit information, similar to biological neurons in the brain.
Incorrect! Try again.
2. What kind of problems can a single-layer Perceptron solve?
Perceptron
Easy
A.Linearly separable problems
B.Non-linearly separable problems
C.All classification problems
D.Image recognition problems
Correct Answer: Linearly separable problems
Explanation:
A single-layer Perceptron can only learn to separate data that is linearly separable, meaning it can be divided by a single straight line or hyperplane.
Incorrect! Try again.
3. What does MLP stand for in the context of neural networks?
MLP
Easy
A.Maximum Likelihood Program
B.Main Logic Processor
C.Multiple Linear Progression
D.Multi-Layer Perceptron
Correct Answer: Multi-Layer Perceptron
Explanation:
MLP stands for Multi-Layer Perceptron, which is a type of feedforward artificial neural network with one or more hidden layers between the input and output layers.
Incorrect! Try again.
4. Which type of Deep Neural Network is primarily designed for processing grid-like data, such as images?
CNN
Easy
A.Recurrent Neural Network (RNN)
B.Multi-Layer Perceptron (MLP)
C.Convolutional Neural Network (CNN)
D.Transformer
Correct Answer: Convolutional Neural Network (CNN)
Explanation:
CNNs use special layers called convolutional layers that are highly effective at detecting patterns, features, and spatial hierarchies within images.
Incorrect! Try again.
5. Recurrent Neural Networks (RNNs) are best suited for what type of data?
RNN
Easy
A.Sequential data (e.g., time series, text)
B.Image data
C.Static, independent data points
D.Tabular data
Correct Answer: Sequential data (e.g., time series, text)
Explanation:
RNNs have internal memory (loops) that allow them to process sequences of data, making them ideal for tasks where the order of information is important, like language or stock prices.
Incorrect! Try again.
6. What is the key innovation in the Transformer architecture that allows it to process entire sequences at once and handle long-range dependencies effectively?
Transformer Architecture and Applications
Easy
A.Recurrent Loops
B.The Attention Mechanism
C.The Sigmoid Function
D.Convolutional Layers
Correct Answer: The Attention Mechanism
Explanation:
The self-attention mechanism is the core component of the Transformer, allowing the model to weigh the importance of different words in the input sequence when processing a specific word.
Incorrect! Try again.
7. What is the main goal of Natural Language Processing (NLP)?
Modern NLP: Introduction to NLP
Easy
A.To enable computers to understand, interpret, and generate human language
B.To build faster computer hardware
C.To create realistic computer graphics
D.To optimize database queries
Correct Answer: To enable computers to understand, interpret, and generate human language
Explanation:
NLP is a field of AI focused on the interaction between computers and humans using natural language. Its primary goal is to make computers capable of processing and analyzing large amounts of language data.
Incorrect! Try again.
8. Which phase of NLP involves analyzing the grammatical structure of a sentence and the relationships between words?
NLP phases
Easy
A.Semantic Analysis
B.Syntactic Analysis (Parsing)
C.Pragmatic Analysis
D.Lexical Analysis (Tokenization)
Correct Answer: Syntactic Analysis (Parsing)
Explanation:
Syntactic Analysis, or parsing, is the process of analyzing a string of symbols (a sentence) according to the rules of a formal grammar to understand its structure.
Incorrect! Try again.
9. In NLP, what is the process of breaking down a text into smaller units like words or sentences called?
Tokenization
Easy
A.Classification
B.Summarization
C.Embedding
D.Tokenization
Correct Answer: Tokenization
Explanation:
Tokenization is a fundamental first step in many NLP tasks. It splits a piece of text into smaller pieces, called tokens, which can be words, characters, or subwords.
Incorrect! Try again.
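To make the idea concrete, here is a minimal word-level tokenizer sketch in Python; the regex and function name are ours, not a standard API, and real systems today usually tokenize into subwords rather than whole words:

```python
import re

def tokenize(text):
    # Split text into word tokens and standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("NLP breaks text into tokens!"))
# ['NLP', 'breaks', 'text', 'into', 'tokens', '!']
```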
10. What is the purpose of a word embedding in NLP?
Embeddings
Easy
A.To represent words as dense numerical vectors
B.To correct spelling mistakes
C.To translate words into another language
D.To count the frequency of each word
Correct Answer: To represent words as dense numerical vectors
Explanation:
Word embeddings capture the semantic meaning and relationships between words by mapping them to vectors of real numbers in a multi-dimensional space.
Incorrect! Try again.
11. In the context of deep learning models like Transformers, what does the 'attention' mechanism help the model to do?
Attention
Easy
A.Reduce the number of layers in the network
B.Focus on the most relevant parts of the input sequence
C.Increase the speed of model training
D.Convert text to speech
Correct Answer: Focus on the most relevant parts of the input sequence
Explanation:
The attention mechanism allows the model to dynamically weigh the importance of different parts of the input when making a prediction or generating output, improving its performance on long sequences.
Incorrect! Try again.
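A minimal sketch of this weighting, in the style of scaled dot-product attention for a single query; the toy lists and function names are illustrative, not a library API:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score each key against the query, normalize the scores into
    # weights, then return the weighted mix of the value vectors.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the second key most strongly, so the output
# leans toward the second value vector.
out = attention([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]])
print(out)
```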
12. What are BERT and GPT well-known examples of?
language models (BERT, GPT)
Easy
A.Image Classification Models
B.Large Language Models (LLMs)
C.Speech Recognition APIs
D.Database Management Systems
Correct Answer: Large Language Models (LLMs)
Explanation:
BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are powerful, pre-trained large language models that form the basis for many modern NLP applications.
Incorrect! Try again.
13. What is the primary function of a chatbot?
Building chatbots and digital assistants
Easy
A.To perform complex mathematical simulations
B.To simulate conversation with human users
C.To manage computer hardware resources
D.To analyze and visualize data
Correct Answer: To simulate conversation with human users
Explanation:
A chatbot is a software application designed to conduct a conversation with a human user in natural language through text or speech, for purposes like customer service or information retrieval.
Incorrect! Try again.
14. Determining if a customer review is positive, negative, or neutral is an example of which NLP task?
NLP use cases (sentiment analysis, translation, summarization)
Easy
A.Sentiment Analysis
B.Named Entity Recognition
C.Text Summarization
D.Machine Translation
Correct Answer: Sentiment Analysis
Explanation:
Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text to determine the writer's attitude towards a particular topic.
Incorrect! Try again.
15. Why is an activation function, such as ReLU or Sigmoid, necessary in a Multi-Layer Perceptron (MLP)?
MLP
Easy
A.To reduce the number of neurons
B.To make the model run faster
C.To introduce non-linearity into the model
D.To only work with positive numbers
Correct Answer: To introduce non-linearity into the model
Explanation:
Without a non-linear activation function, an MLP, no matter how many layers it has, would behave just like a single-layer perceptron because a series of linear transformations is equivalent to a single linear transformation.
Incorrect! Try again.
16. In a CNN, what is the primary purpose of a 'pooling' layer (e.g., MaxPooling)?
CNN
Easy
A.To classify the image
B.To reduce the spatial dimensions (width and height) of the input volume
C.To increase the number of features
D.To apply a non-linear transformation
Correct Answer: To reduce the spatial dimensions (width and height) of the input volume
Explanation:
Pooling layers are used to progressively reduce the spatial size of the representation, which decreases the number of parameters and the amount of computation in the network, and also helps to control overfitting.
Incorrect! Try again.
17. The task of automatically converting text from one language to another, like from English to Spanish, is called:
NLP use cases (sentiment analysis, translation, summarization)
Easy
A.Language Detection
B.Machine Translation
C.Sentiment Analysis
D.Text Generation
Correct Answer: Machine Translation
Explanation:
Machine Translation is a classic NLP use case that involves using software to translate text or speech from a source language to a target language.
Incorrect! Try again.
18. The 'P' in GPT stands for 'Pre-trained'. What does this mean?
language models (BERT, GPT)
Easy
A.The model requires a person to train it manually
B.The model can only predict one word at a time
C.The model's parameters are permanently fixed
D.The model is trained on a massive dataset before being fine-tuned for specific tasks
Correct Answer: The model is trained on a massive dataset before being fine-tuned for specific tasks
Explanation:
Pre-training involves training the model on a vast corpus of text data to learn general language patterns. This pre-trained model can then be adapted (fine-tuned) for more specific downstream tasks with much less data.
Incorrect! Try again.
19. Which NLP task is focused on creating a shorter version of a long document while retaining its most important information?
NLP use cases (sentiment analysis, translation, summarization)
Easy
A.Text Summarization
B.Question Answering
C.Machine Translation
D.Part-of-Speech Tagging
Correct Answer: Text Summarization
Explanation:
Text Summarization is the process of condensing a source text into a shorter version, providing a quick overview of the main points of the original document.
Incorrect! Try again.
20. The 'vanishing gradient problem' is a common issue that can make it difficult to train which type of neural network on long sequences?
RNN
Easy
A.Autoencoder
B.Multi-Layer Perceptron (MLP)
C.Convolutional Neural Network (CNN)
D.Recurrent Neural Network (RNN)
Correct Answer: Recurrent Neural Network (RNN)
Explanation:
In deep networks or RNNs, gradients can become extremely small as they are propagated backward through time, making it hard for the network to learn long-range dependencies. Architectures like LSTMs and GRUs were developed to mitigate this issue.
Incorrect! Try again.
21. A single-layer perceptron is a linear classifier. Which of the following problems can it not solve, and why?
Perceptron
Medium
A.The AND problem, because it involves multiple true conditions.
B.The XOR (exclusive OR) problem, because the data points are not linearly separable.
C.The OR problem, because the decision boundary is diagonal.
D.The NOT problem, because it involves inverting the input.
Correct Answer: The XOR (exclusive OR) problem, because the data points are not linearly separable.
Explanation:
A single-layer perceptron can only learn linearly separable patterns. The XOR function's data points (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0 cannot be separated by a single straight line in a 2D plane. This limitation was crucial in demonstrating the need for multi-layer networks.
Incorrect! Try again.
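A brute-force check makes this concrete: searching a small grid of weights and biases finds a single step-activation neuron that computes AND, but no setting that computes XOR. The grid range and step activation here are arbitrary illustrative choices:

```python
import itertools

def step(z):
    return 1 if z >= 0 else 0

def solves(gate, w1, w2, b):
    # Does a single step-activation neuron reproduce the whole truth table?
    return all(step(w1 * x1 + w2 * x2 + b) == y
               for (x1, x2), y in gate.items())

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [x / 2 for x in range(-4, 5)]  # weights/bias from -2.0 to 2.0 in 0.5 steps
print(any(solves(AND, *p) for p in itertools.product(grid, repeat=3)))  # True
print(any(solves(XOR, *p) for p in itertools.product(grid, repeat=3)))  # False
```

No grid, however fine, would find a solution for XOR; the impossibility is geometric, not a matter of search resolution.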
22. In a Multi-Layer Perceptron (MLP), what is the primary consequence of removing all non-linear activation functions (like ReLU or sigmoid) from the hidden layers?
MLP
Medium
A.The network becomes unable to perform regression tasks.
B.The number of trainable parameters in the network is significantly reduced.
C.The network will train much faster but lose all accuracy.
D.The network collapses into a single linear transformation, making it no more powerful than a single-layer network.
Correct Answer: The network collapses into a single linear transformation, making it no more powerful than a single-layer network.
Explanation:
A composition of linear functions is itself a linear function. Without non-linear activation functions, the entire MLP, regardless of its depth, can be mathematically simplified to a single linear layer. The non-linearities are essential for learning complex, non-linear patterns.
Incorrect! Try again.
23. You are training a very deep MLP and observe that the gradients for the earliest layers are almost zero, causing training to stall. What is this phenomenon called, and which activation function is known to help mitigate it?
MLP
Medium
A.Exploding Gradient Problem; Tanh
B.Vanishing Gradient Problem; ReLU
C.Overfitting; Sigmoid
D.Saddle Point Problem; Softmax
Correct Answer: Vanishing Gradient Problem; ReLU
Explanation:
The Vanishing Gradient Problem occurs when gradients become progressively smaller as they are backpropagated to earlier layers. Activation functions like sigmoid and tanh have derivatives that are less than 1, and multiplying these small numbers repeatedly causes the gradient to vanish. The Rectified Linear Unit (ReLU) has a derivative of 1 for positive inputs, which helps prevent the gradient from shrinking.
Incorrect! Try again.
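The arithmetic behind this is simple: sigmoid's derivative never exceeds 0.25, so even the best-case gradient through many sigmoid layers shrinks geometrically, while ReLU's derivative of 1 (for positive inputs) leaves it intact. A back-of-envelope comparison, with the depth of 20 chosen arbitrarily:

```python
depth = 20

# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25 when z = 0,
# so 0.25 per layer is the *best* case for sigmoid.
sigmoid_gradient = 0.25 ** depth
relu_gradient = 1.0 ** depth  # ReLU's derivative is 1 for positive inputs

print(sigmoid_gradient)  # ~9.1e-13 -- effectively vanished
print(relu_gradient)     # 1.0
```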
24. A 2D convolutional layer is applied to a 64x64 pixel grayscale image. The layer uses a 5x5 kernel, a stride of 2, and no padding. What will be the spatial dimensions (height x width) of the output feature map?
CNN
Medium
A.30x30
B.59x59
C.60x60
D.32x32
Correct Answer: 30x30
Explanation:
The formula for calculating the output size is floor((W - K + 2P) / S) + 1, where W is the input size, K is the kernel size, P is the padding, and S is the stride. For this case: floor((64 - 5 + 0) / 2) + 1 = floor(29.5) + 1 = 29 + 1 = 30. So the output is 30x30.
Incorrect! Try again.
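The output-size formula can be wrapped in a small helper to check questions like this quickly; the function name and defaults are ours:

```python
def conv_output_size(w, k, p=0, s=1):
    # floor((W - K + 2P) / S) + 1
    return (w - k + 2 * p) // s + 1

print(conv_output_size(64, 5, p=0, s=2))  # 30
print(conv_output_size(64, 5, p=2, s=1))  # 64 ("same" padding for a 5x5 kernel)
```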
25. Beyond reducing computational complexity, what is a key benefit of using a max-pooling layer in a Convolutional Neural Network (CNN)?
CNN
Medium
A.It provides a degree of translation invariance.
B.It introduces non-linearity into the network.
C.It increases the receptive field of subsequent layers.
D.It normalizes the feature map activations.
Correct Answer: It provides a degree of translation invariance.
Explanation:
Max-pooling takes the maximum value from a patch of the feature map. This means that even if a feature shifts slightly in position within the patch, the output of the max-pooling layer can remain the same. This makes the network more robust to small translations of features in the input image.
Incorrect! Try again.
26. Why is a Long Short-Term Memory (LSTM) network often preferred over a standard Recurrent Neural Network (RNN) for tasks involving long sequences, such as paragraph-level text analysis?
RNN
Medium
A.LSTMs can be parallelized during training, unlike standard RNNs.
B.LSTMs use a more complex activation function that captures more features.
C.LSTMs use gating mechanisms to control information flow, mitigating the vanishing gradient problem.
D.LSTMs have fewer parameters, making them faster to train on long sequences.
Correct Answer: LSTMs use gating mechanisms to control information flow, mitigating the vanishing gradient problem.
Explanation:
Standard RNNs suffer from the vanishing gradient problem, making it difficult for them to learn dependencies between elements that are far apart in a sequence. LSTMs introduce a 'cell state' and 'gates' (input, forget, output) that regulate what information is stored, forgotten, or outputted, allowing gradients to flow over longer durations and enabling the network to capture long-range dependencies.
Incorrect! Try again.
27. In a many-to-one RNN architecture used for text classification, what is the typical role of the final hidden state?
RNN
Medium
A.It is used to generate the first word of the output sequence.
B.It serves as a summary vector of the entire input sequence, which is then fed into a final classification layer.
C.It is discarded, as only the outputs from each time step are relevant.
D.It is averaged with the initial hidden state to normalize the network's memory.
Correct Answer: It serves as a summary vector of the entire input sequence, which is then fed into a final classification layer.
Explanation:
In a many-to-one architecture, the RNN processes the entire sequence of inputs (many), and the hidden state at the final time step is expected to have encoded a meaningful summary of the whole sequence. This final hidden state vector is then used as the input to a feed-forward layer (like a softmax classifier) to produce a single output (one), such as a class label.
Incorrect! Try again.
28. What is the primary advantage of the self-attention mechanism in Transformers over the sequential processing of RNNs in terms of computational efficiency?
Transformer Architecture and Applications
Medium
A.It uses a simpler update rule than the gating mechanisms in LSTMs.
B.It requires fewer matrix multiplications per layer.
C.It has a constant-length path for information to travel between any two positions, preventing vanishing gradients.
D.It allows for parallel computation across all tokens in a sequence, as the relationship between any two tokens is calculated independently of their distance.
Correct Answer: It allows for parallel computation across all tokens in a sequence, as the relationship between any two tokens is calculated independently of their distance.
Explanation:
RNNs must process sequences token by token, as the hidden state of time step 't' depends on the hidden state of 't-1'. In contrast, self-attention calculates a score between every pair of tokens in the sequence simultaneously. This lack of sequential dependency allows for massive parallelization on modern hardware like GPUs, making it much faster to train on large sequences.
Incorrect! Try again.
29. In the Transformer architecture, what is the specific purpose of the Positional Encoding step?
Transformer Architecture and Applications
Medium
A.To normalize the word embeddings before they enter the attention layers.
B.To reduce the dimensionality of the input embeddings to save computation.
C.To inject information about the relative or absolute position of tokens, since the self-attention mechanism itself is permutation-invariant.
D.To convert the input tokens into a continuous vector representation.
Correct Answer: To inject information about the relative or absolute position of tokens, since the self-attention mechanism itself is permutation-invariant.
Explanation:
The self-attention mechanism treats the input as a set of vectors, meaning it has no inherent sense of order. The sentences "The cat chased the dog" and "The dog chased the cat" would look identical to the attention mechanism without positional information. Positional Encodings are vectors added to the input embeddings to provide the model with a signal about the position of each token in the sequence.
Incorrect! Try again.
30. You are developing an NLP model and must choose a tokenization strategy. Why might a subword tokenization algorithm like Byte-Pair Encoding (BPE) be superior to simple word-based tokenization with a fixed vocabulary?
Tokenization
Medium
A.It is a lossless compression algorithm that reduces model size.
B.It can handle out-of-vocabulary (OOV) words by breaking them into known subword units.
C.It always results in a shorter sequence of tokens, reducing computation time.
D.It guarantees that every word is broken into its morphological root and affixes.
Correct Answer: It can handle out-of-vocabulary (OOV) words by breaking them into known subword units.
Explanation:
Word-based tokenization fails when it encounters a word not in its vocabulary, mapping it to an 'UNK' token and losing its meaning. Subword tokenization, like BPE, can represent any word by breaking it down into smaller, learned pieces. For example, 'tokenization' might become 'token' and '##ization', allowing the model to reason about novel words and handle OOV issues gracefully.
Incorrect! Try again.
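A greedy longest-match segmenter in the WordPiece style (which uses the '##' continuation prefix mentioned in the explanation) shows the OOV-handling idea; the tiny vocabulary here is invented purely for illustration:

```python
# Invented toy vocabulary; real subword vocabularies are learned from a corpus.
vocab = {"token", "##ization", "##ize", "un", "##known"}

def segment(word):
    # Greedily take the longest known piece at each position;
    # non-initial pieces carry the '##' continuation prefix.
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            cand = word[start:end] if start == 0 else "##" + word[start:end]
            if cand in vocab:
                pieces.append(cand)
                start = end
                break
        else:
            return ["[UNK]"]  # no known piece covers this span
    return pieces

print(segment("tokenization"))  # ['token', '##ization']
print(segment("unknown"))       # ['un', '##known']
```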
31. If a word embedding model learns vectors such that vector('Paris') - vector('France') + vector('Italy') results in a vector very close to vector('Rome'), what does this demonstrate about the learned embedding space?
Embeddings
Medium
A.The model has captured semantic relationships (like capital city of a country) as geometric relationships in the vector space.
B.The model is only effective for proper nouns and geographical locations.
C.The vectors for all countries are parallel to each other.
D.The model has simply memorized geographical facts from the training data.
Correct Answer: The model has captured semantic relationships (like capital city of a country) as geometric relationships in the vector space.
Explanation:
This phenomenon, known as semantic arithmetic, is a key property of well-trained word embeddings. It shows that the spatial arrangement of vectors is not random but encodes complex semantic and syntactic relationships. The vector difference between 'Paris' and 'France' captures the concept of 'is the capital of', which can then be applied to 'Italy' to find its capital.
Incorrect! Try again.
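The arithmetic can be sketched with hand-picked toy vectors and cosine similarity; real embeddings are learned and have hundreds of dimensions, so everything below is illustrative only:

```python
import math

# Hand-picked 3-d vectors chosen so the analogy works; not learned embeddings.
vecs = {
    "Paris":  [1.0, 2.0, 0.0],
    "France": [1.0, 0.0, 0.0],
    "Italy":  [0.0, 1.0, 0.0],
    "Rome":   [0.0, 3.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# vector('Paris') - vector('France') + vector('Italy')
query = [p - f + i for p, f, i in
         zip(vecs["Paris"], vecs["France"], vecs["Italy"])]

# Nearest neighbour by cosine similarity, excluding the query word itself.
best = max((w for w in vecs if w != "Italy"),
           key=lambda w: cosine(query, vecs[w]))
print(best)  # Rome
```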
32. What is the primary motivation for using pre-trained embeddings (like GloVe or Word2Vec) when building an NLP model for a task with a relatively small dataset?
Embeddings
Medium
A.To ensure the model's vocabulary is limited to only the most common words in the English language.
B.To leverage the rich semantic knowledge learned from a massive text corpus, which provides a better model initialization and improves generalization.
C.To reduce the number of layers required in the neural network.
D.To completely eliminate the need for any task-specific training (fine-tuning).
Correct Answer: To leverage the rich semantic knowledge learned from a massive text corpus, which provides a better model initialization and improves generalization.
Explanation:
Training word embeddings from scratch requires a very large dataset to learn meaningful representations. By using pre-trained embeddings, a model can start with a high-quality understanding of word meanings and relationships. This 'transfer learning' is highly effective for improving performance and convergence speed, especially when the target task has limited training data.
Incorrect! Try again.
33. In a sequence-to-sequence model with attention for machine translation, if a high attention weight is placed on an input word, what does it signify for the current output step?
Attention
Medium
A.That the input word is a grammatical stop word that should be ignored.
B.That the input word has the highest frequency in the training corpus.
C.That the input word is highly relevant for predicting the current output word.
D.That the input word is the last word of the source sentence.
Correct Answer: That the input word is highly relevant for predicting the current output word.
Explanation:
The attention mechanism computes a set of weights for each output step, indicating how much 'attention' to pay to each input word. A high weight means the model has determined that the context and meaning of that specific input word are most important for generating the current word in the translated sequence.
Incorrect! Try again.
34. Which statement best describes a key architectural difference between BERT and GPT that influences their primary applications?
language models (BERT, GPT)
Medium
A.GPT uses an attention mechanism while BERT relies on recurrent layers, making GPT better for longer sequences.
B.GPT is trained on a larger vocabulary than BERT, making it more knowledgeable.
C.BERT is an encoder-only model that processes text bidirectionally, making it ideal for language understanding tasks, while GPT is a decoder-only model that processes text auto-regressively, making it ideal for language generation.
D.BERT must be fine-tuned for specific tasks, whereas GPT can be used directly for any task without fine-tuning.
Correct Answer: BERT is an encoder-only model that processes text bidirectionally, making it ideal for language understanding tasks, while GPT is a decoder-only model that processes text auto-regressively, making it ideal for language generation.
Explanation:
BERT's architecture (based on the Transformer encoder) allows it to see the entire input sentence at once, creating deep bidirectional representations perfect for tasks like sentiment analysis or question answering. GPT's architecture (based on the Transformer decoder) is designed to predict the next word given the previous words, making it a natural fit for generative tasks like writing essays or code.
Incorrect! Try again.
35. What is the core idea behind the Masked Language Model (MLM) pre-training objective used for BERT?
language models (BERT, GPT)
Medium
A.To mask all nouns in a sentence and have the model predict them based on the verbs and adjectives.
B.To predict the next token in a sequence using only the previous tokens, which is a unidirectional approach.
C.To predict randomly masked tokens in a sequence by using both left and right context, forcing the model to learn a deep bidirectional understanding of language.
D.To predict the next sentence in a document, teaching the model about discourse coherence.
Correct Answer: To predict randomly masked tokens in a sequence by using both left and right context, forcing the model to learn a deep bidirectional understanding of language.
Explanation:
Unlike traditional language models that predict the next word (left-to-right), MLM allows the model to learn context from both directions. By masking about 15% of the tokens and training the model to predict them, BERT learns rich representations of words based on their full surrounding context.
Incorrect! Try again.
36. An NLP system is designed to analyze a legal contract to identify the parties involved, their obligations, and the effective dates. This task goes beyond just parsing grammar. Which NLP phase is most central to this goal?
Modern NLP: Introduction to NLP, NLP phases
Medium
A.Syntactic Analysis (Parsing)
B.Semantic Analysis
C.Lexical Analysis
D.Morphological Analysis
Correct Answer: Semantic Analysis
Explanation:
Lexical, morphological, and syntactic analysis deal with words, their forms, and sentence structure, respectively. Semantic Analysis is the phase concerned with understanding the meaning of the text. Identifying entities like 'parties' and their 'obligations' requires interpreting the meaning and relationships within the text, which is the core of semantic analysis.
Incorrect! Try again.
37. In the architecture of a task-oriented chatbot, what is the primary responsibility of the Dialogue Management (DM) component?
Building chatbots and digital assistants
Medium
A.To convert the chatbot's planned response into natural language (Text-to-Speech or NLG).
B.To convert the user's spoken words into text (Speech-to-Text).
C.To maintain the state of the conversation and decide the chatbot's next action.
D.To extract the user's intent and entities from their message.
Correct Answer: To maintain the state of the conversation and decide the chatbot's next action.
Explanation:
The Natural Language Understanding (NLU) component extracts intent and entities. The Dialogue Manager takes this structured information, tracks what has happened so far in the conversation (state tracking), and decides what to do next. This could be asking a clarifying question, querying a database via an API, or providing a final answer.
Incorrect! Try again.
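The decide-next-action role of a dialogue manager can be sketched as a simple function over conversation state; the intent and slot names below are invented for illustration, not any framework's API:

```python
def dialogue_manager(state, intent, entities):
    # Merge newly extracted entities into the tracked conversation state,
    # then pick the chatbot's next action based on what is still missing.
    state = dict(state)
    state.update(entities)
    if intent == "book_flight" and "destination" not in state:
        return state, "ask_destination"
    if intent == "book_flight":
        return state, "confirm_booking"
    return state, "fallback"

state, action = dialogue_manager({}, "book_flight", {})
print(action)  # ask_destination
state, action = dialogue_manager(state, "book_flight", {"destination": "Rome"})
print(action)  # confirm_booking
```

Real dialogue managers range from hand-written rules like this to learned policies, but the responsibility is the same: track state, choose the next action.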
38. You are building a system to generate a short, one-paragraph summary of a long news article. This is an example of which NLP use case, and what is a primary challenge?
NLP use cases (sentiment analysis, translation, summarization)
Medium
A.Named Entity Recognition; identifying all the people and organizations mentioned.
B.Sentiment Analysis; determining if the article's tone is positive or negative.
C.Text Summarization; ensuring the summary is coherent and factually consistent with the source text.
D.Machine Translation; converting the article to another language.
Correct Answer: Text Summarization; ensuring the summary is coherent and factually consistent with the source text.
Explanation:
This task is a direct application of text summarization. A major challenge, especially with abstractive summarization (where the model generates new sentences), is avoiding 'hallucinations'—generating plausible but factually incorrect information that was not in the original article. Ensuring factual consistency is a key area of research.
Incorrect! Try again.
39. A movie review website wants to automatically assign a 'thumbs up' or 'thumbs down' rating to user-submitted reviews based on the text. Which NLP task is most appropriate for this problem?
NLP use cases (sentiment analysis, translation, summarization)
Medium
A.Sentiment Analysis
B.Text Summarization
C.Topic Modeling
D.Question Answering
Correct Answer: Sentiment Analysis
Explanation:
Sentiment analysis (or opinion mining) is the NLP task focused on identifying and extracting subjective information from source materials. Classifying a text as positive ('thumbs up') or negative ('thumbs down') is a classic binary sentiment classification problem.
Incorrect! Try again.
40. A neuron in a neural network has an input vector x = [2.0, 3.0], a weight vector w = [0.5, -1.5], and a bias b = 1.0. What is the output of this neuron if it uses a ReLU (Rectified Linear Unit) activation function?
Introduction to Neural Networks
Medium
A.0.0
B.1.0
C.-3.5
D.-2.5
Correct Answer: 0.0
Explanation:
First, calculate the pre-activation value z = w . x + b. This is z = (0.5)(2.0) + (-1.5)(3.0) + 1.0 = 1.0 - 4.5 + 1.0 = -2.5. The ReLU activation function is defined as ReLU(z) = max(0, z). Since z is -2.5, the output is max(0, -2.5) = 0.0.
Incorrect! Try again.
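The computation in the explanation is two lines of Python; the function name is ours:

```python
def neuron(x, w, b):
    z = sum(xi * wi for xi, wi in zip(x, w)) + b  # pre-activation: w . x + b
    return max(0.0, z)                            # ReLU

print(neuron([2.0, 3.0], [0.5, -1.5], 1.0))  # 0.0  (z = -2.5, clipped by ReLU)
```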
41. A Multi-Layer Perceptron (MLP) is constructed with 5 hidden layers, but exclusively uses the linear activation function in all layers, including the output layer. The network is trained on a complex, non-linear classification task. What is the effective representational power of this network?
MLP
Hard
A.It will behave like a deep autoencoder, compressing and decompressing the input linearly.
B.It can approximate any continuous function, as per the Universal Approximation Theorem.
C.It can model complex non-linear functions, but training will be unstable due to the depth.
D.It is equivalent to a single-layer perceptron and can only model linearly separable data.
Correct Answer: It is equivalent to a single-layer perceptron and can only model linearly separable data.
Explanation:
A composition of linear functions is itself a linear function. No matter how many layers a neural network has, if it only uses linear activation functions, the entire network collapses into a single linear transformation (equivalent to a single layer). Therefore, its representational power is limited to that of a single-layer perceptron, which can only solve linearly separable problems.
Incorrect! Try again.
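This collapse can be verified numerically: applying two weight matrices in sequence, with no activation in between, gives exactly the same result as applying their single matrix product. The small matrices below are arbitrary examples:

```python
def matmul(a, b):
    # Multiply two small matrices represented as lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

W1 = [[1.0, 2.0], [0.0, 1.0]]   # hypothetical hidden-layer weights
W2 = [[0.5, -1.0], [2.0, 0.0]]  # hypothetical output-layer weights
x = [3.0, -2.0]

two_layers = matvec(W2, matvec(W1, x))  # layer by layer, no activation
one_layer = matvec(matmul(W2, W1), x)   # single collapsed linear map
print(two_layers == one_layer)          # True
```

A non-linearity between the layers is exactly what breaks this equivalence and gives depth its power.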
42. In a Convolutional Neural Network, what is the primary purpose of using a 1x1 convolution (also known as a pointwise convolution), and how does it achieve this without altering the spatial dimensions (height and width) of the feature map?
CNN
Hard
A.To perform dimensionality reduction or expansion across the channel dimension while preserving spatial information.
B.To increase the receptive field of subsequent layers by combining information from a 1x1 spatial area.
C.To introduce non-linearity by applying an activation function to each pixel independently.
D.To act as a spatial pooling layer, reducing the height and width of the feature maps.
Correct Answer: To perform dimensionality reduction or expansion across the channel dimension while preserving spatial information.
Explanation:
A 1x1 convolution operates across all channels at a single pixel location. It's essentially a fully connected layer applied at every spatial position. Its main use is to change the number of channels (depth) of the feature map. By using fewer 1x1 filters than input channels, it performs dimensionality reduction (a 'bottleneck' layer). By using more, it expands the dimensionality. This is done without affecting the spatial dimensions because the kernel size is 1x1 and stride is typically 1.
Incorrect! Try again.
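A shape-level sketch makes the point: a 1x1 convolution is a per-pixel linear map across channels, so the spatial dimensions pass through unchanged while the channel count follows the number of filters. The sizes and constant values below are arbitrary:

```python
# Input: H x W x C_in feature map; a 1x1 conv with C_out filters is a
# C_out x C_in weight matrix applied independently at every pixel.
H, W, C_in, C_out = 4, 4, 8, 2
fmap = [[[1.0] * C_in for _ in range(W)] for _ in range(H)]
weights = [[0.5] * C_in for _ in range(C_out)]

out = [[[sum(w_c * px[c] for c, w_c in enumerate(row)) for row in weights]
        for px in line] for line in fmap]

print(len(out), len(out[0]), len(out[0][0]))  # 4 4 2: spatial dims preserved,
                                              # channels reduced from 8 to 2
```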
43. Both LSTMs and GRUs are designed to mitigate the vanishing gradient problem in RNNs. Which statement accurately describes a key architectural difference and its performance implication?
RNN
Hard
A.GRUs have three gates (input, forget, output) while LSTMs have only two (reset, update), making LSTMs simpler and faster to train.
B.LSTMs have a separate cell state and hidden state, while GRUs combine them. This makes GRUs computationally more efficient but potentially less expressive for complex sequences.
C.The cell state in a GRU acts as a long-term memory conveyor belt, a feature that is absent in LSTMs.
D.LSTMs use a reset gate to discard irrelevant past information, whereas GRUs use a forget gate, which is a less effective mechanism.
Correct Answer: LSTMs have a separate cell state and hidden state, while GRUs combine them. This makes GRUs computationally more efficient but potentially less expressive for complex sequences.
Explanation:
The core difference is that an LSTM maintains two vectors passed between timesteps: the hidden state (h_t) and the cell state (c_t). The cell state acts as a memory conveyor. A GRU combines these into a single hidden state vector (h_t). It uses an update gate to control how much past information to keep versus new information to add, and a reset gate to control how much of the past state to forget. This simplification (fewer gates, one state vector) makes GRUs computationally cheaper, but the dedicated cell state in LSTMs can sometimes offer more powerful modeling capabilities.
Incorrect! Try again.
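A rough parameter count (input/hidden sizes here are hypothetical) illustrates why GRUs are cheaper: an LSTM cell has four weight blocks (three gates plus the cell candidate) while a GRU has three:

```python
# Gate-parameter count for one recurrent cell (sizes are hypothetical).
def rnn_cell_params(input_dim, hidden_dim, num_blocks):
    # Each gate/candidate block has a weight matrix over [input; hidden]
    # plus a bias vector.
    return num_blocks * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)

x_dim, h_dim = 256, 512
lstm = rnn_cell_params(x_dim, h_dim, 4)  # input, forget, output gates + candidate
gru = rnn_cell_params(x_dim, h_dim, 3)   # update, reset gates + candidate
print(lstm, gru, gru / lstm)             # GRU needs 3/4 of the LSTM's parameters
```

The exact counts depend on the implementation, but the 4-vs-3 block ratio holds in general.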
44. The self-attention mechanism in the original Transformer model has a computational complexity of O(n²·d), where n is the sequence length and d is the model dimension. This makes it challenging for very long sequences. Which of the following is NOT a valid and commonly researched approach to mitigate this quadratic complexity?
Transformer Architecture and Applications
Hard
A.Drastically increasing the number of attention heads while reducing the dimension per head, such that the total computation becomes linear with respect to sequence length.
B.Replacing the Softmax function with a linear kernel, allowing the order of matrix multiplication to be rearranged and computed in O(n) time with respect to sequence length (e.g., Linear Transformers).
C.Using a sliding window attention mechanism where each token only attends to a fixed number of neighboring tokens (e.g., Longformer).
D.Applying a fixed, sparse attention pattern (e.g., strided or dilated patterns) to reduce the number of attended-to tokens (e.g., Sparse Transformers).
Correct Answer: Drastically increasing the number of attention heads while reducing the dimension per head, such that the total computation becomes linear with respect to sequence length.
Explanation:
While multi-head attention is a core part of the Transformer, simply increasing the number of heads does not change the fundamental complexity with respect to sequence length. Each head still computes an n × n attention matrix. The other options are all valid and popular research directions for creating more efficient Transformers: sliding window (Longformer), sparse patterns (Sparse Transformer), and kernel-based methods (Linear Transformers) all aim to break the quadratic bottleneck.
Incorrect! Try again.
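A back-of-the-envelope sketch (pure Python, illustrative only) shows why adding heads cannot remove the quadratic term — each head contributes its own n × n score matrix:

```python
# Entries in the per-layer attention score matrices (illustrative count).
def attn_matrix_entries(n, num_heads=1):
    # Every head computes its own n x n score matrix, so adding heads
    # never removes the quadratic dependence on sequence length n.
    return num_heads * n * n

for n in (1_000, 10_000):
    print(n, attn_matrix_entries(n, num_heads=8))
# A 10x longer sequence costs 100x more entries, regardless of head count.
```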
45. What is the primary architectural reason that a model like BERT is considered an 'encoder-only' architecture, while a model like GPT-3 is considered a 'decoder-only' architecture, and how does this influence their ideal use cases?
language models (BERT, GPT)
Hard
A.BERT uses bidirectional self-attention (seeing the whole sentence at once), making it an encoder ideal for understanding context (e.g., NLU tasks). GPT uses masked, unidirectional self-attention (seeing only past tokens), making it a decoder ideal for generating text (e.g., NLG tasks).
B.BERT processes tokens in parallel, which is characteristic of encoders, while GPT processes tokens sequentially, which is characteristic of decoders.
C.BERT uses absolute positional embeddings, suitable for encoding, while GPT uses relative positional embeddings, which are better for decoding sequences of varying lengths.
D.BERT is pre-trained with a Masked Language Model objective, which is an encoding task, while GPT is pre-trained with a Causal Language Model objective, a decoding task.
Correct Answer: BERT uses bidirectional self-attention (seeing the whole sentence at once), making it an encoder ideal for understanding context (e.g., NLU tasks). GPT uses masked, unidirectional self-attention (seeing only past tokens), making it a decoder ideal for generating text (e.g., NLG tasks).
Explanation:
The key distinction lies in the attention mask. In BERT's self-attention layers, a token can attend to all other tokens in the sequence (both to its left and right), which is why it's called bidirectional. This allows it to build a deep understanding of the full context, perfect for Natural Language Understanding (NLU) tasks like classification or question answering. In GPT, the self-attention is masked so that a token can only attend to previous tokens (and itself). This autoregressive property is essential for generating text one token at a time, making it a natural fit for Natural Language Generation (NLG).
Incorrect! Try again.
46. Static word embeddings like Word2Vec or GloVe suffer from the problem of polysemy (a word having multiple meanings). How do contextualized embedding models like ELMo and BERT fundamentally address this limitation?
Embeddings
Hard
A.By training on a much larger corpus, they learn a single, more robust vector that averages all meanings of a word.
B.They use character-level convolutions to build word embeddings, which helps differentiate meanings based on morphology.
C.They maintain a predefined dictionary of vectors for each possible meaning of a word and use a classifier to select the correct one.
D.They generate a different embedding vector for a word each time it appears, based on its specific context in the sentence.
Correct Answer: They generate a different embedding vector for a word each time it appears, based on its specific context in the sentence.
Explanation:
Static embeddings assign a single, fixed vector to each word in the vocabulary (e.g., the vector for 'bank' is the same in 'river bank' and 'investment bank'). Contextualized models are deep neural networks (LSTMs for ELMo, Transformers for BERT) that process the entire input sentence. The embedding for a word is derived from the internal state of the network at that word's position. Since the internal state is influenced by all other words in the sentence, the resulting embedding is context-dependent. Therefore, 'bank' will have different vectors in the two example sentences, resolving the polysemy issue.
Incorrect! Try again.
47. In the standard scaled dot-product attention formula, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, what is the critical purpose of the scaling factor 1/√d_k, where d_k is the dimension of the key vectors?
Attention
Hard
A.It normalizes the variance of the dot products to prevent the softmax function from saturating into regions with extremely small gradients.
B.It acts as a regularization term to prevent overfitting by penalizing large dot product values.
C.It is a temperature parameter that controls the sharpness of the attention distribution, with larger values making the distribution softer.
D.It ensures that the dot product values are positive before being passed to the softmax function.
Correct Answer: It normalizes the variance of the dot products to prevent the softmax function from saturating into regions with extremely small gradients.
Explanation:
The authors of 'Attention Is All You Need' observed that for large values of d_k, the dot products grow large in magnitude. When these large values are fed into the softmax function, the gradients can become vanishingly small, making learning difficult. By scaling the dot products by 1/√d_k, the variance of the inputs to the softmax is kept at 1, regardless of d_k. This ensures that the softmax function operates in a region with healthier gradients, stabilizing the training process.
Incorrect! Try again.
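A small numpy experiment (random unit-variance vectors, illustrative only) confirms the variance argument — raw dot products have variance near d_k, and scaling restores it to about 1:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 512
q = rng.standard_normal((10_000, d_k))  # random queries, unit-variance entries
k = rng.standard_normal((10_000, d_k))  # random keys

scores = np.sum(q * k, axis=1)          # raw dot products q . k
print(np.var(scores))                   # close to d_k: grows with dimension
print(np.var(scores / np.sqrt(d_k)))    # close to 1 after scaling by 1/sqrt(d_k)
```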
48. Consider Byte-Pair Encoding (BPE) and WordPiece tokenization strategies. A key difference lies in how they select the next pair of tokens to merge during vocabulary creation. Which statement accurately describes this difference and its implication?
Tokenization
Hard
A.BPE always splits words at the rarest character pair, while WordPiece splits based on a predefined vocabulary of common prefixes and suffixes.
B.BPE merges the most frequently occurring pair of adjacent tokens, which can sometimes lead to suboptimal segmentation of common words. WordPiece merges the pair that maximizes the likelihood of the training data, often resulting in more intuitive subwords.
C.WordPiece is a character-based tokenizer, while BPE is subword-based, making WordPiece immune to out-of-vocabulary issues.
D.BPE merges based on raw frequency counts, while WordPiece uses a complex scoring system based on mutual information between token pairs.
Correct Answer: BPE merges the most frequently occurring pair of adjacent tokens, which can sometimes lead to suboptimal segmentation of common words. WordPiece merges the pair that maximizes the likelihood of the training data, often resulting in more intuitive subwords.
Explanation:
While both are subword tokenization algorithms, their merge criterion differs. BPE is simpler: it iteratively counts all adjacent symbol pairs and merges the most frequent one. WordPiece, used by BERT, is slightly more sophisticated. It builds a vocabulary and then picks the merge that increases the likelihood of the training corpus given a unigram language model. This likelihood-based approach often leads to subwords that align better with meaningful morphological units. For example, WordPiece is more likely to keep word stems intact and segment suffixes like '##ing' or '##ly'.
Incorrect! Try again.
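A minimal sketch of a single BPE merge step (toy corpus, illustrative only) shows the frequency-based criterion in action:

```python
from collections import Counter

# One BPE merge step: count all adjacent symbol pairs in the corpus
# and merge the most frequent pair. (Toy corpus, illustrative only.)
corpus = [list(w) for w in ("low", "lower", "lowest", "slow", "lot")]

pairs = Counter()
for word in corpus:
    for a, b in zip(word, word[1:]):
        pairs[(a, b)] += 1

best = pairs.most_common(1)[0][0]
print(best)  # ('l', 'o'): the most frequent adjacent pair is merged first
```

WordPiece would instead score each candidate merge by how much it raises the corpus likelihood, rather than by this raw count.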
49. A convolutional layer has an input volume of 64×64×16, uses 32 filters of size 5×5, a stride of 2, and padding of 1. What is the total number of learnable parameters (weights and biases) in this layer?
CNN
Hard
A.12,832
B.40,992
C.1,310,720
D.12,800
Correct Answer: 12,832
Explanation:
The number of parameters in a convolutional layer is independent of the input volume's spatial dimensions (height and width). It depends on the filter size, the number of input channels, and the number of filters.
Number of weights = (filter_height × filter_width × num_input_channels) × num_filters
Number of weights = (5 × 5 × 16) × 32 = 400 × 32 = 12,800.
Each filter has a single bias term associated with it.
Number of biases = num_filters = 32.
Total learnable parameters = Number of weights + Number of biases = 12,800 + 32 = 12,832.
Incorrect! Try again.
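The arithmetic can be checked directly; the kernel and channel sizes below are the 5×5-over-16-channels combination consistent with the stated answer of 12,832:

```python
# Parameter count for the layer: it depends only on kernel size, input
# channels, and filter count -- never on the input's height and width.
kernel_h, kernel_w, in_channels, num_filters = 5, 5, 16, 32  # assumed sizes

weights = kernel_h * kernel_w * in_channels * num_filters  # 400 * 32 = 12,800
biases = num_filters                                       # one bias per filter
print(weights + biases)  # 12832
```

Note that stride and padding affect the output size, not the parameter count.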
50. When evaluating machine translation systems, the BLEU score is a common metric. However, it can be misleadingly high for a translation that is grammatically correct but semantically nonsensical or inaccurate. What fundamental limitation of the BLEU score causes this discrepancy?
NLP use cases (sentiment analysis, translation, summarization)
Hard
A.It requires multiple human-generated reference translations, which are often unavailable or inconsistent.
B.It is based on n-gram precision and a brevity penalty, rewarding lexical overlap with reference translations but failing to capture semantic meaning or sentence structure.
C.It primarily measures recall, checking if all words from the reference translation are present, but ignores precision.
D.It is computationally expensive and cannot be used during the training process as a loss function.
Correct Answer: It is based on n-gram precision and a brevity penalty, rewarding lexical overlap with reference translations but failing to capture semantic meaning or sentence structure.
Explanation:
BLEU (Bilingual Evaluation Understudy) works by matching n-grams (contiguous sequences of n words) in the candidate translation against the n-grams in one or more reference translations. It measures precision—how many of the candidate's n-grams appear in the references. While it includes a penalty for translations that are too short, its core mechanism is lexical overlap. It cannot understand if synonyms are used, if the sentence structure is logical, or if the core meaning is preserved. A translation could have high n-gram overlap with a reference but completely miss the nuance or even state the opposite meaning.
Incorrect! Try again.
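A toy clipped n-gram precision (the core quantity behind BLEU; simplified, with no brevity penalty or multi-reference handling) shows how a negated sentence can still score highly:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    # Clipped n-gram precision: the core quantity behind BLEU.
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    hits = sum(min(c, ref[g]) for g, c in Counter(cand).items())
    return hits / len(cand)

reference = "the cat is on the mat".split()
candidate = "the cat is not on the mat".split()  # opposite meaning, high overlap

p1 = ngram_precision(candidate, reference, 1)
print(p1)  # 6/7: only 'not' fails to match, despite the reversed meaning
```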
51. A single perceptron is trained on a 2D dataset using the standard perceptron learning rule. The dataset is NOT linearly separable. What will be the behavior of the learning algorithm during training?
Perceptron
Hard
A.The algorithm will quickly converge to a random decision boundary and stop updating.
B.The algorithm will never converge, and the weights of the perceptron will continue to be updated indefinitely.
C.The algorithm will raise a mathematical error because the loss function cannot be calculated for non-separable data.
D.The algorithm will converge to a decision boundary that minimizes the number of misclassified points.
Correct Answer: The algorithm will never converge, and the weights of the perceptron will continue to be updated indefinitely.
Explanation:
The Perceptron Convergence Theorem guarantees that the learning algorithm will find a separating hyperplane in a finite number of steps if and only if the data is linearly separable. If the data is not linearly separable, there is no solution for the algorithm to converge to. It will continue to find misclassified points and update its weights in an attempt to correct for them, effectively cycling through different decision boundaries forever without ever satisfying the condition of zero misclassifications.
Incorrect! Try again.
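A short simulation (plain Python, illustrative) of the perceptron rule on XOR shows the updates never stop — every epoch still finds misclassified points:

```python
# Perceptron learning rule on XOR (not linearly separable).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]  # XOR labels

w, b = [0.0, 0.0], 0.0
updates_per_epoch = []
for epoch in range(50):
    updates = 0
    for (x1, x2), t in zip(X, y):
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        if pred != t:  # misclassified point: apply the perceptron update
            w[0] += (t - pred) * x1
            w[1] += (t - pred) * x2
            b += t - pred
            updates += 1
    updates_per_epoch.append(updates)

# Every epoch still triggers updates: no separating line exists, so the
# zero-misclassification stopping condition is never reached.
print(updates_per_epoch[-5:])
```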
52. What is the primary motivation for using Teacher Forcing during the training of recurrent neural networks for sequence generation tasks, and what is its main drawback?
RNN
Hard
A.Motivation: To enable the model to learn long-range dependencies. Drawback: It exacerbates the vanishing gradient problem.
B.Motivation: To speed up convergence by providing the network with ground-truth inputs at each timestep. Drawback: It can lead to a discrepancy between training and inference, causing instability when the model generates long sequences on its own.
C.Motivation: To reduce the memory footprint of the model during training. Drawback: It requires significantly more computation per training step.
D.Motivation: To prevent overfitting by introducing noise into the training process. Drawback: It slows down the training process significantly.
Correct Answer: Motivation: To speed up convergence by providing the network with ground-truth inputs at each timestep. Drawback: It can lead to a discrepancy between training and inference, causing instability when the model generates long sequences on its own.
Explanation:
Teacher forcing is a training technique where the model's own output from the previous timestep is replaced with the ground-truth value from the training data as input for the current timestep. This provides a stable, correct signal at each step, preventing the model from propagating its own errors, which makes training much faster and more stable. However, at inference time, there is no ground truth to feed the model; it must use its own (potentially imperfect) predictions as input for the next step. This mismatch between training and inference (exposure bias) can cause the model to perform poorly, as it was never trained to recover from its own mistakes.
Incorrect! Try again.
53. In the Transformer architecture, positional encodings are added to the input embeddings. Why is this step strictly necessary for the model to process sequences, unlike in an RNN?
Transformer Architecture and Applications
Hard
A.To allow the model to handle sequences of variable lengths by encoding the absolute position of each token.
B.Because the self-attention mechanism is permutation-invariant; without positional information, the model would treat a sentence as an unordered bag of words.
C.To normalize the input embeddings before they are processed by the attention layers, improving training stability.
D.To provide a unique signal for the start and end of a sequence, which the attention mechanism cannot otherwise determine.
Correct Answer: Because the self-attention mechanism is permutation-invariant; without positional information, the model would treat a sentence as an unordered bag of words.
Explanation:
An RNN inherently processes a sequence token by token, so the order is naturally encoded in its recurrent state. The self-attention mechanism in a Transformer, however, calculates attention scores between all pairs of tokens in parallel. If you were to shuffle the input tokens, the resulting set of attention scores would be identical, just reordered. The model has no inherent sense of position. Positional encodings are fixed or learned vectors that are added to the token embeddings to give the model explicit information about the position of each token in the sequence, thus breaking the permutation invariance and allowing it to understand word order.
Incorrect! Try again.
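A small numpy sketch (random data, illustrative only) demonstrates the permutation property directly — shuffling the input tokens merely shuffles the outputs, so without positional encodings the model cannot see word order:

```python
import numpy as np

def self_attention(X):
    # Plain scaled dot-product self-attention, no positional information.
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))        # 5 tokens, embedding dim 8
perm = np.array([3, 0, 4, 1, 2])       # an arbitrary reordering of the tokens

out = self_attention(X)
out_perm = self_attention(X[perm])
print(np.allclose(out[perm], out_perm))  # True: shuffled input -> shuffled output
```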
54. BERT's pre-training involves two tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). Later analysis (e.g., by the RoBERTa paper) suggested that NSP might be an ineffective pre-training task. What was the reasoning behind this conclusion?
language models (BERT, GPT)
Hard
A.The binary classification nature of NSP was found to be detrimental to the model's ability to generate nuanced text representations.
B.The MLM task already implicitly taught the model sentence relationships, making the NSP task redundant.
C.The model learned to focus on topic similarity between sentences rather than coherence and logical flow, as the negative examples were too easy to distinguish (randomly sampled sentences).
D.The NSP task was found to be computationally too expensive, providing marginal benefits for the high training cost.
Correct Answer: The model learned to focus on topic similarity between sentences rather than coherence and logical flow, as the negative examples were too easy to distinguish (randomly sampled sentences).
Explanation:
The original NSP task involved predicting if sentence B was the actual sentence following sentence A. Negative examples were created by pairing sentence A with a random sentence from a different document. Researchers found that this task was too simple. The model could often solve it just by checking if the two sentences shared the same topic (e.g., by looking for keyword overlap), without learning about the deeper logical and cohesive relationships between consecutive sentences. The RoBERTa model, for instance, removed the NSP task and showed improved performance on downstream tasks, suggesting NSP's limited utility.
Incorrect! Try again.
55. In text summarization, what is the fundamental difference between an 'extractive' and an 'abstractive' approach, and what kind of neural network architecture is typically required for a purely abstractive model?
NLP use cases (sentiment analysis, translation, summarization)
Hard
A.Extractive summarization creates a summary that is shorter than the source text, while abstractive summarization can create a longer, more detailed summary. Both can be implemented with a simple classifier.
B.Extractive summarization is a form of supervised learning, while abstractive summarization is unsupervised. Abstractive models typically rely on Transformer-based encoders like BERT.
C.Extractive summarization selects important sentences from the source text, while abstractive summarization generates new sentences that capture the meaning. Abstractive models typically require an encoder-decoder architecture (e.g., Sequence-to-Sequence).
D.Extractive summarization uses rule-based systems to identify keywords, while abstractive summarization uses deep learning. Abstractive models require a CNN-based architecture.
Correct Answer: Extractive summarization selects important sentences from the source text, while abstractive summarization generates new sentences that capture the meaning. Abstractive models typically require an encoder-decoder architecture (e.g., Sequence-to-Sequence).
Explanation:
Extractive summarization is akin to highlighting. It's a classification task where the model decides which sentences or phrases from the original document are important enough to be included in the summary. The summary contains only verbatim excerpts. Abstractive summarization is more human-like; it involves 'understanding' the source text and generating a new summary in its own words, potentially using words and phrases not present in the original. This is a sequence generation task, which necessitates an encoder-decoder architecture (like an RNN-based Seq2Seq model or a Transformer) to first encode the meaning of the source text and then decode it into a new sequence of words.
Incorrect! Try again.
56. When designing a task-oriented chatbot (e.g., for booking flights), what is the distinct role of 'Dialogue State Tracking' (DST) and why is it a more complex problem than simple 'Intent Recognition'?
Building chatbots and digital assistants
Hard
A.DST is responsible for generating the chatbot's response, while Intent Recognition decides which knowledge base to query. DST is harder because natural language generation is a complex task.
B.DST is the process of training the chatbot's language model, while Intent Recognition is the process of fine-tuning it for a specific task. DST is harder because it requires more data.
C.DST maintains a representation of the user's goal and collected information (slots) throughout a multi-turn conversation, while Intent Recognition is a single-turn classification of the user's immediate goal. DST is harder because it must handle context, ambiguity, and coreference over time.
D.Intent Recognition maps user input to a predefined action, while DST tracks the emotional state of the user to adjust the chatbot's tone. DST is harder due to the subjectivity of emotion.
Correct Answer: DST maintains a representation of the user's goal and collected information (slots) throughout a multi-turn conversation, while Intent Recognition is a single-turn classification of the user's immediate goal. DST is harder because it must handle context, ambiguity, and coreference over time.
Explanation:
Intent Recognition is a classification task for a single user utterance (e.g., 'I want to fly to Boston' -> intent: book_flight, entity: destination=Boston). Dialogue State Tracking, however, must manage the state of the conversation across multiple turns. If the user then says 'I want to go tomorrow', the DST component must update the dialogue state with date=tomorrow while remembering the destination from the previous turn. It has to accumulate information, handle corrections ('Actually, I meant Boston'), and resolve ambiguities, making it a much more complex and stateful problem than single-turn intent recognition.
Incorrect! Try again.
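A toy sketch (the slot names are hypothetical) of how a tracked dialogue state accumulates and corrects itself across turns, in contrast to single-turn classification:

```python
# Toy dialogue-state tracker (hypothetical slot names, illustrative only):
# slots accumulate and can be corrected across turns, unlike a single-turn
# intent classifier which sees each utterance in isolation.
def update_state(state, turn_slots):
    new_state = dict(state)
    new_state.update(turn_slots)
    return new_state

state = {}
state = update_state(state, {"intent": "book_flight", "destination": "Boston"})
state = update_state(state, {"date": "tomorrow"})       # destination persists
state = update_state(state, {"destination": "Austin"})  # user correction
print(state)
```

A real DST component must also resolve coreference and ambiguity, which this dictionary merge deliberately glosses over.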
57. Vector-space analogies like vec('king') - vec('man') + vec('woman') ≈ vec('queen') are a famous property of Word2Vec embeddings. This property suggests that semantic relationships are encoded as linear substructures in the embedding space. What is a known major limitation or failure mode of this analogical reasoning capability?
Embeddings
Hard
A.The geometric relationships are highly sensitive to the specific training corpus and hyperparameters, and often do not generalize well to relationships beyond simple gender or capital-city analogies.
B.This property only works for single words and fails completely when trying to perform analogies with phrases or sentences.
C.The vector arithmetic is not commutative, meaning vec('woman') - vec('man') + vec('king') would produce a completely different result.
D.The resulting vector is often not the closest vector to the target word (e.g., 'queen') and requires a separate classification step to identify the correct analogy.
Correct Answer: The geometric relationships are highly sensitive to the specific training corpus and hyperparameters, and often do not generalize well to relationships beyond simple gender or capital-city analogies.
Explanation:
While the king-queen analogy is a powerful demonstration, research has shown that this capability is quite brittle. The neat geometric parallels are often an artifact of the statistical patterns of specific, frequent relationships (like country-capital) present in the training data (e.g., Wikipedia). The method often fails on more nuanced or less frequently stated relationships. The success of these analogies is not a universal property of the embedding space but rather a localized phenomenon, making it an unreliable tool for general-purpose analogical reasoning.
Incorrect! Try again.
58. In semantic segmentation tasks, a common architectural pattern is an 'encoder-decoder' structure (like U-Net) where the encoder uses strided convolutions or pooling, and the decoder uses upsampling or transposed convolutions. What is the critical role of 'skip connections' between the encoder and decoder in such architectures?
CNN
Hard
A.To facilitate gradient flow through the deep network, mitigating the vanishing gradient problem common in deep architectures.
B.To reduce the number of parameters in the decoder by reusing the weights from the corresponding encoder layers.
C.To enforce a bottleneck in the information flow, forcing the encoder to learn a compressed, salient representation of the input.
D.To combine low-level, high-resolution spatial information from the encoder with high-level, semantic information from the decoder, enabling precise localization.
Correct Answer: To combine low-level, high-resolution spatial information from the encoder with high-level, semantic information from the decoder, enabling precise localization.
Explanation:
As the input passes through the encoder (downsampling path), the network gains semantic information (what is in the image) but loses spatial information (where it is). The decoder (upsampling path) tries to recover this spatial information to produce a high-resolution segmentation map. Skip connections feed feature maps from the encoder directly to corresponding layers in the decoder. This provides the decoder with the fine-grained spatial details from earlier layers that would otherwise be lost, allowing it to produce much more precise and accurate segmentation boundaries.
Incorrect! Try again.
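A shape-level numpy sketch (feature-map sizes are hypothetical) of a U-Net style skip connection, where high-resolution encoder features are concatenated with upsampled decoder features:

```python
import numpy as np

# U-Net style skip connection (hypothetical feature-map sizes): concatenate
# high-resolution encoder features with upsampled decoder features.
encoder_feat = np.zeros((64, 64, 32))  # fine spatial detail, low-level features
decoder_feat = np.zeros((32, 32, 64))  # coarse, high-level semantics

# Nearest-neighbour upsample the decoder features back to 64x64 ...
upsampled = decoder_feat.repeat(2, axis=0).repeat(2, axis=1)
# ... then stack both along the channel axis for the next decoder conv.
skip_merged = np.concatenate([encoder_feat, upsampled], axis=-1)
print(skip_merged.shape)  # (64, 64, 96): both kinds of information available
```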
59. What is the primary advantage of Multi-Head Self-Attention (MHSA) over using a single, large self-attention mechanism with the same total number of dimensions?
Attention
Hard
A.Each head can process a different segment of the input sequence, allowing for parallel processing of very long documents.
B.It breaks the quadratic complexity of self-attention with respect to sequence length, making it linear.
C.It is significantly more computationally efficient than a single large attention head, reducing the overall training time of the Transformer model.
D.It allows the model to jointly attend to information from different representation subspaces at different positions, effectively learning diverse types of relationships (e.g., syntactic, positional).
Correct Answer: It allows the model to jointly attend to information from different representation subspaces at different positions, effectively learning diverse types of relationships (e.g., syntactic, positional).
Explanation:
Instead of performing a single attention function, MHSA projects the queries, keys, and values into multiple, lower-dimensional subspaces and runs the attention mechanism in parallel in each subspace. The outputs are then concatenated and projected again. This allows each 'head' to specialize and learn different kinds of relationships between tokens. For example, one head might learn to track syntactic dependencies, while another might focus on which words refer to the same entity. A single large attention head would average all these different signals, potentially washing out a specific, useful relationship. MHSA allows the model to capture a richer set of features.
Incorrect! Try again.
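A shape-level sketch (dimensions are hypothetical) of the projection-and-split step: the model width d_model is divided into per-head subspaces of size d_head, so the heads attend in parallel over separate views of the same tokens:

```python
import numpy as np

# Split d_model into per-head subspaces: same total width, separate views.
rng = np.random.default_rng(0)
n, d_model, num_heads = 6, 64, 8
d_head = d_model // num_heads            # 8 dims per head

Q = rng.standard_normal((n, d_model))
Q_heads = Q.reshape(n, num_heads, d_head).transpose(1, 0, 2)
print(Q_heads.shape)  # (8, 6, 8): each head attends within its own 8-dim subspace
```

After attention runs independently in each subspace, the head outputs are concatenated back to width d_model and linearly projected.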
60. In the pipeline of NLP phases, consider the relationship between Syntactic Analysis (Parsing) and Semantic Analysis. Which statement best describes a scenario where a failure in syntactic analysis directly leads to an incorrect semantic interpretation?
NLP phases
Hard
A.In the sentence "The old man the boats," a parser failing to identify "man" as a verb (meaning to operate) would lead to a nonsensical semantic interpretation.
B.In the sentence "Colorless green ideas sleep furiously," the sentence is syntactically correct but semantically meaningless, showing the independence of the two phases.
C.In the sentence "The bank is on the river bank," a system failing to disambiguate the two meanings of "bank" is a failure of semantic analysis, independent of syntax.
D.A system that correctly identifies the subject, verb, and object in "The dog chased the cat" has completed syntactic analysis, but semantic analysis is still required to understand what 'chasing' means.
Correct Answer: In the sentence "The old man the boats," a parser failing to identify "man" as a verb (meaning to operate) would lead to a nonsensical semantic interpretation.
Explanation:
This is a classic example of a garden-path sentence. Correct syntactic parsing is crucial for correct semantic interpretation. A naive parser might identify 'The old man' as a noun phrase. However, the correct parse is that 'The old' is a noun phrase (referring to old people) and 'man' is the main verb. The syntax determines the relationship between words: (Subject: The old) (Verb: man) (Object: the boats). If the parser fails to identify this structure, the semantic analysis stage will be unable to derive the correct meaning (Old people operate the boats) and will instead be left with a nonsensical fragment.