Unit 2 - Practice Quiz

INT344 50 Questions

1 What is the fundamental representation of a word in a Vector Space Model (VSM)?

A. A vector of real numbers
B. A linked list
C. A scalar integer
D. A binary tree structure

2 Which of the following is a primary advantage of dense word vectors over one-hot encoding?

A. They capture semantic relationships
B. They are strictly binary
C. They use more memory
D. They are easier to calculate manually

3 In the context of the Continuous Bag-of-Words (CBOW) model, what is the input to the neural network?

A. A random noise vector
B. The entire document
C. The center word
D. The context words

4 How is 'cosine similarity' calculated between two word vectors, A and B?

A. Dot product of A and B divided by the product of their magnitudes
B. Sum of elements in A minus sum of elements in B
C. Euclidean distance between A and B
D. Cross product of A and B

5 If two word vectors have a cosine similarity of 1, what does this imply?

A. The words are unrelated
B. The words represent opposite meanings
C. The vectors point in exactly the same direction
D. The vectors are orthogonal

6 Which architecture predicts the surrounding words given a center word?

A. Skip-gram
B. Continuous Bag-of-Words (CBOW)
C. Principal Component Analysis
D. Latent Dirichlet Allocation

7 What mathematical technique is commonly used to visualize high-dimensional word vectors in two dimensions?

A. Logistic Regression
B. Linear Regression
C. Fourier Transform
D. Principal Component Analysis (PCA)

8 In vector arithmetic for analogies, what result is expected for vector('King') - vector('Man') + vector('Woman')?

A. Monarch
B. Princess
C. Queen
D. Prince

9 What is the role of the 'window size' hyperparameter in CBOW?

A. It sets the learning rate
B. It determines the number of epochs
C. It defines how many neighbors to consider as context
D. It determines the number of dimensions in the vector

10 Why are vector space models useful for information retrieval/document search?

A. They can match queries to documents based on semantic similarity
B. They work best with images
C. They eliminate the need for indexing
D. They allow exact keyword matching only

11 When transforming word vectors from one language to another (e.g., English to French) using a linear mapping, what are we trying to learn?

A. A rotation matrix
B. A binary classifier
C. A clustering algorithm
D. A decision tree

12 In PCA, the first principal component is the direction that maximizes what?

A. The cosine similarity
B. The error rate
C. The number of clusters
D. The variance of the data

13 Which of the following best describes the 'bag-of-words' model assumption?

A. Word order is critical for meaning
B. Word order is ignored, only frequency counts matter
C. Grammar rules are strictly enforced
D. Dependencies between words are preserved

14 In the CBOW model, how are the input context vectors usually handled before passing to the hidden layer?

A. They are concatenated
B. They are multiplied
C. Only the first word is used
D. They are averaged or summed

15 What does a cosine similarity of 0 indicate between two word vectors?

A. They are opposite
B. They are orthogonal (unrelated)
C. They are identical
D. One is a scalar multiple of the other

16 Which loss function is typically minimized when aligning two vector spaces (X and Y) via a transformation matrix R?

A. Accuracy score
B. Hinge loss
C. Cross-entropy loss
D. Frobenius norm of (XR - Y)

17 Neural word-vector models like Word2Vec are often referred to as:

A. Hierarchical clusters
B. Count-based matrices
C. Sparse embeddings
D. Prediction-based embeddings

18 If 'Apple' and 'Pear' are close in vector space, this indicates:

A. Morphological similarity
B. Phonetic similarity
C. Semantic similarity
D. Syntactic similarity

19 When visualizing word vectors, why can't we simply plot the 300-dimensional vectors directly?

A. Human visual perception is limited to 2 or 3 dimensions
B. The vectors become binary
C. Computers cannot store 300 dimensions
D. It would take too long to render

20 Which of the following is NOT a benefit of using Vector Space Models in Machine Translation?

A. Improving alignment of synonyms
B. Mapping entire languages without parallel corpora (unsupervised)
C. Handling rare words via similarity
D. Guaranteeing grammatically perfect sentences

21 In a Word2Vec model, the dimension of the hidden layer corresponds to:

A. The number of training documents
B. The vocabulary size
C. The size of the word embedding vector
D. The window size

22 To perform document search using word vectors, how might one represent a whole document?

A. By concatenating all vectors into one giant vector
B. By using the vector of the first word only
C. By summing the ASCII values of characters
D. By taking the average (centroid) of all word vectors in the document

23 What is the 'Curse of Dimensionality' in the context of NLP?

A. Having too few dimensions to represent meaning
B. The time it takes to train a model
C. The inability to use PCA
D. Data becoming sparse and distance metrics becoming less meaningful in very high dimensions

24 Which algebraic structure is used to transform word vectors from a source language space to a target language space?

A. A vector
B. A scalar
C. A tensor of rank 3
D. A transformation matrix

25 In PCA, what are 'eigenvalues' used for?

A. To quantify the variance explained by each principal component
B. To label the axes
C. To calculate the dot product
D. To determine the direction of axes

26 Which word pair would likely have the largest Euclidean distance in a well-trained vector space?

A. Frog - Toad
B. Computer - Sandwich
C. Happy - Joyful
D. Car - Automobile

27 The output layer of a standard CBOW model typically uses which activation function to generate probabilities?

A. ReLU
B. Tanh
C. Softmax
D. Sigmoid

28 What is the main limitation of using Euclidean distance for word vectors compared to Cosine similarity?

A. It is sensitive to the magnitude (length) of the vectors
B. It cannot handle negative numbers
C. It only works in 2D
D. It is computationally harder

29 Which concept explains why 'Paris' is to 'France' as 'Tokyo' is to 'Japan' in vector space?

A. Orthogonality
B. Singular Value Decomposition
C. One-hot encoding
D. Linear substructures / Parallelism

30 How does PCA reduce dimensions?

A. By projecting data onto new axes that minimize information loss
B. By removing words with fewer than 3 letters
C. By deleting the last 50 columns of data
D. By averaging all data points to zero

31 In the context of relationships between words, 'distributional semantics' suggests that:

A. Words are defined by their spelling
B. Words that appear in similar contexts have similar meanings
C. Words are defined by their dictionary definitions
D. Words are unrelated entities

32 When training CBOW, what is the 'target'?

A. The part of speech
B. The next sentence
C. The sentiment of the sentence
D. The center word

33 What happens to the vectors of synonyms (e.g., 'huge' and 'enormous') during training?

A. They move closer together
B. One replaces the other
C. They become orthogonal
D. They move infinitely far apart

34 If you want to visualize a 1000-word subset of your vocabulary using PCA, what is the shape of the input matrix?

A. 2 x 2
B. Dimension_of_Embedding x 1000
C. 1000 x 2
D. 1000 x Dimension_of_Embedding

35 In cross-lingual information retrieval, query translation can be achieved by:

A. Re-training the model from scratch
B. Multiplying the query vector by a transformation matrix
C. Using a dictionary lookup only
D. Ignoring the language difference

36 What is a 'context window'?

A. The software used to view the code
B. The time limit for training
C. The number of words before and after a target word
D. The graphical user interface

37 Which of the following is NOT a step in performing PCA?

A. Computing eigenvectors and eigenvalues
B. Applying a Softmax function
C. Standardizing the data
D. Calculating the covariance matrix

38 If a word vector has a magnitude of 1, it is called:

A. A sparse vector
B. A complex vector
C. A binary vector
D. A normalized vector

39 Which approach is generally faster to train: CBOW or Skip-gram?

A. CBOW
B. Neither is trainable
C. Skip-gram
D. They are exactly the same

40 To capture dependencies between words that are far apart in a sentence, one should:

A. Increase the window size
B. Set window size to 0
C. Use one-hot encoding
D. Decrease the window size

41 The 'Manifold Hypothesis' in NLP suggests that:

A. Vectors must be 3D
B. High-dimensional language data lies on a lower-dimensional manifold
C. Language is flat
D. All words are equidistant

42 When performing vector arithmetic for 'Paris - France + Italy', the result is likely closest to:

A. Pizza
B. Germany
C. Rome
D. London

43 What is the shape of the transformation matrix R used to map a source space of dimension D to a target space of dimension D?

A. 2D x 2D
B. D x 1
C. 1 x D
D. D x D

44 Which vector operation is primarily used to measure the relevance of a document to a search query in VSM?

A. Cosine Similarity
B. Vector Addition
C. Scalar Multiplication
D. Vector Subtraction

45 Sparse vectors (like Bag-of-Words) are characterized by:

A. Mostly non-zero values
B. Mostly zero values
C. Negative numbers only
D. Complex numbers

46 Which types of relationships do word embeddings capture?

A. Only semantic
B. Neither
C. Only syntactic
D. Both syntactic and semantic

47 Before applying PCA, it is standard practice to:

A. Randomize the data
B. Square the data
C. Mean-center the data
D. Invert the data

48 In the analogy 'A is to B as C is to D', which equation represents the relationship in vector space?

A. B - A = D - C
B. B / A = D / C
C. B * A = D * C
D. B + A = D + C

49 Why might we use PCA on word vectors before performing clustering?

A. To translate the language
B. To increase the number of dimensions
C. To convert vectors to text
D. To remove noise and reduce computational cost

50 Which technique can be used to check whether the transformation matrix between two languages is accurate?

A. Checking the accuracy of translation on a hold-out dictionary
B. Calculating the determinant
C. Measuring the vector length
D. Checking if the matrix is square