Unit 2 - Practice Quiz

INT344 · 50 Questions

1 What is the fundamental representation of a word in a Vector Space Model (VSM)?

A. A linked list
B. A binary tree structure
C. A vector of real numbers
D. A scalar integer

2 Which of the following is a primary advantage of dense word vectors over one-hot encoding?

A. They capture semantic relationships
B. They are strictly binary
C. They are easier to calculate manually
D. They use more memory

3 In the context of the Continuous Bag-of-Words (CBOW) model, what is the input to the neural network?

A. A random noise vector
B. The center word
C. The context words
D. The entire document

4 How is 'cosine similarity' calculated between two word vectors, A and B?

A. Cross product of A and B
B. Euclidean distance between A and B
C. Dot product of A and B divided by the product of their magnitudes
D. Sum of elements in A minus sum of elements in B
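
For reference, a minimal NumPy sketch of cosine similarity (dot product divided by the product of the magnitudes), using made-up vectors:

    import numpy as np

    def cosine_similarity(a, b):
        # Dot product of a and b over the product of their magnitudes.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    a = np.array([1.0, 2.0, 3.0])  # hypothetical word vectors
    b = np.array([2.0, 4.0, 6.0])
    print(cosine_similarity(a, b))  # 1.0: same direction (compare Q5)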

5 If two word vectors have a cosine similarity of 1, what does this imply?

A. The words are unrelated
B. The vectors are orthogonal
C. The vectors point in exactly the same direction
D. The words represent opposite meanings

6 Which architecture predicts the surrounding words given a center word?

A. Continuous Bag-of-Words (CBOW)
B. Principal Component Analysis
C. Latent Dirichlet Allocation
D. Skip-gram

7 What mathematical technique is commonly used to visualize high-dimensional word vectors in two dimensions?

A. Linear Regression
B. Logistic Regression
C. Fourier Transform
D. Principal Component Analysis (PCA)

8 In vector arithmetic for analogies, what result is expected for vector('King') - vector('Man') + vector('Woman')?

A. Monarch
B. Princess
C. Prince
D. Queen
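
For intuition, a sketch of running such an analogy query over a toy embedding table; the `vectors` dict and all of its values are hypothetical:

    import numpy as np

    # Toy 3-D embeddings; real models use 100-300 dimensions.
    vectors = {
        "king":  np.array([0.9, 0.8, 0.1]),
        "man":   np.array([0.5, 0.1, 0.0]),
        "woman": np.array([0.5, 0.1, 0.9]),
        "queen": np.array([0.9, 0.8, 1.0]),
    }

    query = vectors["king"] - vectors["man"] + vectors["woman"]

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Rank candidates by cosine similarity to the query vector.
    # (Real systems usually exclude the three input words first.)
    best = max(vectors, key=lambda w: cos(vectors[w], query))
    print(best)  # "queen" with these toy numbers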

9 What is the role of the 'window size' hyperparameter in CBOW?

A. It sets the learning rate
B. It determines the number of dimensions in the vector
C. It determines the number of epochs
D. It defines how many neighbors to consider as context

10 Why are vector space models useful for information retrieval/document search?

A. They eliminate the need for indexing
B. They can match queries to documents based on semantic similarity
C. They allow exact keyword matching only
D. They work best with images

11 When transforming word vectors from one language to another (e.g., English to French) using a linear mapping, what are we trying to learn?

A. A decision tree
B. A clustering algorithm
C. A rotation matrix
D. A binary classifier

12 In PCA, the first principal component is the direction that maximizes what?

A. The number of clusters
B. The error rate
C. The cosine similarity
D. The variance of the data

13 Which of the following best describes the 'bag-of-words' model assumption?

A. Word order is critical for meaning
B. Word order is ignored, only frequency counts matter
C. Grammar rules are strictly enforced
D. Dependencies between words are preserved

14 In the CBOW model, how are the input context vectors usually handled before passing to the hidden layer?

A. Only the first word is used
B. They are averaged or summed
C. They are concatenated
D. They are multiplied
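
A minimal sketch of that averaging step, assuming a toy embedding matrix `E` and made-up context indices:

    import numpy as np

    # Embedding matrix: vocab_size x embedding_dim (toy numbers).
    E = np.random.rand(10, 4)
    context_ids = [2, 5, 7, 9]  # indices of the context words

    # CBOW averages (or sums) the context embeddings into one hidden vector.
    hidden = E[context_ids].mean(axis=0)
    print(hidden.shape)  # (4,): a single embedding_dim vector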

15 What does a cosine similarity of 0 indicate between two word vectors?

A. They are opposite
B. They are identical
C. They are orthogonal (unrelated)
D. One is a scalar multiple of the other

16 Which loss function is typically minimized when aligning two vector spaces (X and Y) via a transformation matrix R?

A. Hinge loss
B. Frobenius norm of (XR - Y)
C. Cross-entropy loss
D. Accuracy score
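
One standard way to minimize the Frobenius norm of (XR - Y) is a least-squares solve; a sketch on synthetic data (all matrices here are made up, and gradient descent on the same loss is another common choice):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 300))      # source-language embeddings
    R_true = rng.normal(size=(300, 300))  # a D x D map (see Q43)
    Y = X @ R_true                        # target-language embeddings

    # Least-squares minimizer of the Frobenius norm of (XR - Y).
    R, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(np.linalg.norm(X @ R - Y))      # ~0 on this synthetic data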

17 Deep Learning vector models like Word2Vec are often referred to as:

A. Hierarchical clusters
B. Prediction-based embeddings
C. Sparse embeddings
D. Count-based matrices

18 If 'Apple' and 'Pear' are close in vector space, this indicates:

A. Morphological similarity
B. Phonetic similarity
C. Syntactic similarity
D. Semantic similarity

19 When visualizing word vectors, why can't we simply plot the 300-dimensional vectors directly?

A. Human visual perception is limited to 2 or 3 dimensions
B. The vectors become binary
C. It would take too long to render
D. Computers cannot store 300 dimensions

20 Which of the following is NOT a benefit of using Vector Space Models in Machine Translation?

A. Guaranteeing grammatically perfect sentences
B. Improving alignment of synonyms
C. Handling rare words via similarity
D. Mapping entire languages without parallel corpora (unsupervised)

21 In a Word2Vec model, the dimension of the hidden layer corresponds to:

A. The number of training documents
B. The size of the word embedding vector
C. The window size
D. The vocabulary size

22 To perform document search using word vectors, how might one represent a whole document?

A. By using the vector of the first word only
B. By taking the average (centroid) of all word vectors in the document
C. By concatenating all vectors into one giant vector
D. By summing the ASCII values of characters
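
A sketch of the centroid idea, with a hypothetical three-word document and made-up embeddings:

    import numpy as np

    vectors = {"cats":  np.array([0.1, 0.9]),
               "sleep": np.array([0.8, 0.2]),
               "often": np.array([0.4, 0.4])}  # hypothetical embeddings

    doc = "cats sleep often".split()
    # Document vector = mean (centroid) of its word vectors.
    doc_vec = np.mean([vectors[w] for w in doc], axis=0)
    print(doc_vec)  # [0.433..., 0.5]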

23 What is the 'Curse of Dimensionality' in the context of NLP?

A. The time it takes to train a model
B. The inability to use PCA
C. Having too few dimensions to represent meaning
D. Data becoming sparse and distance metrics becoming less meaningful in very high dimensions

24 Which algebraic structure is used to transform word vectors from a source language space to a target language space?

A. A transformation matrix
B. A scalar
C. A vector
D. A tensor of rank 3

25 In PCA, what are 'eigenvalues' used for?

A. To label the axes
B. To determine the direction of axes
C. To quantify the variance explained by each principal component
D. To calculate the dot product

26 Which word pair would likely have the highest Euclidean distance in a well-trained vector space?

A. Car - Automobile
B. Happy - Joyful
C. Frog - Toad
D. Computer - Sandwich

27 The output layer of a standard CBOW model typically uses which activation function to generate probabilities?

A. Softmax
B. Tanh
C. Sigmoid
D. ReLU
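
For reference, a minimal softmax sketch showing how raw output scores become a probability distribution:

    import numpy as np

    def softmax(z):
        # Subtract the max for numerical stability before exponentiating.
        e = np.exp(z - z.max())
        return e / e.sum()

    scores = np.array([2.0, 1.0, 0.1])
    p = softmax(scores)
    print(p, p.sum())  # non-negative values summing to 1.0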

28 What is the main limitation of using Euclidean distance for word vectors compared to Cosine similarity?

A. It cannot handle negative numbers
B. It is computationally harder
C. It is sensitive to the magnitude (length) of the vectors
D. It only works in 2D

29 Which concept explains why 'Paris' is to 'France' as 'Tokyo' is to 'Japan' in vector space?

A. Orthogonality
B. Linear substructures / Parallelism
C. Singular Value Decomposition
D. One-hot encoding

30 How does PCA reduce dimensions?

A. By averaging all data points to zero
B. By deleting the last 50 columns of data
C. By projecting data onto new axes that minimize information loss
D. By removing words with fewer than 3 letters

31 In the context of relationships between words, 'distributional semantics' suggests that:

A. Words are defined by their spelling
B. Words are unrelated entities
C. Words that appear in similar contexts have similar meanings
D. Words are defined by their dictionary definitions

32 When training CBOW, what is the 'target'?

A. The next sentence
B. The sentiment of the sentence
C. The center word
D. The part of speech

33 What happens to the vectors of synonyms (e.g., 'huge' and 'enormous') during training?

A. They become orthogonal
B. They move closer together
C. They move infinitely far apart
D. One replaces the other

34 If you want to visualize a 1000-word subset of your vocabulary using PCA, what is the shape of the input matrix?

A. 1000 x 2
B. 2 x 2
C. 1000 x Dimension_of_Embedding
D. Dimension_of_Embedding x 1000

35 In cross-lingual information retrieval, query translation can be achieved by:

A. Re-training the model from scratch
B. Using a dictionary lookup only
C. Ignoring the language difference
D. Multiplying the query vector by a transformation matrix

36 What is a 'context window'?

A. The software used to view the code
B. The number of words before and after a target word
C. The graphical user interface
D. The time limit for training

37 Which of the following is NOT a step in performing PCA?

A. Standardizing the data
B. Applying a Softmax function
C. Calculating the covariance matrix
D. Computing eigenvectors and eigenvalues
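
The genuine PCA steps above (standardize, covariance matrix, eigendecomposition) in order, as a minimal NumPy sketch on a hypothetical 1000 x 300 embedding matrix (the shape from Q34):

    import numpy as np

    X = np.random.rand(1000, 300)   # words x embedding_dim

    # 1. Mean-center (standardize) the data -- see Q47.
    Xc = X - X.mean(axis=0)

    # 2. Covariance matrix of the centered data.
    cov = np.cov(Xc, rowvar=False)

    # 3. Eigenvectors give directions; eigenvalues give variance explained (Q25).
    eigvals, eigvecs = np.linalg.eigh(cov)
    top2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]

    # Project onto the top-2 components for a 2-D plot (Q7, Q19).
    X2d = Xc @ top2
    print(X2d.shape)  # (1000, 2)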

38 If a word vector has a magnitude of 1, it is called:

A. A complex vector
B. A sparse vector
C. A binary vector
D. A normalized vector
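
A one-line normalization sketch:

    import numpy as np

    v = np.array([3.0, 4.0])
    v_hat = v / np.linalg.norm(v)  # divide by the magnitude
    print(np.linalg.norm(v_hat))   # 1.0: a normalized (unit) vector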

39 Which approach is generally faster to train: CBOW or Skip-gram?

A. Skip-gram
B. CBOW
C. Neither is trainable
D. They are exactly the same

40 To capture dependencies between words that are far apart in a sentence, one should:

A. Use one-hot encoding
B. Increase the window size
C. Decrease the window size
D. Set window size to 0

41 The 'Manifold Hypothesis' in NLP suggests that:

A. High-dimensional language data lies on a lower-dimensional manifold
B. All words are equidistant
C. Vectors must be 3D
D. Language is flat

42 When performing vector arithmetic for 'Paris - France + Italy', the result is likely closest to:

A. Germany
B. Rome
C. Pizza
D. London

43 What is the dimensionality of the transformation matrix R used to map a source space of dimension D to a target space of dimension D?

A. D x 1
B. 2D x 2D
C. 1 x D
D. D x D

44 Which vector operation is primarily used to measure the relevance of a document to a search query in VSM?

A. Scalar Multiplication
B. Vector Subtraction
C. Cosine Similarity
D. Vector Addition

45 Sparse vectors (like Bag-of-Words) are characterized by:

A. Mostly zero values
B. Negative numbers only
C. Mostly non-zero values
D. Complex numbers

46 Word embeddings capture which type of relationships?

A. Neither
B. Only syntactic
C. Only semantic
D. Both syntactic and semantic

47 Before applying PCA, it is standard practice to:

A. Square the data
B. Invert the data
C. Mean-center the data
D. Randomize the data

48 In the analogy 'A is to B as C is to D', which equation represents the relationship in vector space?

A. B / A = D / C
B. B - A = D - C
C. B × A = D × C
D. B + A = D + C

49 Why might we use PCA on word vectors before performing clustering?

A. To convert vectors to text
B. To remove noise and reduce computational cost
C. To increase the number of dimensions
D. To translate the language

50 Which technique allows checking if the transformation matrix between two languages is accurate?

A. Calculating the determinant
B. Checking the accuracy of translation on a hold-out dictionary
C. Measuring the vector length
D. Checking if the matrix is square
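
A sketch of that hold-out evaluation; `R`, `src_vecs`, `tgt_vecs`, and `test_pairs` are all hypothetical names:

    import numpy as np

    def translation_accuracy(R, src_vecs, tgt_vecs, test_pairs):
        """Fraction of held-out (source, target) word pairs for which the
        nearest target vector to src @ R is the correct translation."""
        words = list(tgt_vecs)
        M = np.stack([tgt_vecs[w] for w in words])
        M = M / np.linalg.norm(M, axis=1, keepdims=True)  # unit rows
        correct = 0
        for src, tgt in test_pairs:
            q = src_vecs[src] @ R                 # map into target space
            q = q / np.linalg.norm(q)
            if words[int(np.argmax(M @ q))] == tgt:
                correct += 1
        return correct / len(test_pairs)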