Unit 3 - Practice Quiz

CSE273 50 Questions

1 Which of the following data types represents categories with a meaningful order or ranking but no fixed distance between them?

A. Nominal Data
B. Ordinal Data
C. Interval Data
D. Ratio Data

2 Which Pandas function is primarily used to load data from a Comma-Separated Values (CSV) file into a DataFrame?

A. pd.load_csv()
B. pd.read_file()
C. pd.read_csv()
D. pd.import_csv()

3 In a box plot, the central line inside the box represents which statistical measure?

A. Mean
B. Mode
C. Median
D. Standard Deviation

4 What is the primary purpose of a histogram in univariate analysis?

A. To show the relationship between two variables
B. To visualize the frequency distribution of a continuous variable
C. To show the count of categorical variables
D. To visualize trends over time

5 Which Pandas method provides a concise summary of a DataFrame, including the index dtype and columns, non-null values, and memory usage?

A. df.describe()
B. df.head()
C. df.info()
D. df.shape()
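
The loading and inspection calls named in the questions above can be tried on a small table. The following is a minimal sketch, assuming a made-up three-row CSV held in memory (the column names `name`, `age`, and `score` are hypothetical); in practice `pd.read_csv` would usually be given a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has a missing score.
csv_text = "name,age,score\nAna,23,88.5\nBen,31,\nCara,27,92.0\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)       # shape is an attribute, not a method: (rows, columns)
print(df.head())      # first 5 rows by default (here only 3 exist)
df.info()             # column dtypes, non-null counts, memory usage
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns
```

Note that `df.shape` is written without parentheses, since it is an attribute that returns a tuple.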

6 A scatter plot is most suitable for analyzing the relationship between:

A. One categorical and one numerical variable
B. Two categorical variables
C. Two continuous numerical variables
D. Time and a categorical variable

7 When analyzing the correlation between variables, a Pearson correlation coefficient (r) of -0.95 indicates:

A. A strong positive linear relationship
B. A weak negative linear relationship
C. A strong negative linear relationship
D. No linear relationship

8 Which visualization is best suited to show the distribution of a quantitative variable across several levels of a categorical variable, including the probability density?

A. Box plot
B. Violin plot
C. Scatter plot
D. Bar chart

9 In the context of EDA, what is Multicollinearity?

A. When a variable has a non-linear relationship with the target
B. When two or more independent variables are highly correlated with each other
C. When the data has too many missing values
D. When the target variable is categorical

10 Which metric is used to measure the asymmetry of the probability distribution of a real-valued random variable about its mean?

A. Kurtosis
B. Variance
C. Skewness
D. Standard Deviation

11 If a distribution has a long tail on the right side, it is considered:

A. Negatively skewed
B. Positively skewed
C. Symmetric
D. Normal
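
Skewness and tail direction can be checked numerically. This is a minimal sketch, assuming a made-up sample in which a single large value creates a long right tail; `Series.skew()` returns the sample skewness.

```python
import pandas as pd

# Made-up values: most lie between 1 and 4, one large value forms the right tail.
right_tailed = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 20])

skew = right_tailed.skew()      # sample skewness of the right-tailed data
mean = right_tailed.mean()      # pulled toward the tail
median = right_tailed.median()  # largely unaffected by the tail

# Mirroring the data flips the tail to the left and the sign of the skewness.
left_skew = (-right_tailed).skew()

print(skew, mean, median, left_skew)
```

In this sample the mean is larger than the median, which is the usual pattern when the tail is on the right; mirroring the data reverses both the tail and that ordering.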

12 Which plot is specifically designed to visualize the count of observations in each categorical bin using bars?

A. Scatter plot
B. Count plot
C. Line plot
D. Violin plot

13 What does the Interquartile Range (IQR) represent in a box plot?

A. The range between the minimum and maximum values
B. The difference between the 75th percentile (Q3) and the 25th percentile (Q1)
C. The variance of the data
D. The difference between the median and the mean
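
The quartiles and the IQR can be computed directly. The sketch below assumes made-up values and uses the conventional 1.5 × IQR fences for flagging outliers; that multiplier is a common convention, not something fixed by this quiz.

```python
import pandas as pd

# Made-up sample with one unusually large value.
values = pd.Series([10, 12, 13, 14, 15, 16, 18, 19, 45])

q1 = values.quantile(0.25)  # 25th percentile
q3 = values.quantile(0.75)  # 75th percentile
iqr = q3 - q1               # interquartile range

# Conventional fences: points beyond them are drawn as outliers in a box plot.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(q1, q3, iqr, outliers.tolist())
```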

14 Which tool is commonly used to visualize a correlation matrix?

A. Pie Chart
B. Heatmap
C. Histogram
D. Box Plot

15 High Kurtosis in a data distribution implies:

A. The data has light tails or lack of outliers
B. The data is perfectly normal
C. The data has heavy tails or outliers
D. The data is flat

16 Which of the following represents Ratio data?

A. Temperature in Celsius
B. Likert Scale (Satisfied, Neutral, Dissatisfied)
C. Height in centimeters
D. Zip Codes

17 How do you check the first 5 rows of a Pandas DataFrame named df?

A. df.tail()
B. df.sample(5)
C. df.head()
D. df.columns

18 Which statistic helps in detecting outliers using the box plot method?

A. Interquartile Range (IQR)
B. Standard Deviation
C. Mean
D. Z-Score

19 Which type of plot is best for detecting trends over a period of time?

A. Pie Chart
B. Line Plot
C. Scatter Plot
D. Violin Plot

20 If two variables have a correlation of 0, it means:

A. They are identical
B. They have a linear relationship
C. There is no linear relationship between them
D. One causes the other
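
A correlation coefficient only measures linear association. As a minimal sketch with made-up data, the example below computes Pearson's r for a perfectly quadratic relationship and for a perfectly linear, decreasing one.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# y depends entirely on x, but the dependence is symmetric, not linear.
y_quadratic = x ** 2
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

# A straight line with negative slope gives the strongest negative value.
y_linear = -3.0 * x + 1.0
r_linear = np.corrcoef(x, y_linear)[0, 1]

print(r_quadratic, r_linear)
```

The quadratic case shows that two variables can be strongly related even when their Pearson coefficient is zero.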

21 Which Variance Inflation Factor (VIF) value typically indicates high multicollinearity requiring attention?

A. VIF = 1
B. VIF < 5
C. VIF > 5 or 10
D. VIF = 0
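
The VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on the remaining features. The following is a minimal sketch using only NumPy, assuming synthetic data in which `x3` is deliberately built as a near copy of `x1`; the helper `vif` is hypothetical and written for illustration, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)               # independent of x1
x3 = x1 + 0.1 * rng.normal(size=n)    # nearly a copy of x1
X = np.column_stack([x1, x2, x3])


def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # ordinary least squares
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)


vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)
```

With this construction the two nearly duplicated columns receive large VIF values, while the independent column stays close to 1.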

22 What is the skewness of a perfectly symmetrical Normal Distribution?

A. 1
B. -1
C. 0
D. 0.5

23 In a Pandas DataFrame, what does df.describe() output?

A. Data types of columns
B. Statistical summary (count, mean, std, min, max, percentiles) of numerical columns
C. The first 5 rows
D. Correlation matrix

24 Which of the following is an example of Nominal Data?

A. Age
B. Income
C. Eye Color (Blue, Brown, Green)
D. Class Rank

25 In EDA, what is an 'anomaly'?

A. A missing value
B. A data point that deviates significantly from the rest of the data
C. The average value of the dataset
D. A categorical variable

26 Which library is built on top of Matplotlib and provides a high-level interface for drawing attractive statistical graphics, like violin plots and heatmaps?

A. NumPy
B. Pandas
C. Seaborn
D. Scikit-learn

27 When interpreting a box plot, the 'whiskers' usually extend to:

A. The minimum and maximum values (excluding outliers)
B. The standard deviation
C. The variance
D. The 10th and 90th percentiles

28 What is the relationship between Mean, Median, and Mode in a negatively skewed distribution?

A. Mean > Median > Mode
B. Mean = Median = Mode
C. Mean < Median < Mode
D. Mode < Mean < Median

29 Which Pandas method is used to check for missing values in a dataset?

A. df.missing()
B. df.isnull()
C. df.check_na()
D. df.empty()

30 The visual inspection of a Scatter plot allows you to determine:

A. Only the strength of the relationship
B. Only the direction of the relationship
C. Both the strength and direction of the relationship
D. The exact equation of the line

31 A correlation matrix is a square table that shows:

A. The covariance between variables
B. The correlation coefficients between pairs of variables
C. The variance of each variable
D. The summary statistics

32 Which data type is 'Temperature in Celsius'?

A. Nominal
B. Ordinal
C. Interval
D. Ratio

33 In a Histogram, the width of the bars represents:

A. The number of observations
B. The interval (bin) size of the variable
C. The standard deviation
D. The mean value

34 What is Platykurtic distribution?

A. A distribution with negative excess kurtosis (flatter than normal)
B. A distribution with positive excess kurtosis (peaked)
C. A distribution with zero excess kurtosis
D. A skewed distribution

35 Which Pandas method returns the number of unique values in a specific column?

A. df['col'].unique()
B. df['col'].nunique()
C. df['col'].value_counts()
D. df['col'].count()
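
The four calls listed above return different things, which is easy to confuse. This is a minimal sketch on a made-up column (the name `col` follows the options above) that contains one missing entry.

```python
import pandas as pd

df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", None]})

uniques = df["col"].unique()       # array of distinct values, missing value included
n_unique = df["col"].nunique()     # number of distinct non-null values
counts = df["col"].value_counts()  # frequency of each non-null value
n_non_null = df["col"].count()     # number of non-null entries

print(len(uniques), n_unique, counts["a"], n_non_null)
```

By default both `nunique()` and `value_counts()` skip missing values; each accepts `dropna=False` to include them.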

36 When detecting patterns, 'Seasonality' refers to:

A. A long-term increase or decrease in data
B. Random fluctuations in data
C. Regular, repeating fluctuations over a specific period
D. One-time anomalies

37 Why is handling multicollinearity important for linear regression models?

A. It ensures the target variable is normally distributed
B. It stabilizes the estimates of the regression coefficients
C. It increases the number of features
D. It removes outliers

38 Which plot is essentially a box plot with a rotated kernel density plot on each side?

A. Histogram
B. Scatter plot
C. Violin plot
D. Strip plot

39 The command df.corr() in Pandas calculates which correlation coefficient by default?

A. Spearman
B. Kendall
C. Pearson
D. Point-Biserial
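
The `method` parameter of `df.corr()` selects the coefficient. The following is a minimal sketch, assuming made-up data in which `y` grows monotonically but not linearly with `x`, to show how the default differs from a rank-based alternative.

```python
import pandas as pd

# y = x cubed: perfectly monotonic, but not a straight line.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 8, 27, 64, 125]})

default_corr = df.corr()                     # no method given
pearson_corr = df.corr(method="pearson")     # linear correlation
spearman_corr = df.corr(method="spearman")   # rank correlation

print(default_corr.loc["x", "y"], spearman_corr.loc["x", "y"])
```

Because the ranks of `x` and `y` agree exactly, the rank-based coefficient reaches 1, while the linear coefficient stays below 1.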

40 To visualize the relationship between a categorical variable and a continuous variable, which pair of plots is most appropriate?

A. Scatter plot and Line plot
B. Box plot and Violin plot
C. Heatmap and Histogram
D. Pie chart and Bar chart

41 If a dataset has NaN values, how does df.dropna() handle them?

A. It fills them with zeros
B. It fills them with the mean
C. It removes the rows (or columns) containing missing values
D. It highlights them in red
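
Missing-value checks and `dropna()` can be tried on a small frame. This is a minimal sketch with made-up columns `a` and `b`, where only `a` contains a NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

missing_per_column = df.isnull().sum()  # count of missing values in each column
rows_dropped = df.dropna()              # default axis=0: drop rows with any NaN
cols_dropped = df.dropna(axis=1)        # drop columns with any NaN instead

print(missing_per_column.to_dict(), rows_dropped.shape, list(cols_dropped.columns))
```

`dropna()` returns a new DataFrame and leaves `df` unchanged unless the result is assigned back.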

42 What is the primary difference between a Bar Chart and a Histogram?

A. Bar charts are for numerical data; Histograms for categorical
B. Histograms are for continuous numerical distributions; Bar charts are for categorical comparisons
C. There is no difference
D. Bar charts always touch each other; Histograms have gaps

43 Which of the following describes 'Discrete' quantitative data?

A. It can take any value within a range (e.g., height)
B. It can only take specific, separate values (e.g., number of students)
C. It is purely descriptive text
D. It is based on ranking

44 In a heatmap, what does the color intensity typically represent?

A. The count of null values
B. The magnitude of the value or correlation coefficient
C. The index of the row
D. The data type

45 If a variable has zero variance, what does it imply?

A. The variable is normally distributed
B. All values in the variable are the same
C. The variable has many outliers
D. The mean is zero

46 What is the first step in the EDA workflow after loading the data?

A. Training the machine learning model
B. Understanding the data structure (shape, types, head)
C. Hyperparameter tuning
D. Deploying the model

47 Which plot is useful for visualizing the pairwise relationships and distributions for multiple variables in a dataset simultaneously?

A. Pair plot (Scatter matrix)
B. Box plot
C. Pie chart
D. Area plot

48 In the context of Pandas, what is a DataFrame?

A. A 1D labeled array
B. A 2D labeled data structure with columns of potentially different types
C. A 3D array
D. A visualization tool

49 Which statistic is most robust to outliers?

A. Mean
B. Range
C. Standard Deviation
D. Median

50 Correlation does not imply:

A. Association
B. Relationship
C. Causation
D. Dependency