Unit 1 - Practice Quiz

CSE121

1 Which of the following best defines Data Science?

A. The study of computer hardware manufacturing
B. A multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from data
C. The process of manually entering data into spreadsheets
D. The repair and maintenance of database servers

2 Data Science is often represented as the intersection of which three primary domains?

A. Physics, Chemistry, and Biology
B. Computer Science, Math/Statistics, and Business/Domain Knowledge
C. Networking, Hardware, and Software
D. Marketing, Sales, and HR

3 Which of the following is NOT one of the original 3Vs of Big Data?

A. Volume
B. Velocity
C. Variety
D. Visualization

4 In the context of Big Data, what does Velocity refer to?

A. The accuracy of the data
B. The sheer amount of data stored
C. The speed at which data is generated, processed, and analyzed
D. The different forms of data (images, text, video)

5 Social media posts, videos, and audio files are examples of what type of data?

A. Structured Data
B. Unstructured Data
C. Relational Data
D. Clean Data

6 Which phase of the Data Science Lifecycle involves handling missing values and correcting inconsistent data?

A. Model Building
B. Data Preparation / Cleaning
C. Model Deployment
D. Problem Definition

7 What is Apache Hadoop primarily used for?

A. Creating real-time 3D video games
B. Distributed storage and processing of large datasets across clusters of computers
C. Editing high-resolution photos
D. Writing operating system kernels

8 In the Hadoop ecosystem, what does HDFS stand for?

A. Hadoop Data Filtration System
B. High-Definition File Standard
C. Hadoop Distributed File System
D. Hyper-Data Fast Storage

9 Which programming language is designed specifically for statistical computing and graphics and is widely used in Data Science?

A. HTML
B. C++
C. R
D. Assembly

10 What is the primary purpose of Tableau in a Data Science workflow?

A. Operating System management
B. Data Visualization and Business Intelligence
C. Writing low-level machine code
D. Database encryption

11 Which of the following is a significant challenge of Big Data?

A. Having too little data to analyze
B. Data Security and Privacy concerns
C. The low cost of storing data
D. Lack of algorithms

12 In the Data Science Lifecycle, what happens during the Model Building phase?

A. The business problem is defined
B. The results are presented to stakeholders
C. Machine learning algorithms are applied to training data to create a predictive model
D. The data is archived for long-term storage

13 Which of the following scenarios is a common application of Data Science in E-commerce?

A. Managing warehouse physical security
B. Product Recommendation Engines (e.g., 'Customers who bought this also bought...')
C. Installing point-of-sale hardware
D. Designing the company logo

14 What does 'Veracity' refer to in the extended 5Vs of Big Data?

A. The speed of data transfer
B. The trustworthiness, quality, and accuracy of the data
C. The variety of data types
D. The economic value of data

15 Why is Cloud Computing essential for Big Data analytics?

A. It eliminates the need for internet access
B. It provides on-demand scalability and cost-effective storage/processing power
C. It forces companies to buy more physical hard drives
D. It reduces the speed of data processing

16 Which job role focuses primarily on building and maintaining the architecture (pipelines, databases) required for data generation?

A. Data Scientist
B. Data Engineer
C. Business Analyst
D. Graphic Designer

17 Which limitation does Microsoft Excel have regarding Big Data?

A. It cannot perform addition or subtraction
B. It has a row limit (approx. 1 million) and struggles with processing massive datasets efficiently
C. It requires a supercomputer to run
D. It does not support charts

18 What is MapReduce?

A. A GPS navigation system
B. A programming model for processing large data sets with a parallel, distributed algorithm
C. A method to reduce the size of a map image
D. A database query language
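
Study note: the map, shuffle, and reduce steps of the MapReduce model can be sketched on a single machine with a word count. This is a minimal illustration, not Hadoop's actual API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real cluster would run the map calls in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # On a cluster each document (or file block) would be mapped on a
    # different node; here the map calls simply run one after another.
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data big", "data science"])
print(counts)
```

Because each map call only looks at its own document and each reduce call only looks at one key, both steps can be split across many machines without changing the result.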

19 In the context of Data Science, what is Exploratory Data Analysis (EDA)?

A. Installing the database software
B. The initial investigation of data to discover patterns, spot anomalies, and check assumptions
C. The final presentation of the project
D. Writing the legal contract for data usage

20 Which skill is LEAST likely to be required for a Data Scientist?

A. Hardware circuit design
B. Statistical Analysis
C. Machine Learning
D. Data Visualization

21 Which of the following describes Predictive Analytics?

A. Describing what happened in the past
B. Using historical data to forecast future outcomes
C. Reporting current data only
D. Manually organizing paper files

22 JSON (JavaScript Object Notation) is an example of which type of data?

A. Unstructured Data
B. Semi-structured Data
C. Strictly Relational Data
D. Binary Data
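
Study note: a short sketch of what a JSON payload looks like in practice. Each record labels its own fields, but the records are not forced to share one fixed set of columns the way rows in a relational table are. The sample records and field names below are made up for illustration.

```python
import json

# Two records with self-describing keys but different sets of fields.
raw = (
    '[{"id": 1, "name": "Asha", "tags": ["new"]},'
    ' {"id": 2, "name": "Ravi", "email": "ravi@example.com"}]'
)

records = json.loads(raw)

# Collect every key that appears in at least one record.
all_keys = set().union(*(r.keys() for r in records))

# Find records that lack a field others have; no schema forbids this.
missing_email = [r["id"] for r in records if "email" not in r]

print(sorted(all_keys))
print(missing_email)
```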

23 How is Big Data used in the Healthcare industry?

A. To manufacture stethoscopes
B. For disease prediction, personalized medicine, and analyzing patient records
C. To replace doctors with robots completely
D. To increase the cost of insurance manually

24 In the Data Science Lifecycle, 'Operationalize' refers to:

A. Deleting the data
B. Deploying the model into a production environment for real-world use
C. Hiring operations managers
D. Buying new computers

25 Which SQL command is most fundamental for extracting specific data from a database?

A. UPDATE
B. SELECT
C. DELETE
D. INSERT
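
Study note: the four SQL commands listed above do different jobs, and only one of them reads data back out. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customers` table and its rows are hypothetical sample data.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")

# INSERT adds rows to the table.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Delhi"), (2, "Ravi", "Mumbai"), (3, "Meera", "Delhi")],
)

# SELECT retrieves only the columns and rows that match the query.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY id",
    ("Delhi",),
).fetchall()
conn.close()

print(rows)
```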

26 What is a Data Lake?

A. A cooling system for servers
B. A centralized repository that allows you to store all your structured and unstructured data at any scale
C. A small spreadsheet
D. A visualization chart looking like water

27 Which of the following is an example of Volume in Big Data?

A. Data arriving in milliseconds
B. Data containing video, text, and XML
C. An organization processing 500 Petabytes of data
D. Data having 90% accuracy

28 What is the primary difference between a Data Analyst and a Data Scientist?

A. Data Analysts do not use computers
B. Data Scientists generally deal with more complex modeling, machine learning, and future predictions, while Analysts focus more on describing past/current trends
C. Data Analysts earn more money
D. There is no difference

29 Which 'V' represents the economic advantage a company gains from Big Data?

A. Velocity
B. Variety
C. Value
D. Volume

30 Which tool is known for its spreadsheet capabilities but also supports basic data analysis with Pivot Tables?

A. Apache Spark
B. Hadoop
C. Microsoft Excel
D. Docker

31 The mathematical equation y = mx + c is the basis for which common Data Science algorithm?

A. Linear Regression
B. K-Means Clustering
C. Decision Trees
D. Neural Networks

32 Which of the following is a soft skill necessary for a Data Science professional?

A. Python Programming
B. Calculus
C. Storytelling and Communication
D. Cloud Architecture

33 What is Churn Prediction in the context of business applications of Data Science?

A. Predicting how fast a butter churn moves
B. Identifying customers who are likely to stop using a service or product
C. Predicting the stock market
D. Calculating employee salaries

34 What role does IoT (Internet of Things) play in Big Data?

A. It reduces the amount of data generated
B. It acts as a massive source of real-time data generation (Velocity and Volume)
C. It is a database software
D. It is used only for printing data

35 Which library in Python is most famous for data manipulation and analysis (Dataframes)?

A. Pandas
B. PyGame
C. Django
D. Flask

36 Why is Data Visualization important?

A. It makes the report file size larger
B. It allows the human brain to process information more easily and identify patterns quickly
C. It hides the actual data values
D. It converts text to binary

37 Which of the following is a risk associated with Data Bias?

A. The model becomes too fast
B. The data takes up less space
C. The AI/Model produces unfair or discriminatory results
D. The computer overheats

38 What is the 'Discovery' phase in the Data Science Lifecycle?

A. Finding a new planet
B. Acquiring resources, framing the business problem, and formulating initial hypotheses
C. Writing the final code
D. Installing software

39 A massive dataset containing log files from servers, clickstreams from a website, and sensor data is best stored in:

A. A paper notebook
B. A standard Excel file
C. A NoSQL database or Distributed File System (like HDFS)
D. A Word document

40 Which of the following best describes Business Intelligence (BI) vs Data Science?

A. BI looks backward (Descriptive); Data Science looks forward (Predictive)
B. BI uses Python; Data Science uses a calculator
C. BI is for unstructured data; Data Science is for structured data
D. They are exactly the same

41 When discussing Big Data on the Cloud, what does SaaS stand for?

A. Storage as a Service
B. Software as a Service
C. System as a Solution
D. Speed as a Service

42 Which statistical concept is used to find the 'center' of a dataset?

A. Standard Deviation
B. Mean (Average)
C. Correlation
D. Variance
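
Study note: the options above split into measures of center and measures of spread. A small sketch with Python's `statistics` module shows the difference on a made-up sample; the variable names are illustrative only.

```python
import statistics

values = [2, 4, 4, 6, 9]

# Center: the sum (25) divided by the number of values (5).
center = statistics.mean(values)

# Spread: population variance and its square root, the standard deviation.
variance = statistics.pvariance(values)
spread = statistics.pstdev(values)

print(center, variance, spread)
```

The mean answers "where is the data located?", while variance and standard deviation answer "how far do values sit from that location?".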

43 What is the main challenge regarding Heterogeneity in Big Data?

A. All data looks the same
B. Integrating data from diverse sources with different formats and standards
C. Data is too small
D. Computers are too fast

44 Sentiment Analysis on Twitter data is an application of:

A. Image Processing
B. Natural Language Processing (NLP)
C. Audio Engineering
D. Database Administration

45 Which component of Hadoop is responsible for resource management and job scheduling?

A. HDFS
B. MapReduce
C. YARN
D. Hive

46 In the context of the 3Vs, streaming data from a jet engine during flight represents high:

A. Velocity
B. Variety
C. Volume
D. Validity

47 Which chart type is best for showing the distribution of a single numerical variable?

A. Pie Chart
B. Histogram
C. Scatter Plot
D. Network Graph
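
Study note: a histogram groups the values of one numerical variable into bins and counts how many fall into each. The sketch below does the binning by hand and prints a text version; the `histogram` helper and the sample ages are hypothetical, and a plotting library would normally draw the bars.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


ages = [21, 22, 25, 31, 34, 35, 38, 47]
bin_width = 10
counts = histogram(ages, bin_width)

# One row per bin; the bar length is the number of values in that bin.
for start, n in counts.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * n}")
```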

48 What is the benefit of using Open Source tools like R and Hadoop?

A. They are always easier to learn
B. They prevent collaboration
C. They are free to use and have large community support
D. They only run on Windows

49 Fraud detection in banking relies heavily on:

A. Outlier/Anomaly Detection
B. Graphic Design
C. Social Media Marketing
D. Data Compression
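
Study note: one simple way to flag unusual values is a z-score rule, which marks anything far from the mean in units of standard deviation. This is only a sketch under assumed sample amounts and an assumed threshold of 2.0; real fraud systems use far richer features and models, and `find_outliers` is a hypothetical helper.

```python
import statistics


def find_outliers(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        # All values are identical, so nothing stands out.
        return []
    return [a for a in amounts if abs(a - mean) / stdev > threshold]


transactions = [40, 55, 38, 60, 45, 52, 48, 5000]
suspicious = find_outliers(transactions)
print(suspicious)
```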

50 Which of the following is a step in Data Cleaning?

A. Creating the final PowerPoint
B. Imputing (filling in) missing values
C. Collecting the raw data
D. Selling the data
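
Study note: filling in missing values can be sketched with mean imputation, where each gap is replaced by the average of the observed values. This is one of several possible strategies (median or model-based fills are others); the `impute_with_mean` helper and the sample ages are hypothetical, and `None` stands in for a missing entry.

```python
import statistics


def impute_with_mean(values):
    """Replace each None with the mean of the values that are present."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


ages = [25, None, 35, 30, None]
cleaned = impute_with_mean(ages)
print(cleaned)
```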