Unit 1 - Practice Quiz

CSE121 60 Questions

1 What is the primary goal of data science?

Data science and its need Easy
A. To design computer hardware
B. To manage company finances
C. To extract knowledge and insights from data
D. To create websites and web applications

2 In the context of Big Data, which of the '3Vs' refers to the speed at which data is generated and processed?

Big data and its 3Vs Easy
A. Velocity
B. Veracity
C. Volume
D. Variety

3 Which of the following tools is primarily used for data visualization and creating interactive dashboards?

Tools usage like Apache Hadoop, Tableau, R language, Excel Easy
A. Git
B. Apache Hadoop
C. Tableau
D. Microsoft Word

4 Which job role is primarily responsible for analyzing complex data to help a company make better business decisions?

Job roles and skillset for Data science and Big data Easy
A. Network Administrator
B. Web Developer
C. Data Scientist
D. Graphic Designer

5 What does the 'Volume' in the 3Vs of Big Data represent?

Big data and its 3Vs Easy
A. The large amount of data
B. The accuracy of the data
C. The speed of data generation
D. The different types of data

6 An e-commerce website suggesting products to you based on your previous purchases is a common application of what?

Applications of data science/Big data Easy
A. Network Security
B. Data Science
C. Database Administration
D. Software Testing

7 What is generally considered the first step in the data science lifecycle?

Data science Lifecycle with use case Easy
A. Business Understanding and Problem Definition
B. Model Deployment
C. Data Collection
D. Data Visualization

8 What is the primary purpose of Apache Hadoop?

Tools usage like Apache Hadoop, Tableau, R language, Excel Easy
A. To write and compile C++ code
B. To manage email servers
C. To create visual graphics and art
D. To store and process very large datasets across clusters of computers

9 Which of the following is a major challenge associated with Big Data?

Challenges of Big data Easy
A. Lack of available software
B. Ensuring data security and privacy
C. Computers being too fast
D. Having too little data to analyze

10 The term 'Variety' in Big Data refers to:

Big data and its 3Vs Easy
A. The number of users accessing the data
B. The financial value of the data
C. The many different types and sources of data (e.g., text, image, video)
D. The physical location where data is stored

11 Which of the following is a fundamental programming skill for a data scientist?

Skill needed for Big data Easy
A. Knowledge of a language like Python or R
B. Ability to design logos
C. Experience in hardware repair
D. Expertise in HTML and CSS

12 In the healthcare industry, what is a key use of Big Data?

Use of Big Data in different areas Easy
A. Designing hospital architecture
B. Scheduling appointments manually
C. Predicting disease outbreaks and patient outcomes
D. Manufacturing surgical tools

13 Which of the following is a programming language specifically popular for statistical computing and graphics?

Tools usage like Apache Hadoop, Tableau, R language, Excel Easy
A. R language
B. C#
C. Java
D. HTML

14 What is a primary benefit of using cloud platforms like AWS or Azure for Big Data analytics?

Big Data on the Cloud Easy
A. It is always free of charge
B. It offers scalability and pay-as-you-go pricing
C. It requires managing physical servers in-house
D. It works only with small datasets

15 What is the main responsibility of a Data Engineer?

Job roles and skillset for Data science and Big data Easy
A. Creating marketing campaigns
B. Building and maintaining the data pipelines and infrastructure
C. Providing customer support
D. Designing user interfaces for websites

16 After a data science model has been created and evaluated, what is the typical next step in the lifecycle?

Data science Lifecycle with use case Easy
A. Deployment
B. Starting a new project
C. Business Understanding
D. Deleting all the data

17 For quick and basic data entry, sorting, and creating simple charts, which desktop application is most commonly used?

Tools usage like Apache Hadoop, Tableau, R language, Excel Easy
A. SQL Server
B. Microsoft Excel
C. TensorFlow
D. Apache Spark

18 Why is data quality a significant challenge in Big Data?

Challenges of Big data Easy
A. Inaccurate or incomplete data leads to flawed insights and decisions
B. There is no way to measure data quality
C. High-quality data is too expensive to buy
D. High-quality data takes up too much storage space

19 Data science is described as an interdisciplinary field because it combines principles from:

Data science and its need Easy
A. Literature, Music, and Philosophy
B. History, Geography, and Art
C. Manufacturing, Logistics, and Human Resources
D. Statistics, Computer Science, and Domain Expertise

20 How do banks and financial institutions primarily use Big Data?

Use of Big Data in different areas Easy
A. To design the interior of their branch offices
B. For fraud detection and risk assessment
C. To organize employee social events
D. To choose the color of their logo

21 A retail company wants to use its historical sales data to forecast demand for the next quarter to optimize inventory. This scenario primarily demonstrates the need for data science to enable what kind of analytics?

Data science and its need Medium
A. Descriptive Analytics
B. Prescriptive Analytics
C. Diagnostic Analytics
D. Predictive Analytics
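
As an illustration of predictive analytics (in contrast with descriptive analytics, which only summarizes what already happened), a minimal R sketch follows; the quarterly sales figures and column names are invented, and a real demand forecast would normally use a dedicated time-series model rather than a plain linear trend.

# Minimal predictive-analytics sketch: fit a trend to past quarterly sales
# and forecast the next quarter (data and column names are illustrative).
sales <- data.frame(quarter = 1:8,
                    units_sold = c(120, 135, 150, 160, 172, 181, 195, 210))
fit <- lm(units_sold ~ quarter, data = sales)     # learn the historical trend
predict(fit, newdata = data.frame(quarter = 9))   # forecast demand for quarter 9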

22 A social media platform analyzes text posts, images, and videos uploaded by its users to understand trending topics. The combination of these different data formats best illustrates which 'V' of Big Data?

Big data and its 3Vs Medium
A. Variety
B. Velocity
C. Volume
D. Veracity

23 In a project to build a customer churn prediction model, a data scientist spends significant time cleaning data, handling missing values, and creating new features like 'customer tenure'. Which phase of the data science lifecycle are they currently in?

Data science Lifecycle with use case Medium
A. Model Building
B. Data Preparation (Wrangling)
C. Model Deployment
D. Business Understanding

24 A research team needs to perform complex statistical analysis and create custom visualizations for a scientific paper. They have a moderately sized dataset (a few hundred megabytes). Which tool provides the most flexibility and power for this specific task?

Tools usage like Apache Hadoop, Tableau, R language, Excel Medium
A. Apache Hadoop
B. R language
C. Tableau
D. Microsoft Excel

25 A financial institution aggregates data from multiple sources to assess credit risk. They discover that data from one source uses a different currency format and has many spelling errors, leading to incorrect analysis. This problem is most closely related to which challenge of Big Data?

Challenges of Big data Medium
A. Data Storage
B. Data Quality and Veracity
C. Data Processing Speed
D. Data Security

26 A team is working on a big data project. One member is responsible for designing, building, and maintaining the scalable data pipelines using tools like Spark and Kafka to move data from source systems to a data lake. What is this person's most likely job role?

Job roles and skillset for Data science and Big data Medium
A. Data Engineer
B. Data Scientist
C. Data Analyst
D. Business Intelligence Developer

27 A startup is launching a new application that is expected to generate massive amounts of user data, but the initial volume is small. Why would a cloud-based Big Data solution like AWS EMR or Google Cloud Dataproc be a more strategic choice for them than building an on-premise Hadoop cluster?

Big Data on the Cloud Medium
A. It offers higher processing speeds for small data.
B. It provides better data security by default.
C. It allows for scalability and a pay-as-you-go model, reducing initial capital expenditure.
D. It eliminates the need for data scientists.

28 A city's transportation department uses real-time GPS data from buses and traffic sensors to dynamically adjust traffic light timings and reroute public transport to minimize congestion. This is a practical application of Big Data in which sector?

Use of Big Data in different areas Medium
A. Smart Cities / Urban Planning
B. Finance
C. Retail
D. Healthcare

29 An IoT-based weather monitoring system collects sensor readings (temperature, humidity, pressure) every second from thousands of distributed devices. This continuous, high-speed data generation primarily emphasizes which 'V' of Big Data?

Big data and its 3Vs Medium
A. Value
B. Volume
C. Velocity
D. Variety

30 After building a predictive model, a data scientist presents the findings to stakeholders using visualizations and a summary report, explaining how the model can help achieve a 10% reduction in operational costs. This action is a key part of which lifecycle phase?

Data science Lifecycle with use case Medium
A. Communication / Reporting
B. Model Building
C. Data Preparation
D. Data Acquisition

31 A data professional is tasked with analyzing unstructured text from customer reviews to identify common themes and sentiment. Which combination of skills is most essential for this task?

Skill needed for Big data Medium
A. A/B Testing and Experimental Design
B. SQL and Database Management
C. Natural Language Processing (NLP) and Text Mining
D. ETL Pipeline Development and Data Warehousing

32 A large corporation needs to process several petabytes of historical log data in a distributed and fault-tolerant manner. The primary goal is batch processing to generate aggregated reports. Which tool is specifically designed for this type of large-scale, distributed data processing?

Tools usage like Apache Hadoop, Tableau, R language, Excel Medium
A. Apache Hadoop (with MapReduce/Spark)
B. Tableau
C. Microsoft Excel
D. A single instance of a SQL Database

33 An e-commerce website shows a customer a personalized list of 'Products you may also like' based on their browsing history and previous purchases. This feature is a direct application of what data science technique?

Applications of data science/Big data Medium
A. Time Series Forecasting
B. Clustering Algorithms
C. Classification Algorithms
D. Recommendation Engines / Collaborative Filtering
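
A minimal R sketch of the idea behind option D follows, using a toy 0/1 purchase matrix; the users, products, and hand-rolled cosine similarity are illustrative only, since production recommenders work on large sparse matrices with specialized libraries.

# Item-based collaborative filtering in miniature: items whose purchase
# patterns are most similar to "bookA" are candidate recommendations for
# customers who bought "bookA" (all names and values are made up).
purchases <- matrix(c(1, 1, 0, 0,
                      1, 0, 1, 0,
                      0, 1, 1, 1,
                      1, 1, 0, 1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("user", 1:4),
                                    c("bookA", "bookB", "gadgetC", "gameD")))
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
sapply(colnames(purchases),
       function(p) cosine(purchases[, "bookA"], purchases[, p]))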

34 Why is data science considered an interdisciplinary field rather than a single, isolated subject?

Data science and its need Medium
A. Because it only uses computer science principles.
B. Because it is only useful for businesses and not for scientific research.
C. Because it combines elements from statistics, computer science, and domain expertise.
D. Because it relies solely on creating data visualizations.

35 A company wants to implement a big data analytics platform but is concerned about complying with regulations like GDPR and CCPA, which govern how customer data is collected, stored, and used. This represents which significant challenge of Big Data?

Challenges of Big data Medium
A. Financial (Cost of Infrastructure)
B. Technological (Scalability)
C. Analytical (Finding insights)
D. Governance, Security, and Privacy

36 What is the primary advantage of using a cloud data warehouse like Google BigQuery or Amazon Redshift over a traditional on-premise data warehouse for big data analytics?

Big Data on the Cloud Medium
A. They offer weaker security features than on-premise solutions.
B. They separate storage and compute resources, allowing independent scaling.
C. They are only suitable for very small datasets.
D. They completely eliminate the need for SQL.

37 In precision agriculture, farmers use data from drones, soil sensors, and weather satellites to make decisions about irrigation, fertilization, and pest control for specific small sections of their fields. This practice demonstrates an application of Big Data to:

Use of Big Data in different areas Medium
A. Analyze financial market trends for crop prices.
B. Optimize resource usage and increase crop yield.
C. Manage the logistics of food transportation.
D. Increase marketing effectiveness for farm products.

38 A manager needs a report summarizing last quarter's sales performance, including key metrics and charts. This person is not looking for a predictive model, but a clear explanation of what happened. Who is the most appropriate professional to handle this request?

Job roles and skillset for Data science and Big data Medium
A. Data Analyst
B. Data Engineer
C. Machine Learning Engineer
D. Database Administrator

39 A data scientist needs to explain the logic behind a complex model's prediction to a non-technical audience to gain their trust. Which skill is most crucial in this situation?

Skill needed for Big data Medium
A. Distributed Computing
B. Deep Learning theory
C. Advanced Python programming
D. Data Storytelling and Visualization

40 A credit card company develops a system that analyzes transactions in real-time. If a transaction pattern deviates significantly from a user's normal spending behavior (e.g., a large purchase in a foreign country), it is flagged for review. This is a classic application of data science for:

Applications of data science/Big data Medium
A. Sentiment Analysis
B. Sales Forecasting
C. Fraud Detection
D. Customer Segmentation
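
A minimal R sketch of this kind of rule follows; the historical amounts, the incoming amount, and the z-score threshold are assumptions for illustration, whereas real fraud systems score many features with trained models in real time.

# Flag a transaction whose amount deviates strongly from the cardholder's
# normal spending (toy numbers; a 3-standard-deviation threshold is assumed).
history <- c(23, 41, 18, 35, 29, 52, 31, 44)   # past transaction amounts
new_txn <- 950                                  # incoming transaction amount
z <- (new_txn - mean(history)) / sd(history)
if (abs(z) > 3) message("Flag for review (z = ", round(z, 1), ")")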

41 A deployed churn prediction model for a subscription service suddenly shows a significant drop in performance (e.g., AUC from 0.85 to 0.60). The retraining pipeline, which runs weekly on new data, does not improve the score. Which of the following scenarios is the most likely root cause that would necessitate a full return to the Business Understanding phase of the data science lifecycle?

Data science Lifecycle with use case Hard
A. The company launched a new 'annual subscription' plan, fundamentally changing the definition and drivers of customer churn.
B. The customer base grew rapidly, introducing data drift where new customer behavior differs from the training data.
C. A data pipeline feeding customer support ticket information into the model features broke, leading to null values.
D. The model is overfitting to the original training data and does not generalize well to the new weekly data.

42 A high-frequency trading (HFT) firm is developing an arbitrage detection system. The system must process millions of market data ticks per second from multiple exchanges and execute trades within microseconds. While all 3Vs are present, which 'V' poses the most significant algorithmic and architectural challenge for this specific use case?

Big data and its 3Vs Hard
A. Veracity, because occasional bad ticks or data feed errors can trigger disastrously wrong trades, making data quality the paramount concern.
B. Volume, because storing petabytes of historical tick data for back-testing models is the most resource-intensive part of the HFT lifecycle.
C. Variety, because the data comes from different exchanges with slightly different formats, requiring complex data integration logic.
D. Velocity, because the core challenge is making complex decisions on streaming data under extreme low-latency constraints, which dictates the choice of in-memory processing and stream-based algorithms.

43 A healthcare provider anonymizes two separate datasets: one with patient diagnoses and zip codes, and another with mobile phone location data (geohashes) and zip codes. They plan to merge these datasets on the 'zip code' field for a public health study. What is the most profound big data challenge this action creates?

Challenges of Big data Hard
A. Re-identification risk and data privacy, as combining two anonymized datasets can create a rich, composite profile that makes it possible to de-anonymize individuals, violating privacy principles like GDPR.
B. Data integration, because matching zip codes between two large datasets from different systems can be computationally expensive and prone to formatting errors.
C. Data veracity, because location data from mobile phones can be inaccurate, leading to incorrect linkages with diagnostic data.
D. Data storage, as the merged dataset could become too large for traditional relational database systems.

44 A data science team needs to perform sentiment analysis on 10 terabytes of unstructured customer reviews stored as text files. The goal is to build a classification model and then create an interactive dashboard for the marketing team to explore sentiment trends by product and region. Which toolchain is most appropriately designed for this entire end-to-end task?

Tools usage like Apache Hadoop, Tableau, R language, Excel Hard
A. Tableau alone to connect directly to the text files, using its built-in calculation fields to perform sentiment analysis and create the dashboard.
B. Microsoft Excel with Power Query to import the text files, a VBA script for sentiment analysis, and PivotCharts for the dashboard.
C. Apache Hadoop (HDFS/MapReduce or Spark) for distributed processing of the text files, R/Python with NLP libraries for model building on a sampled or aggregated dataset, and Tableau for connecting to the aggregated results for visualization.
D. R language alone on a powerful server to read all 10TB into memory and perform the analysis and visualization using packages like shiny.

45 A company's data science team has successfully developed a highly accurate fraud detection model in a Jupyter Notebook. The business now requires this model to be integrated into their live transaction processing system, which handles thousands of requests per second with a latency requirement of <50ms. The process of taking the model from the notebook to a scalable, low-latency, production-ready API is primarily the responsibility of which role?

Job roles and skillset for Data science and Big data Hard
A. Big Data Architect, who designs the overall data storage and processing infrastructure.
B. Data Scientist, who created the model and is responsible for its accuracy and performance.
C. Machine Learning Engineer, who specializes in model deployment, automation, scalability, and MLOps practices.
D. Data Analyst, who is responsible for interpreting the model's output and creating performance reports.

46 A financial analytics firm needs to process a 50TB dataset of historical stock data. Their primary workload consists of complex, ad-hoc analytical queries from a team of 10 analysts. The queries are unpredictable and computationally intensive. They want a cloud solution that minimizes infrastructure management and operates on a pay-per-query pricing model. Which cloud big data solution best fits this requirement?

Big Data on the Cloud Hard
A. Google BigQuery, because it's a serverless data warehouse that abstracts away infrastructure and charges based on the amount of data scanned by each query.
B. AWS Redshift, because it is a petabyte-scale, managed columnar data warehouse optimized for high-performance BI.
C. AWS EMR (Elastic MapReduce), which is a managed Hadoop service, to spin up clusters for specific jobs and then shut them down.
D. A self-managed Hadoop cluster on AWS EC2 instances, because it offers maximum control and customization over the processing environment.

47 A data scientist builds a loan default prediction model with 99% accuracy on a historically biased dataset. When deployed, the model systematically denies loans to qualified applicants from minority groups. The modeler did not use protected attributes like race directly, but the model learned proxies (e.g., zip codes). This scenario reveals a critical deficiency in which specific data science skill?

Skill needed for Big data Hard
A. Ethical judgment and bias detection, which involves proactively auditing data and models for fairness and unintended social impact.
B. Feature engineering, as the data scientist failed to create features that were uncorrelated with protected attributes.
C. Algorithm selection, as a different algorithm like a simple logistic regression might have been less biased.
D. Model evaluation, because accuracy was the wrong metric to use for an imbalanced dataset.

48 In precision agriculture, big data from IoT sensors, drones, and satellites is used to optimize crop yield. A key application is variable rate irrigation, where different parts of a field receive different amounts of water. What combination of Big Data characteristics makes this a particularly complex problem?

Use of Big Data in different areas Hard
A. High Variety and Veracity: Integrating diverse data (soil moisture, drone imagery, weather forecasts) and dealing with sensor noise/failure is the core challenge.
B. High Volume only: The sheer amount of satellite imagery is the only significant big data challenge to overcome.
C. Low Volume and high Veracity: The data is small and clean, making it a simple analytical problem.
D. High Velocity only: The speed of data from real-time soil sensors is the most critical factor.

49 A retail company's executive team asks their new data scientist to "Use AI to increase our profits." Why is this initial request a poor starting point for a data science project, and what does it demonstrate a need for?

Data science and its need Hard
A. The request assumes AI is the solution. It demonstrates the need for the data scientist to have advanced machine learning skills to build a complex profit-optimization algorithm.
B. The request is too vague and lacks a specific, measurable business problem. It demonstrates the need for the data scientist to apply problem formulation and business acumen skills to translate a general goal into a concrete, solvable data science problem (e.g., 'reduce customer churn by 5%').
C. The request focuses on profit instead of customer satisfaction. It demonstrates the need for the company to have a stronger ethical framework.
D. The request is not technically feasible. It demonstrates the need for better data infrastructure before any AI projects can be started.

50 A genomics research institute processes full human genomes. Each genome is ~100GB (high Volume). They are integrating this with unstructured clinical notes and patient-reported outcomes from a mobile app. A fourth 'V', Veracity, is often added to the 3Vs. In this specific context, what is the most critical implication of low Veracity?

Big data and its 3Vs Hard
A. Processing delays: The sheer speed of data from sequencers (Velocity) is the primary bottleneck, not the data's accuracy.
B. Integration challenges: The diverse data formats (Variety) are much harder to handle than potential inaccuracies within the data.
C. Storage costs: The volume of genomic data is the only significant financial and technical hurdle.
D. False discoveries: Inaccurate gene sequencing or misinterpretation of clinical notes could lead to incorrect correlations between genes and diseases, invalidating research findings and potentially harming patients.

51 An analyst needs to investigate a potential data quality issue in a 2-billion-row dataset stored in a Hadoop cluster. They need to perform a series of complex aggregations and checks (e.g., find the count of nulls per column, calculate distributions, check for outliers). Writing a full Spark job in Python/Scala is too slow for this interactive, exploratory task. Which tool or approach would be most efficient for this specific scenario?

Tools usage like Apache Hadoop, Tableau, R language, Excel Hard
A. Writing a custom MapReduce job in Java to calculate the required statistics.
B. Using Tableau to connect directly to the Hadoop cluster and build a dashboard to find the anomalies.
C. Using an interactive SQL query engine like Apache Hive LLAP or Presto/Trino that sits on top of the Hadoop data lake.
D. Exporting a 1% sample of the data into a CSV file and analyzing it with Microsoft Excel.
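
To make option C concrete, the sketch below issues the exploratory checks as SQL from R over a DBI connection; the RPresto driver, connection details, table name events, and column names are all assumptions for illustration, and Hive LLAP exposes slightly different function names than the Presto/Trino ones used here.

# Interactive data-quality checks pushed down to the SQL-on-Hadoop engine,
# so no full Spark job needs to be written for this exploration.
library(DBI)
con <- dbConnect(RPresto::Presto(), host = "http://presto.example.com",
                 port = 8080, user = "analyst",
                 catalog = "hive", schema = "default")
dbGetQuery(con, "
  SELECT count(*)                          AS total_rows,
         count_if(customer_id IS NULL)     AS null_customer_id,
         count_if(amount IS NULL)          AS null_amount,
         approx_percentile(amount, 0.5)    AS median_amount
  FROM events")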

52 During the 'Model Deployment' phase of a data science project, the team discovers that the model's predictions, while accurate in offline tests, have a high variance in latency (from 50ms to 2000ms). This violates the service-level agreement (SLA) for the production application. This issue forces the team to revisit which earlier lifecycle phase most intensively?

Data science Lifecycle with use case Hard
A. Model Evaluation, to choose a different accuracy metric that accounts for latency.
B. Data Collection, to acquire data that is faster to process.
C. Feature Engineering, as the latency variance is likely caused by complex features that are computationally expensive to generate in real-time.
D. Business Understanding, to renegotiate the SLA with the stakeholders.

53 A city is implementing a predictive policing system, which uses historical crime data to predict locations where crime is likely to occur. From an ethical and societal perspective, what is the most significant risk of deploying such a system, even if it is statistically accurate on historical data?

Applications of data science/Big data Hard
A. Computational Cost: Processing years of crime data and real-time inputs would require a significant investment in big data infrastructure.
B. Feedback Loop Amplification: The model may create a self-fulfilling prophecy where police are sent to predicted hotspots, leading to more arrests in those areas, which then generates more data to confirm the original prediction, amplifying existing biases.
C. Lack of Model Interpretability: Using a complex model like a deep neural network would make it impossible to explain to the public why a certain area was targeted.
D. Data Security Risks: The historical crime data could be hacked, revealing sensitive information about past incidents and victims.

54 A global e-commerce company wants to create a unified customer view by combining data from its regional databases in the European Union, the United States, and China. Beyond the technical data integration challenges, what is the most significant 'soft' challenge they will face?

Challenges of Big data Hard
A. Data Silos: The different regional IT teams may be unwilling to share their data and control with a central authority.
B. Network Latency: Moving large amounts of data between continents will be slow and expensive.
C. Language and Character Encoding: The data will be in different languages and character sets (e.g., UTF-8, GB2312), requiring complex text processing.
D. Data Sovereignty and Regulatory Compliance: Each region has different data privacy laws (e.g., GDPR in the EU, PIPL in China) that restrict how data can be transferred, stored, and processed across borders, making a unified view legally complex.

55 A data scientist in R is working with a data.frame named sales_df with 10 million rows. They need to calculate the mean price for each category. They run the following two code snippets. Why is the data.table approach significantly faster than the tapply approach?

Code 1: tapply(sales_df$price, sales_df$category, mean)

Code 2: library(data.table); setDT(sales_df); sales_df[, mean(price), by = category]

R language Hard
A. The tapply function is not designed for numeric data and performs slow type conversions internally, whereas data.table is specifically for numbers.
B. The setDT() function creates a physical copy of the data in a more efficient columnar format, which allows for faster access.
C. The data.table approach pre-compiles the aggregation logic into bytecode, while tapply is an interpreted function call, which is always slower.
D. The data.table package is written in C and is highly optimized for performance. It groups the data by reference using a radix sort on the grouping columns, avoiding the data copying and looping overhead inherent in base R functions like tapply.
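
A runnable version of both snippets is sketched below with synthetic data, so the difference described in option D can be reproduced; the data size and category labels are assumptions, and absolute timings will vary by machine.

# Compare base-R tapply with data.table grouping on 10 million rows.
library(data.table)
n <- 1e7
sales_df <- data.frame(category = sample(letters[1:20], n, replace = TRUE),
                       price    = runif(n, 1, 100))
system.time(tapply(sales_df$price, sales_df$category, mean))   # Code 1 (base R)
setDT(sales_df)                              # converts to data.table by reference
system.time(sales_df[, mean(price), by = category])            # Code 2 (data.table)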

56 A company is deciding between a cloud data lake (e.g., storing files on AWS S3 and using Athena/Spark for queries) and a cloud data warehouse (e.g., Snowflake, Redshift, BigQuery). The company's data is a mix of structured transactional data and highly unstructured data like images and audio files. They prioritize schema flexibility and low-cost storage for raw data. Which architecture should they choose and why?

Big Data on the Cloud Hard
A. Data Warehouse, because it provides a structured schema ('schema-on-write') which enforces data quality and delivers higher query performance for structured data.
B. A hybrid approach, using the data warehouse for structured data and the data lake for unstructured data, but this is architecturally impossible on major cloud platforms.
C. Data Lake, because it stores data in its native format ('schema-on-read') and decouples storage from compute, offering maximum flexibility for diverse data types and lower storage costs.
D. Neither, they should use a traditional on-premise relational database like Oracle, which can handle both structured and unstructured data using LOB types.

57 A team consists of a Data Analyst, a Data Scientist, and a Data Engineer. They are tasked with a project to analyze customer behavior. Which of the following correctly delineates the primary focus of each role in the initial phases of this project?

Job roles and skillset for Data science and Big data Hard
A. Data Scientist: Designs and builds the data extraction pipelines. Data Engineer: Performs advanced statistical modeling. Data Analyst: Communicates the final model results to stakeholders.
B. Data Analyst: Is responsible for all the data cleaning. Data Engineer: Is responsible for all the feature engineering. Data Scientist: Is responsible only for choosing the final algorithm.
C. Data Engineer: Creates the final business-facing dashboards. Data Analyst: Deploys the machine learning models. Data Scientist: Is responsible for the cloud infrastructure budget.
D. Data Engineer: Builds robust, automated pipelines to extract and transport data. Data Scientist: Explores the raw data to formulate hypotheses and plan models. Data Analyst: Queries the processed data to create initial descriptive reports and dashboards.

58 When considering Apache Hadoop's core components, what is the fundamental architectural reason that YARN (Yet Another Resource Negotiator) was introduced to replace the resource management logic of the original MapReduce (MRv1)?

Tools usage like Apache Hadoop, Tableau, R language, Excel Hard
A. To enable Hadoop to run on cloud platforms like AWS and Azure, as MRv1 was designed only for on-premise hardware.
B. To decouple resource management from the data processing framework, allowing different frameworks (like Spark, Flink, etc.), not just MapReduce, to run on the same Hadoop cluster.
C. To provide a better graphical user interface for monitoring Hadoop jobs, which was lacking in MRv1.
D. To improve the speed of the 'shuffle and sort' phase within MapReduce jobs by using a more efficient negotiation algorithm.

59 A data scientist presents a complex deep learning model to business stakeholders. The model is highly accurate, but the presentation is filled with technical jargon like 'ReLU activation functions,' 'dropout rates,' and 'backpropagation.' The stakeholders are confused and lose confidence in the project. This highlights a critical failure in which non-technical skill?

Skill needed for Big data Hard
A. Domain Knowledge: The data scientist clearly did not understand the business domain well enough to build a useful model.
B. Scientific Method: The data scientist failed to form a proper hypothesis before building the model.
C. Storytelling and Communication: The ability to translate complex technical concepts and model results into a clear, concise narrative that connects to business impact and is understandable to a non-technical audience.
D. Data Visualization: The presentation probably lacked sufficient charts and graphs to explain the model's performance.

60 In the telecommunications industry, Big Data is used to analyze Call Detail Records (CDRs) for network optimization and churn prediction. A single CDR contains metadata like call duration, start/end time, and tower location, but not the call content. Why is analyzing the graph of connections (who calls whom) often more powerful for churn prediction than analyzing an individual's call statistics in isolation?

Use of Big Data in different areas Hard
A. Because a customer's churn is heavily influenced by the churn of their social circle. If a person's most frequently called contacts start leaving the network (a property of the graph structure), that person is also highly likely to churn.
B. Because visualizing the call graph is the only way to understand network traffic patterns.
C. Because analyzing call duration and frequency for a single user (isolated statistics) provides no predictive power for churn.
D. Because graph databases like Neo4j are faster at processing CDR data than traditional relational databases.