Unit 2 - Practice Quiz

INT323

1 Which of the following best defines Data Integration?

A. The process of deleting duplicate data from a database
B. The process of combining data from different sources into a unified view
C. The process of backing up data to the cloud
D. The process of encrypting data for security

2 Which of the following is a key driver for Data Integration in an enterprise?

A. To increase the size of the hard drives used
B. To isolate departmental data silos
C. To facilitate business intelligence and decision-making
D. To decrease the speed of the network

3 In the context of data integration, what does 'Heterogeneity' refer to?

A. Data that is exactly the same across all systems
B. Differences in hardware, operating systems, and data models across sources
C. The speed at which data is transferred
D. The security protocols used for data

4 What does ETL stand for?

A. Extract, Transform, Load
B. Enter, Transfer, Load
C. Execute, Transmit, Log
D. Extract, Test, Lock

5 In the ELT approach, where does the transformation of data primarily occur?

A. In the source system
B. In a separate staging server before loading
C. In the target data warehouse
D. In the middleware application

6 Which integration approach provides a real-time, unified view of data without physically moving the data to a central repository?

A. Data Warehousing
B. Data Virtualization (Federation)
C. Manual Data Entry
D. Tape Backup

7 What is 'Semantic Heterogeneity' in data integration?

A. Using different operating systems
B. Using different network cables
C. Differences in the meaning or interpretation of data (e.g., synonyms, homonyms)
D. Differences in file compression formats

8 Which of the following is an advantage of a Data Warehouse?

A. It is optimized for transaction processing
B. It provides a volatile, changing view of data
C. It supports historical analysis and reporting
D. It slows down analytical queries

9 What creates a 'Single Version of the Truth' (SVOT)?

A. Keeping data in silos
B. Effective Data Integration
C. Using Excel spreadsheets
D. Avoiding data cleaning

10 Which technology is commonly used for Application-based Integration?

A. Enterprise Application Integration (EAI)
B. CD-ROMs
C. Printers
D. Standalone Firewalls

11 What is a Data Warehouse?

A. A system for recording daily business transactions
B. A subject-oriented, integrated, time-variant, and non-volatile collection of data
C. A temporary cache for web browsers
D. A physical storage room for hard drives

12 In the context of a Data Warehouse, what does 'Non-volatile' mean?

A. Data disappears when power is lost
B. Data is updated in real-time by users
C. Data is not changed once loaded; it is read-only for analysis
D. Data is highly explosive

13 What is the primary difference between OLTP and OLAP?

A. OLTP is for analysis; OLAP is for transactions
B. OLTP is for transactions; OLAP is for analysis
C. OLTP is slower than OLAP
D. There is no difference

14 What is a Data Mart?

A. A market where data is bought and sold
B. A subset of a data warehouse oriented to a specific business line or team
C. A type of database virus
D. The hardware component of a database

15 Which type of Data Mart draws data directly from operational sources without a central Data Warehouse?

A. Dependent Data Mart
B. Independent Data Mart
C. Hybrid Data Mart
D. Integrated Data Mart

16 What is a 'Dependent Data Mart'?

A. A mart sourced from the central Data Warehouse
B. A mart that relies on manual data entry
C. A mart that cannot function without internet
D. A mart sourced directly from flat files

17 What is the role of a Staging Area in data integration?

A. To permanently store data
B. To visualize data for the end-user
C. To hold data temporarily for processing and cleaning before loading into the warehouse
D. To run the operating system

18 Which dimension of Data Quality refers to the data correctly representing the real-world object or event?

A. Timeliness
B. Accuracy
C. Volume
D. Security

19 Data Completeness refers to:

A. Whether all required data is present
B. Whether the data is encrypted
C. Whether the data is stored in the cloud
D. Whether the data format is JSON

20 Which process involves examining data to understand its structure, content, and quality?

A. Data Profiling
B. Data Encryption
C. Data Compression
D. Data Transmission

21 What is Data Cleansing (Scrubbing)?

A. Wiping the hard drive clean
B. The process of detecting and correcting corrupt or inaccurate records
C. Removing old hardware
D. Organizing cables in the server room

22 What is 'Metadata'?

A. Data about data
B. Big data
C. Mobile data
D. Encrypted data

23 Which approach to Data Warehousing is known as the 'Top-Down' approach?

A. Kimball Approach
B. Inmon Approach
C. Agile Approach
D. Waterfall Approach

24 Which approach to Data Warehousing is known as the 'Bottom-Up' approach?

A. Kimball Approach
B. Inmon Approach
C. Spiral Approach
D. Cloud Approach

25 What is Change Data Capture (CDC)?

A. Taking a photo of the database
B. Identifying and capturing only the data that has changed since the last extraction
C. Changing the database password
D. Capturing data from the internet

26 Which of the following is NOT a benefit of high Data Quality?

A. Improved customer relations
B. Increased operational costs
C. Better decision making
D. Regulatory compliance

27 What is 'Deduplication'?

A. Creating a backup copy
B. Removing duplicate copies of repeating data
C. Doubling the storage capacity
D. Splitting data into two tables

28 The 'Time-variant' characteristic of a Data Warehouse implies that:

A. The warehouse only operates during business hours
B. Data is stored with a time element (historical perspective)
C. Queries must be run within a specific time limit
D. The system clock is synchronized

29 Which technology allows for data integration via standard XML-based messages over the web?

A. Web Services (SOAP/REST)
B. FTP
C. Floppy Disks
D. Direct Memory Access

30 What is a major disadvantage of 'Manual Data Integration'?

A. High cost of software tools
B. Prone to human error and difficult to scale
C. Requires complex installation
D. Too fast for humans to track

31 In Data Integration, what is a 'Connector' or 'Adapter'?

A. A physical cable
B. A software component that allows the integration tool to communicate with specific data sources
C. A user who connects systems
D. A power supply unit

32 Which of the following describes 'Data Consistency'?

A. Data is stored on a consistent hardware platform
B. Data values are the same across all systems and copies
C. Data is consistently backed up
D. Data is accessed consistently every day

33 What is 'Latency' in the context of data integration?

A. The size of the data
B. The time delay between data generation and its availability for use
C. The cost of the integration tool
D. The number of users

34 What is 'Subject-Oriented' in the context of Data Warehousing?

A. Organized around applications (e.g., Payroll app)
B. Organized around major entities like Customer, Product, Sales
C. Organized around file types
D. Organized around storage media

35 Which schema is most commonly associated with Data Marts and Data Warehouses?

A. Star Schema
B. XML Schema
C. Network Schema
D. Hierarchical Schema

36 What is a 'Fact Table' in a Data Warehouse?

A. A table containing descriptive attributes (text)
B. A table containing quantitative measurements (numbers/metrics)
C. A table of users
D. A table of system logs

37 What is a 'Dimension Table'?

A. A table containing measurements
B. A table containing descriptive attributes (context for facts)
C. A table for temporary calculations
D. A table for metadata only

38 Batch processing in data integration means:

A. Processing data one record at a time instantly
B. Processing data in large groups at scheduled intervals
C. Processing data manually
D. Processing data via email

39 Which of the following is a symptom of poor Data Quality?

A. Reports are trusted by executives
B. Marketing mail sent to the wrong addresses
C. Fast query performance
D. Seamless system integration

40 What is 'Granularity' in a Data Warehouse?

A. The level of detail of the data
B. The texture of the hard drive
C. The cost of the storage
D. The security level

41 EII stands for:

A. Enterprise Information Integration
B. Electronic Internet Interface
C. Early Information Input
D. Enterprise Internal Internet

42 Which of the following is considered 'Unstructured Data'?

A. Rows in a SQL database
B. An Excel spreadsheet with headers
C. Emails, videos, and social media posts
D. A CSV file

43 Informatica PowerCenter is primarily used for:

A. Graphic Design
B. Data Integration / ETL
C. Operating System Management
D. Word Processing

44 What is the relationship between Data Governance and Data Quality?

A. They are unrelated
B. Data Governance provides the policies and roles to ensure Data Quality
C. Data Quality eliminates the need for Governance
D. Governance reduces Data Quality

45 Why is a Data Mart often faster to query than a Data Warehouse?

A. It uses better hardware
B. It holds less data and is optimized for specific queries
C. It does not use indexes
D. It is connected directly to the CPU

46 The 'Integrated' characteristic of a Data Warehouse means:

A. It is built on a single chip
B. Data from various sources is converted to a standard format/naming convention
C. It includes email integration
D. It is integrated with the printer network

47 Which comes first in the standard ETL process?

A. Load
B. Transform
C. Extract
D. Analyze

48 What is the purpose of a 'Surrogate Key' in a Data Warehouse?

A. To encrypt the data
B. To replace the natural primary key with a unique internal system identifier
C. To unlock the server room
D. To link to the internet

49 What is 'Data Transformation'?

A. Moving data from A to B
B. Converting data from source format to destination format (e.g., calculation, filtering)
C. Deleting data
D. Archiving data

50 Which of the following is a 'Target System' in an ETL flow?

A. The operational database where transactions happen
B. The Data Warehouse where data is loaded
C. The flat file containing raw logs
D. The legacy mainframe system