1
What is the primary definition of Data Integration?
A. The process of encrypting data for security
B. The process of backing up data to the cloud
C. The process of combining data from different sources into a unified view
D. The process of deleting duplicate data from a database
Correct Answer: The process of combining data from different sources into a unified view
Explanation:
Data integration involves combining data residing in different sources and providing users with a unified view of them.
2
Which of the following is a key driver/need for Data Integration in an enterprise?
A. To facilitate business intelligence and decision-making
B. To decrease the speed of the network
C. To isolate departmental data silos
D. To increase the size of the hard drives used
Correct Answer: To facilitate business intelligence and decision-making
Explanation:
Data integration allows organizations to analyze combined data, leading to better business intelligence and informed decision-making.
3
In the context of data integration, what does 'Heterogeneity' refer to?
A. The speed at which data is transferred
B. Data that is exactly the same across all systems
C. The security protocols used for data
D. Differences in hardware, operating systems, and data models across sources
Correct Answer: Differences in hardware, operating systems, and data models across sources
Explanation:
Heterogeneity refers to the diversity in data representation, storage, and semantics across different source systems.
4
What does ETL stand for?
A. Enter, Transfer, Load
B. Extract, Test, Lock
C. Extract, Transform, Load
D. Execute, Transmit, Log
Correct Answer: Extract, Transform, Load
Explanation:
ETL stands for Extract, Transform, Load, a traditional approach to data integration.
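The three ETL stages can be sketched in a few lines of Python. This is purely an illustration with in-memory lists standing in for the source system and the warehouse; all function and field names here are made up, not from any specific tool:

```python
# Source system: raw sales records with inconsistent formatting.
source_rows = [
    {"customer": " alice ", "amount": "120.50"},
    {"customer": "BOB", "amount": "75.00"},
]

def extract(source):
    """Pull rows from the source system (here, just an in-memory list)."""
    return list(source)

def transform(rows):
    """Apply business rules: normalize names, convert amounts to numbers."""
    return [
        {"customer": r["customer"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, warehouse):
    """Append the cleaned rows to the target warehouse table."""
    warehouse.extend(rows)

warehouse = []
load(transform(extract(source_rows)), warehouse)
# warehouse: [{'customer': 'Alice', 'amount': 120.5}, {'customer': 'Bob', 'amount': 75.0}]
```

Note the ordering: transformation happens between extract and load, which is exactly what distinguishes ETL from the ELT pattern in the next question.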
5
In the ELT approach, where does the transformation of data primarily occur?
A. In a separate staging server before loading
B. In the middleware application
C. In the source system
D. In the target data warehouse
Correct Answer: In the target data warehouse
Explanation:
In ELT (Extract, Load, Transform), data is loaded into the target system first, and the transformation happens there using the target's processing power.
6
Which integration approach provides a real-time, unified view of data without physically moving the data to a central repository?
A. Data Virtualization (Federation)
B. Data Warehousing
C. Tape Backup
D. Manual Data Entry
Correct Answer: Data Virtualization (Federation)
Explanation:
Data virtualization or federation leaves data in source systems and retrieves it on the fly to create a virtual unified view.
7
What is 'Semantic Heterogeneity' in data integration?
A. Using different operating systems
B. Differences in file compression formats
C. Differences in the meaning or interpretation of data (e.g., synonyms, homonyms)
D. Using different network cables
Correct Answer: Differences in the meaning or interpretation of data (e.g., synonyms, homonyms)
Explanation:
Semantic heterogeneity occurs when there is a disagreement about the meaning, interpretation, or intended use of the same or related data.
8
Which of the following is an advantage of a Data Warehouse?
A. It slows down analytical queries
B. It is optimized for transaction processing
C. It supports historical analysis and reporting
D. It provides a volatile, changing view of data
Correct Answer: It supports historical analysis and reporting
Explanation:
A Data Warehouse is designed to store historical data to support analysis, reporting, and decision-making.
9
What creates a 'Single Version of the Truth' (SVOT)?
A. Using Excel spreadsheets
B. Avoiding data cleaning
C. Keeping data in silos
D. Effective Data Integration
Correct Answer: Effective Data Integration
Explanation:
Effective data integration resolves inconsistencies across sources to provide a single, accurate reference point for data, known as the Single Version of the Truth.
10
Which technology is commonly used for Application-based Integration?
A. CD-ROMs
B. Standalone Firewalls
C. Printers
D. Enterprise Application Integration (EAI)
Correct Answer: Enterprise Application Integration (EAI)
Explanation:
EAI uses middleware to enable different applications to share data and business processes.
11
What is a Data Warehouse?
A. A subject-oriented, integrated, time-variant, and non-volatile collection of data
B. A physical storage room for hard drives
C. A temporary cache for web browsers
D. A system for recording daily business transactions
Correct Answer: A subject-oriented, integrated, time-variant, and non-volatile collection of data
Explanation:
This is Bill Inmon's standard definition of a Data Warehouse: a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.
12
In the context of a Data Warehouse, what does 'Non-volatile' mean?
A. Data is highly explosive
B. Data disappears when power is lost
C. Data is not changed once loaded; it is read-only for analysis
D. Data is updated in real-time by users
Correct Answer: Data is not changed once loaded; it is read-only for analysis
Explanation:
Non-volatile means that once data is entered into the warehouse, it is not updated or deleted row-by-row; it is kept for historical comparison.
13
What is the primary difference between OLTP and OLAP?
A. OLTP is slower than OLAP
B. OLTP is for analysis; OLAP is for transactions
C. OLTP is for transactions; OLAP is for analysis
D. There is no difference
Correct Answer: OLTP is for transactions; OLAP is for analysis
Explanation:
OLTP (Online Transaction Processing) manages day-to-day operations, while OLAP (Online Analytical Processing) is used for analysis and reporting.
14
What is a Data Mart?
A. The hardware component of a database
B. A subset of a data warehouse oriented to a specific business line or team
C. A market where data is bought and sold
D. A type of database virus
Correct Answer: A subset of a data warehouse oriented to a specific business line or team
Explanation:
A Data Mart is a focused version of a data warehouse designed for a specific department (e.g., Sales, HR).
15
Which type of Data Mart draws data directly from operational sources without a central Data Warehouse?
A. Independent Data Mart
B. Integrated Data Mart
C. Dependent Data Mart
D. Hybrid Data Mart
Correct Answer: Independent Data Mart
Explanation:
An Independent Data Mart is built directly from source systems without utilizing a central enterprise data warehouse.
16
What is a 'Dependent Data Mart'?
A. A mart sourced directly from flat files
B. A mart that cannot function without internet
C. A mart that relies on manual data entry
D. A mart sourced from the central Data Warehouse
Correct Answer: A mart sourced from the central Data Warehouse
Explanation:
Dependent Data Marts receive their data from the Enterprise Data Warehouse, ensuring consistency across the organization.
17
What is the role of a Staging Area in data integration?
A. To hold data temporarily for processing and cleaning before loading into the warehouse
B. To visualize data for the end-user
C. To run the operating system
D. To permanently store data
Correct Answer: To hold data temporarily for processing and cleaning before loading into the warehouse
Explanation:
The staging area is an intermediate storage area used for data processing (cleaning, transformation) during the ETL process.
18
Which dimension of Data Quality refers to the data correctly representing the real-world object or event?
A. Volume
B. Timeliness
C. Security
D. Accuracy
Correct Answer: Accuracy
Explanation:
Accuracy measures the degree to which data correctly describes the 'real world' object or event being described.
19
Data Completeness refers to:
A. Whether the data is stored in the cloud
B. Whether the data is encrypted
C. Whether the data format is JSON
D. Whether all required data is present
Correct Answer: Whether all required data is present
Explanation:
Completeness ensures that there are no missing values or records that are required for the business process.
20
Which process involves examining data to understand its structure, content, and quality?
A. Data Transmission
B. Data Encryption
C. Data Profiling
D. Data Compression
Correct Answer: Data Profiling
Explanation:
Data profiling is the process of reviewing source data to understand structure, content, and quality anomalies before integration.
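A minimal profiling pass can be sketched in Python: count rows, missing values, and distinct values per column. The toy table and column names below are illustrative only:

```python
# Toy source table with quality problems (a missing email, inconsistent casing).
rows = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": None,            "country": "us"},
    {"id": 3, "email": "c@example.com", "country": "US"},
]

def profile(rows, column):
    """Summarize one column: row count, missing values, distinct values."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }

email_profile = profile(rows, "email")      # 1 missing value detected
country_profile = profile(rows, "country")  # 2 distinct values ('US' vs 'us')
```

Even this tiny profile surfaces two integration issues before loading: a completeness gap in `email` and a consistency problem in `country` ('US' vs 'us').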
21
What is Data Cleansing (Scrubbing)?
A. Removing old hardware
B. The process of detecting and correcting corrupt or inaccurate records
C. Wiping the hard drive clean
D. Organizing cables in the server room
Correct Answer: The process of detecting and correcting corrupt or inaccurate records
Explanation:
Data cleansing involves identifying incorrect, incomplete, or irrelevant parts of the data and replacing, modifying, or deleting the dirty data.
22
What is 'Metadata'?
A. Big data
B. Encrypted data
C. Data about data
D. Mobile data
Correct Answer: Data about data
Explanation:
Metadata provides information about other data, such as its source, format, owner, and lineage.
23
Which approach to Data Warehousing is known as the 'Top-Down' approach?
A. Inmon Approach
B. Agile Approach
C. Waterfall Approach
D. Kimball Approach
Correct Answer: Inmon Approach
Explanation:
Bill Inmon advocates a Top-Down approach: build the Enterprise Data Warehouse first, then build Data Marts from it.
24
Which approach to Data Warehousing is known as the 'Bottom-Up' approach?
A. Inmon Approach
B. Cloud Approach
C. Kimball Approach
D. Spiral Approach
Correct Answer: Kimball Approach
Explanation:
Ralph Kimball advocates a Bottom-Up approach: start by building dimensional Data Marts that can be conformed together.
25
What is Change Data Capture (CDC)?
A. Changing the database password
B. Identifying and capturing only the data that has changed since the last extraction
C. Capturing data from the internet
D. Taking a photo of the database
Correct Answer: Identifying and capturing only the data that has changed since the last extraction
Explanation:
CDC optimizes integration by only moving data that has been inserted, updated, or deleted, rather than moving the whole database every time.
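The simplest way to illustrate CDC is a snapshot diff. Real CDC tools usually read database transaction logs or change timestamps instead, so treat this Python sketch as a conceptual model only:

```python
# Two snapshots of a source table, keyed by primary key.
previous = {1: "Alice", 2: "Bob", 3: "Carol"}
current  = {1: "Alice", 2: "Robert", 4: "Dave"}

def capture_changes(previous, current):
    """Return only the inserts, updates, and deletes since the last extract."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    deletes = [k for k in previous if k not in current]
    return inserts, updates, deletes

inserts, updates, deletes = capture_changes(previous, current)
# inserts == {4: 'Dave'}, updates == {2: 'Robert'}, deletes == [3]
```

Only three change records move downstream instead of the full table, which is the optimization the explanation above describes.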
26
Which of the following is NOT a benefit of high Data Quality?
A. Improved customer relations
B. Increased operational costs
C. Regulatory compliance
D. Better decision making
Correct Answer: Increased operational costs
Explanation:
High data quality typically reduces operational costs (less rework, fewer errors). Increased costs are a result of poor data quality.
27
What is 'Deduplication'?
A. Creating a backup copy
B. Removing duplicate copies of repeating data
C. Doubling the storage capacity
D. Splitting data into two tables
Correct Answer: Removing duplicate copies of repeating data
Explanation:
Deduplication is a data quality technique to eliminate redundant copies of data.
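A basic deduplication pass in Python, keeping the first occurrence of each record after normalizing case and whitespace (a deliberately simple matching rule; real tools use fuzzier matching):

```python
# Customer records with duplicates that differ only in case and whitespace.
records = ["Alice Smith", "alice smith ", "Bob Jones", "BOB JONES"]

def deduplicate(records):
    """Keep the first occurrence of each record, matched on a normalized key."""
    seen, unique = set(), []
    for r in records:
        key = r.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

clean = deduplicate(records)  # ['Alice Smith', 'Bob Jones']
```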
28
The 'Time-variant' characteristic of a Data Warehouse implies that:
A. Queries must be run within a specific time limit
B. The warehouse only operates during business hours
C. Data is stored with a time element (historical perspective)
D. The system clock is synchronized
Correct Answer: Data is stored with a time element (historical perspective)
Explanation:
Time-variant means every unit of data in the warehouse is relevant to a specific moment in time, allowing historical trend analysis.
29
Which technology allows for data integration via standard XML-based messages over the web?
A. FTP
B. Direct Memory Access
C. Floppy Disks
D. Web Services (SOAP/REST)
Correct Answer: Web Services (SOAP/REST)
Explanation:
Web services provide a standard means of interoperating between different software applications running on a variety of platforms using XML/JSON.
30
What is a major disadvantage of 'Manual Data Integration'?
A. Too fast for humans to track
B. Prone to human error and difficult to scale
C. High cost of software tools
D. Requires complex installation
Correct Answer: Prone to human error and difficult to scale
Explanation:
Manual integration (copy-pasting or manual entry) is slow, unscalable, and very likely to introduce errors.
31
In Data Integration, what is a 'Connector' or 'Adapter'?
A. A power supply unit
B. A software component that allows the integration tool to communicate with specific data sources
C. A physical cable
D. A user who connects systems
Correct Answer: A software component that allows the integration tool to communicate with specific data sources
Explanation:
Connectors/Adapters are drivers or software modules that enable connection to specific databases (e.g., Oracle, SAP, Salesforce).
32
Which of the following refers to 'Data Consistency'?
A. Data is consistently backed up
B. Data values are the same across all systems and copies
C. Data is accessed consistently every day
D. Data is stored on a consistent hardware platform
Correct Answer: Data values are the same across all systems and copies
Explanation:
Consistency means that data across the enterprise is synchronized and does not contradict itself.
33
What is 'Latency' in the context of data integration?
A. The time delay between data generation and its availability for use
B. The number of users
C. The cost of the integration tool
D. The size of the data
Correct Answer: The time delay between data generation and its availability for use
Explanation:
Latency refers to the lag time. Low latency implies near real-time integration; high latency implies batch processing.
34
What is 'Subject-Oriented' in the context of Data Warehousing?
A. Organized around storage media
B. Organized around applications (e.g., Payroll app)
C. Organized around major entities like Customer, Product, Sales
D. Organized around file types
Correct Answer: Organized around major entities like Customer, Product, Sales
Explanation:
Subject-oriented means the data is grouped by business subjects (Customer, Product) rather than by the software application that generated it.
35
Which schema is most commonly associated with Data Marts and Warehouses?
A. Star Schema
B. Hierarchical Schema
C. Network Schema
D. XML Schema
Correct Answer: Star Schema
Explanation:
The Star Schema is the simplest style of data mart schema, consisting of one or more fact tables referencing dimension tables.
36
What is a 'Fact Table' in a Data Warehouse?
A. A table of users
B. A table of system logs
C. A table containing descriptive attributes (text)
D. A table containing quantitative measurements (numbers/metrics)
Correct Answer: A table containing quantitative measurements (numbers/metrics)
Explanation:
Fact tables contain the metrics, measurements, or facts of a business process (e.g., sales amount, quantity sold).
37
What is a 'Dimension Table'?
A. A table containing measurements
B. A table for metadata only
C. A table containing descriptive attributes (context for facts)
D. A table for temporary calculations
Correct Answer: A table containing descriptive attributes (context for facts)
Explanation:
Dimension tables contain descriptive attributes (e.g., Customer Name, Product Category) that provide context to the facts.
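How fact and dimension tables work together in a star schema can be sketched with plain Python dictionaries. The table and column names below are illustrative, not from any particular warehouse:

```python
# Dimension tables: descriptive attributes keyed by a dimension key.
dim_product = {10: {"name": "Laptop", "category": "Electronics"},
               20: {"name": "Desk",   "category": "Furniture"}}
dim_date    = {1: {"day": "2024-01-01"}, 2: {"day": "2024-01-02"}}

# Fact table: one row per sale, holding only keys plus numeric measures.
fact_sales = [
    {"product_key": 10, "date_key": 1, "amount": 999.0, "quantity": 1},
    {"product_key": 20, "date_key": 2, "amount": 250.0, "quantity": 2},
]

# A typical star-schema query: join facts to a dimension for context,
# filter on a descriptive attribute, then aggregate a measure.
total_electronics = sum(
    f["amount"] for f in fact_sales
    if dim_product[f["product_key"]]["category"] == "Electronics"
)
# total_electronics == 999.0
```

The numbers live in the fact table; the words that give them meaning live in the dimensions, which is the split Questions 36 and 37 test.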
38
Batch processing in data integration means:
A. Processing data in large groups at scheduled intervals
B. Processing data one record at a time instantly
C. Processing data manually
D. Processing data via email
Correct Answer: Processing data in large groups at scheduled intervals
Explanation:
Batch processing collects data over a period and processes it all at once (e.g., nightly loads).
39
Which of the following is a symptom of poor Data Quality?
A. Fast query performance
B. Seamless system integration
C. Marketing mail sent to the wrong addresses
D. Reports are trusted by executives
Correct Answer: Marketing mail sent to the wrong addresses
Explanation:
Sending mail to the wrong address is a classic example of inaccurate or outdated data.
40
What is 'Granularity' in a Data Warehouse?
A. The level of detail of the data
B. The cost of the storage
C. The texture of the hard drive
D. The security level
Correct Answer: The level of detail of the data
Explanation:
Granularity refers to the level of detail or summary of the data (e.g., daily sales vs. monthly sales).
41
EII stands for:
A. Enterprise Information Integration
B. Early Information Input
C. Electronic Internet Interface
D. Enterprise Internal Internet
Correct Answer: Enterprise Information Integration
Explanation:
EII is the ability to support a unified view of data and information across the enterprise (often via virtualization).
42
Which of the following is considered 'Unstructured Data'?
A. Emails, videos, and social media posts
B. A CSV file
C. An Excel spreadsheet with headers
D. Rows in a SQL database
Correct Answer: Emails, videos, and social media posts
Explanation:
Unstructured data does not have a predefined data model or schema, unlike relational database tables.
43
Informatica PowerCenter is primarily used for:
A. Data Integration / ETL
B. Graphic Design
C. Operating System Management
D. Word Processing
Correct Answer: Data Integration / ETL
Explanation:
Informatica PowerCenter is a leading enterprise ETL tool used for data integration.
44
What is the relationship between Data Governance and Data Quality?
A. Data Governance provides the policies and roles to ensure Data Quality
B. They are unrelated
C. Governance reduces Data Quality
D. Data Quality eliminates the need for Governance
Correct Answer: Data Governance provides the policies and roles to ensure Data Quality
Explanation:
Governance sets the rules, responsibilities, and standards, while Data Quality is the execution and measurement of those standards.
45
Why is a Data Mart often faster to query than a Data Warehouse?
A. It is connected directly to the CPU
B. It does not use indexes
C. It uses better hardware
D. It holds less data and is optimized for specific queries
Correct Answer: It holds less data and is optimized for specific queries
Explanation:
Data Marts contain a subset of data relevant to a specific domain, resulting in smaller volume and optimized structures for that domain's queries.
46
The 'Integrated' characteristic of a Data Warehouse means:
A. Data from various sources is converted to a standard format/naming convention
B. It includes email integration
C. It is integrated with the printer network
D. It is built on a single chip
Correct Answer: Data from various sources is converted to a standard format/naming convention
Explanation:
Integration ensures that encoding inconsistencies (e.g., m/f vs. 0/1) are resolved so data is uniform.
47
Which comes first in the standard ETL process?
A. Extract
B. Transform
C. Load
D. Analyze
Correct Answer: Extract
Explanation:
The process order is Extract (from source), Transform (clean/format), then Load (into target).
48
What is the purpose of a 'Surrogate Key' in a Data Warehouse?
A. To link to the internet
B. To unlock the server room
C. To replace the natural primary key with a unique internal system identifier
D. To encrypt the data
Correct Answer: To replace the natural primary key with a unique internal system identifier
Explanation:
Surrogate keys are system-generated unique keys used in the warehouse to decouple it from changes in source system keys (natural keys).
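Surrogate key assignment can be sketched as a lookup table that maps each (source system, natural key) pair to a warehouse-generated integer. The source names are invented for illustration:

```python
# Natural keys from different source systems may collide or change over time,
# so the warehouse mints its own surrogate keys on load.
surrogate_map = {}  # (source_system, natural_key) -> surrogate key
next_key = 1

def assign_surrogate(source, natural_key):
    """Return a stable warehouse key, minting a new one for unseen rows."""
    global next_key
    identity = (source, natural_key)
    if identity not in surrogate_map:
        surrogate_map[identity] = next_key
        next_key += 1
    return surrogate_map[identity]

k1 = assign_surrogate("crm", "CUST-001")      # new row -> key 1
k2 = assign_surrogate("billing", "CUST-001")  # same natural key, other system -> key 2
k3 = assign_surrogate("crm", "CUST-001")      # already seen -> key 1 again
```

Because the warehouse key is independent of the source key, a source system renumbering its customers does not ripple through the fact tables.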
49
What is 'Data Transformation'?
A. Deleting data
B. Moving data from A to B
C. Archiving data
D. Converting data from source format to destination format (e.g., calculation, filtering)
Correct Answer: Converting data from source format to destination format (e.g., calculation, filtering)
Explanation:
Transformation involves applying business rules, cleaning, aggregating, or reformatting data.
50
Which of the following is a 'Target System' in an ETL flow?
A. The Data Warehouse where data is loaded
B. The flat file containing raw logs
C. The legacy mainframe system
D. The operational database where transactions happen
Correct Answer: The Data Warehouse where data is loaded
Explanation:
In an ETL flow, the source is where data comes from, and the target (or destination) is where data is loaded, typically the Data Warehouse.