Unit 4 - Practice Quiz

INT323

1 What does the acronym ETL stand for in the context of data warehousing?

A. Extract, Transfer, Load
B. Extract, Transform, Load
C. Execute, Transform, Load
D. Enter, Transact, Log

2 Which phase of the ETL process involves retrieving data from various source systems?

A. Loading
B. Transformation
C. Extraction
D. Staging

3 In the context of multi-dimensional data, what is a 'Data Cube'?

A. A relational database table
B. A three-dimensional visualization tool
C. A data structure that allows analysis of data from multiple perspectives
D. A hardware storage unit for backups

4 What is the primary purpose of the 'Transformation' step in ETL?

A. To store raw data immediately
B. To convert data into a format suitable for the destination system
C. To visualize data for end-users
D. To delete all historical data

5 Which OLAP operation involves moving from detailed data to a higher level of summarization?

A. Drill-down
B. Roll-up
C. Slice
D. Dice
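
Note (Q5, Q8): a minimal sketch of roll-up and drill-down using pandas; the table, column names, and figures are illustrative assumptions, not part of the quiz.

    import pandas as pd

    # Fine-grained sales data: one row per (year, month)
    detail = pd.DataFrame({
        "year":  [2023, 2023, 2023, 2024, 2024],
        "month": ["Jan", "Feb", "Mar", "Jan", "Feb"],
        "sales": [100, 120, 90, 130, 110],
    })

    # Roll-up: summarize from the month level up to the year level
    rolled_up = detail.groupby("year", as_index=False)["sales"].sum()
    print(rolled_up)

    # Drill-down is the reverse navigation: from a yearly total back to its monthly rows
    print(detail[detail["year"] == 2023])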

6 What is the 'Staging Area' in an ETL process?

A. The final destination for the data
B. A temporary, intermediate storage area that holds extracted data before it is transformed and loaded
C. The source system database
D. The user interface for reporting

7 Which of the following is a common data transformation task?

A. Data entry
B. Hardware maintenance
C. Data cleansing (removing duplicates/nulls)
D. Creating backup tapes
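
Note (Q7): a minimal sketch of data cleansing during transformation, assuming a pandas DataFrame; the columns and sample records are illustrative only.

    import pandas as pd

    raw = pd.DataFrame({
        "customer_id": [1, 1, 2, None, 3],
        "name": ["Ana", "Ana", "Ben", "Cara", None],
    })

    # Remove exact duplicate records
    clean = raw.drop_duplicates()

    # Drop rows whose key column is null
    clean = clean.dropna(subset=["customer_id"])
    print(clean)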

8 Which OLAP operation allows you to navigate from summarized data to more detailed data?

A. Roll-up
B. Drill-down
C. Pivot
D. Slice

9 What is 'Incremental Loading'?

A. Deleting the database and reloading everything
B. Loading only the data that has changed or is new since the last load
C. Loading data manually one record at a time
D. Loading data without any transformation
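
Note (Q9): a minimal sketch of incremental loading, assuming each source row carries a last_modified timestamp and the previous load time is tracked somewhere; names and values are illustrative.

    from datetime import datetime

    # Timestamp recorded at the end of the previous load (assumed to be persisted)
    last_load_time = datetime(2024, 1, 1)

    source_rows = [
        {"id": 1, "amount": 50, "last_modified": datetime(2023, 12, 30)},
        {"id": 2, "amount": 75, "last_modified": datetime(2024, 1, 5)},   # changed since last load
        {"id": 3, "amount": 20, "last_modified": datetime(2024, 1, 7)},   # new row
    ]

    # Incremental load: pick up only rows created or changed since the last load
    delta = [row for row in source_rows if row["last_modified"] > last_load_time]
    print(delta)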

10 In Informatica and other ETL tools, what is a 'Mapping'?

A. A geographical map of server locations
B. A definition of the flow of data from source to target including transformations
C. A list of user passwords
D. The physical cabling diagram

11 What does CDC stand for in the context of data extraction?

A. Create Data Cubes
B. Change Data Capture
C. Central Data Core
D. Computer Disk Check

12 Which operation selects a single value for one dimension of a data cube, effectively producing a lower-dimensional (e.g., 2D) view?

A. Slicing
B. Dicing
C. Rolling
D. Drilling
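
Note (Q12): a minimal sketch of slicing a small cube held in a pandas DataFrame; fixing one value of the Year dimension leaves a 2D Product x Region view. Column names and data are illustrative.

    import pandas as pd

    cube = pd.DataFrame({
        "year":    [2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024],
        "region":  ["East", "East", "West", "West"] * 2,
        "product": ["A", "B", "A", "B"] * 2,
        "sales":   [100, 150, 90, 140, 120, 160, 95, 150],
    })

    # Slice: fix one value of the 'year' dimension -> a 2D (product x region) view
    slice_2023 = cube[cube["year"] == 2023].pivot_table(
        index="product", columns="region", values="sales")
    print(slice_2023)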

13 What is a 'Full Load' strategy?

A. Loading only error logs
B. Loading data only on weekends
C. Completely erasing the target and reloading all data from the source
D. Loading data incrementally

14 The 'Dice' operation in a data cube performs which action?

A. Selects a single value for one dimension
B. Selects a sub-cube by selecting specific values for multiple dimensions
C. Rotates the axes of the cube
D. Summarizes the data
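
Note (Q14): a minimal sketch of the dice operation on the same kind of DataFrame-backed cube; specific values are chosen on several dimensions at once, yielding a sub-cube rather than a single slice. Data is illustrative.

    import pandas as pd

    cube = pd.DataFrame({
        "year":    [2023, 2023, 2023, 2024, 2024, 2024],
        "region":  ["East", "West", "North", "East", "West", "North"],
        "product": ["A", "A", "B", "A", "B", "B"],
        "sales":   [100, 90, 70, 120, 95, 80],
    })

    # Dice: select specific values on multiple dimensions to form a sub-cube
    sub_cube = cube[cube["region"].isin(["East", "West"]) &
                    cube["product"].isin(["A", "B"]) &
                    (cube["year"] == 2024)]
    print(sub_cube)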

15 Which of the following is a major benefit of using an ETL tool like Informatica over hand-coding SQL?

A. It is always free of cost
B. It provides metadata management and visual workflow design
C. It requires no training
D. It eliminates the need for a database

16 In a data cube, what are 'Measures'?

A. The descriptive attributes like Product Name
B. The numerical data being analyzed (e.g., Sales Amount)
C. The keys linking tables
D. The security protocols

17 What is 'Data Validation' during the Transformation phase?

A. Encrypting the data
B. Ensuring data meets defined business rules and constraints
C. Compressing the data for storage
D. Copying data to a backup drive
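
Note (Q17): a minimal sketch of rule-based validation during transformation; rows violating two illustrative business rules (non-negative amount, known status code) are separated out for error handling.

    import pandas as pd

    orders = pd.DataFrame({
        "order_id": [1, 2, 3, 4],
        "amount":   [250.0, -10.0, 99.0, 40.0],
        "status":   ["SHIPPED", "SHIPPED", "UNKNOWN", "OPEN"],
    })

    # Business rules: amount must be non-negative and status must be a known code
    valid_status = {"OPEN", "SHIPPED", "CLOSED"}
    rule_ok = (orders["amount"] >= 0) & orders["status"].isin(valid_status)

    valid_rows = orders[rule_ok]     # passed on to loading
    rejected   = orders[~rule_ok]    # routed to error handling / reject file
    print(rejected)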

18 Which transformation type is used to combine data from two different sources based on a common key?

A. Filter
B. Joiner
C. Sorter
D. Aggregator
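
Note (Q18): a minimal sketch of a Joiner-style transformation expressed with a pandas merge, combining two illustrative sources on a shared customer_id key.

    import pandas as pd

    # Two sources sharing a common key
    crm_customers = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "name": ["Ana", "Ben", "Cara"],
    })
    billing = pd.DataFrame({
        "customer_id": [1, 2, 4],
        "balance": [120.0, 0.0, 55.5],
    })

    # Join the sources on the common key (inner join keeps matching rows only)
    joined = crm_customers.merge(billing, on="customer_id", how="inner")
    print(joined)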

19 What allows an ETL process to pause and restart from a specific point in case of failure?

A. Checkpoints
B. Firewalls
C. Primary Keys
D. Data Marts

20 The 'Pivot' or 'Rotate' operation is used to:

A. Delete dimensions
B. Add new data to the cube
C. Reorient the view of the data to provide an alternative presentation
D. Filter out null values

21 Which of the following is an example of a heterogeneous data source in ETL?

A. Extracting from Oracle and loading into Oracle
B. Extracting from a CSV file and a SQL database to load into a Data Warehouse
C. Copying a table within the same database
D. Renaming a table

22 What is the role of a 'Surrogate Key' generated during transformation?

A. To replace the natural primary key with a unique internal system identifier
B. To encrypt the data
C. To sort the data alphabetically
D. To act as a foreign key to the source system
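
Note (Q22): a minimal sketch of generating surrogate keys during transformation; a sequence-generated internal identifier replaces the natural (business) key as the warehouse key, while the natural key is kept for traceability. The sequence here is only illustrative; a real warehouse would use an identity column or the ETL tool's sequence generator.

    import pandas as pd

    customers = pd.DataFrame({
        "natural_key": ["CUST-9001", "CUST-9002", "CUST-9003"],  # business key from the source
        "name": ["Ana", "Ben", "Cara"],
    })

    # Assign a simple sequence-based surrogate key as the internal identifier
    customers.insert(0, "customer_sk", range(1, len(customers) + 1))
    print(customers)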

23 Latency in ETL refers to:

A. The time difference between data creation in the source and availability in the target
B. The cost of the software
C. The size of the hard drive
D. The number of rows in a table

24 Which ETL approach loads data into the target system first and then transforms it?

A. ETL
B. ELT
C. CDC
D. OLTP

25 In ETL processing, what is a 'Lookup' transformation used for?

A. To delete records
B. To search for relevant values in another table or file based on a source value
C. To aggregate sums
D. To export data to Excel
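
Note (Q25): a minimal sketch of a Lookup-style transformation; incoming sales rows look up the matching key in a small reference table. Table and column names are illustrative.

    import pandas as pd

    sales = pd.DataFrame({
        "product_code": ["P-10", "P-20", "P-30"],
        "amount": [100, 250, 80],
    })
    # Reference (lookup) table mapping product codes to warehouse keys
    product_dim = pd.DataFrame({
        "product_code": ["P-10", "P-20"],
        "product_sk":   [1, 2],
    })

    # Lookup: a left join keeps every source row; unmatched codes get a NaN key
    enriched = sales.merge(product_dim, on="product_code", how="left")
    print(enriched)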

26 What is 'Data Granularity'?

A. The level of detail or summary of the data
B. The speed of the processor
C. The type of database used
D. The security level of the data

27 Which component in Informatica PowerCenter executes the sessions and workflows?

A. Repository Service
B. Integration Service
C. Client Tools
D. Designer

28 A 'Dimension' in a data cube represents:

A. The quantitative data
B. The perspective or category by which data is viewed (e.g., Time, Location)
C. The total size of the database
D. The calculation algorithm

29 What is the primary function of an 'Aggregator' transformation?

A. To filter rows
B. To perform calculations like Sum, Avg, Count on groups of data
C. To join tables
D. To split data into multiple streams
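
Note (Q29): a minimal sketch of an Aggregator-style transformation; sales rows are grouped by region and Sum, Average, and Count are computed per group. Data is illustrative.

    import pandas as pd

    sales = pd.DataFrame({
        "region": ["East", "East", "West", "West", "West"],
        "amount": [100, 150, 90, 60, 50],
    })

    # Group by region and compute aggregate measures
    summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])
    print(summary)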

30 Why is 'Metadata' important in ETL?

A. It increases the storage size unnecessarily
B. It provides data about the data (e.g., source, format, lineage)
C. It is the actual user data
D. It slows down the extraction process

31 Which loading strategy performs an 'Update else Insert' (Upsert) operation?

A. Full Load
B. Truncate and Load
C. Incremental Merge
D. Static Load
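
Note (Q31): a minimal sketch of an incremental merge (upsert) in plain Python; existing keys are updated, unseen keys are inserted. In SQL this is typically a MERGE (update else insert) statement. Keys and values are illustrative.

    # Current contents of the target table, keyed by customer_id
    target = {
        1: {"customer_id": 1, "balance": 120.0},
        2: {"customer_id": 2, "balance": 0.0},
    }

    # Incoming delta from the latest extract
    delta = [
        {"customer_id": 2, "balance": 35.0},   # existing key -> update
        {"customer_id": 3, "balance": 10.0},   # new key      -> insert
    ]

    # Upsert: update when the key exists, otherwise insert
    for row in delta:
        target[row["customer_id"]] = row

    print(target)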

32 The process of tracking the flow of data from origin to destination is called:

A. Data Mining
B. Data Lineage
C. Data Encryption
D. Data Compression

33 What type of transformation is 'Filtering'?

A. Active transformation
B. Passive transformation
C. Loading transformation
D. Extraction transformation

34 In a Star Schema, which tables connect to the central Fact Table?

A. Other Fact Tables
B. Dimension Tables
C. System Tables
D. Temporary Tables

35 What is the purpose of 'Error Handling' in ETL?

A. To ignore all errors
B. To stop the server permanently
C. To capture, log, and manage records that fail validation or transformation
D. To delete the source database

36 Which of the following is strictly an ETL Tool?

A. Microsoft Word
B. Informatica PowerCenter
C. Adobe Photoshop
D. Google Chrome

37 What is a 'Flat File' in the context of data extraction?

A. A database with no relationships
B. A plain text file (like CSV or TXT) containing data records
C. A corrupted file
D. A 3D image file

38 During the Loading phase, what does 'Referential Integrity' ensure?

A. That data is loaded quickly
B. That relationships between tables remain consistent (e.g., Foreign Keys exist)
C. That the disk is not full
D. That the data is encrypted

39 What characterizes a 'Passive' transformation?

A. It changes the number of rows
B. It requires user interaction
C. It does not change the number of rows passing through (e.g., an Expression transformation that derives a new column)
D. It deletes data

40 In multidimensional data, a 'Hierarchy' defines:

A. The security levels of users
B. The parent-child relationships within a dimension (e.g., Year -> Quarter -> Month)
C. The speed of data access
D. The network topology

41 Which ETL process is most likely to be used for 'Real-time' data warehousing?

A. Batch Processing
B. Stream Processing / Micro-batching
C. Manual Entry
D. Quarterly Loading

42 What is 'Data Normalization' usually associated with?

A. OLAP Systems
B. Transactional (OLTP) Source Systems
C. Flat Files
D. Data Cubes

43 The 'Router' transformation is used to:

A. Send data to a single target
B. Split data into multiple groups based on conditions
C. Sort data
D. Join two tables
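
Note (Q43): a minimal sketch of a Router-style transformation; each input row is sent to one of several output groups based on conditions, with a default group for the rest. Conditions and data are illustrative.

    orders = [
        {"order_id": 1, "country": "IN", "amount": 500},
        {"order_id": 2, "country": "US", "amount": 80},
        {"order_id": 3, "country": "DE", "amount": 220},
    ]

    # Route rows into separate groups based on conditions; unmatched rows go to 'other'
    groups = {"domestic": [], "international_large": [], "other": []}
    for row in orders:
        if row["country"] == "IN":
            groups["domestic"].append(row)
        elif row["amount"] >= 200:
            groups["international_large"].append(row)
        else:
            groups["other"].append(row)

    print(groups)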

44 What is the definition of 'Data Quality'?

A. The amount of data stored
B. The fitness of data for its intended use (accuracy, completeness, consistency)
C. The cost of the database license
D. The speed of the internet connection

45 Which step usually comes immediately after Extraction in a standard ETL workflow?

A. Loading
B. Transformation (or Staging)
C. Reporting
D. Archiving

46 What is a 'Derived Column'?

A. A column extracted directly from the source
B. A new column created through calculation or concatenation during transformation
C. A deleted column
D. A primary key
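
Note (Q46): a minimal sketch of derived columns; new columns are produced by calculation and by concatenation rather than extracted directly from the source. Column names are illustrative.

    import pandas as pd

    items = pd.DataFrame({
        "first_name": ["Ana", "Ben"],
        "last_name":  ["Diaz", "Khan"],
        "unit_price": [10.0, 4.5],
        "quantity":   [3, 10],
    })

    # Derived column via calculation
    items["line_total"] = items["unit_price"] * items["quantity"]
    # Derived column via concatenation
    items["full_name"] = items["first_name"] + " " + items["last_name"]
    print(items)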

47 In the context of Informatica, what is a 'Workflow'?

A. A single SQL query
B. A set of instructions describing how and when to run tasks (sessions)
C. The database installation file
D. The error log

48 Why might 'Data Type Conversion' be necessary during ETL?

A. To make the file look bigger
B. Because the source system and target system may use different data formats (e.g., String vs Date)
C. To confuse hackers
D. It is never necessary
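
Note (Q48): a minimal sketch of data type conversion; the source delivers dates and amounts as strings, which are converted to proper date and numeric types before loading. Formats shown are assumptions.

    import pandas as pd

    raw = pd.DataFrame({
        "order_date": ["2024-01-05", "2024-02-17"],   # strings in the source extract
        "amount":     ["199.99", "45.00"],
    })

    # Convert strings into the types expected by the target schema
    raw["order_date"] = pd.to_datetime(raw["order_date"], format="%Y-%m-%d")
    raw["amount"] = raw["amount"].astype(float)
    print(raw.dtypes)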

49 Which operation is the opposite of 'Pivot'?

A. Unpivot
B. Slice
C. Filter
D. Join
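
Note (Q20, Q49): a minimal sketch of pivot (rotating rows into a cross-tab presentation) and unpivot (melting the cross-tab back into rows), using pandas on illustrative data.

    import pandas as pd

    sales = pd.DataFrame({
        "region":  ["East", "East", "West", "West"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "amount":  [100, 120, 90, 95],
    })

    # Pivot/rotate: quarters become columns, giving an alternative presentation
    pivoted = sales.pivot(index="region", columns="quarter", values="amount")
    print(pivoted)

    # Unpivot: melt the cross-tab back into one row per region/quarter
    unpivoted = pivoted.reset_index().melt(id_vars="region",
                                           var_name="quarter",
                                           value_name="amount")
    print(unpivoted)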

50 What is the main advantage of using a Data Cube over a standard SQL query for analysis?

A. It is slower
B. It pre-aggregates data, providing faster query response for multidimensional analysis
C. It uses less memory
D. It allows for data entry