Unit 5 - Practice Quiz

INT323

1 What is the primary file extension for an SSIS package?

A. .ssis
B. .dtsx
C. .sln
D. .dtproj

2 Which component in the SSIS architecture is responsible for defining the logical flow of tasks such as sending emails or executing SQL scripts?

A. Data Flow
B. Control Flow
C. Event Handlers
D. Connection Managers

3 In SSIS, which IDE is typically used for developing Integration Services projects?

A. SQL Server Management Studio (SSMS)
B. Visual Studio with SSDT
C. Eclipse
D. RapidMiner Studio

4 What is the primary function of a 'Connection Manager' in SSIS?

A. To transform data types
B. To establish and maintain links to data sources and destinations
C. To schedule the package execution
D. To debug the data flow

5 Which SSIS task is specifically designed to move data from a source to a destination while allowing for transformations?

A. Execute SQL Task
B. Data Flow Task
C. Script Task
D. File System Task

6 In SSIS, what does a 'Precedence Constraint' define?

A. The data type of a column
B. The condition under which the next task in the Control Flow is executed
C. The speed of data transfer
D. The error handling mechanism inside a transformation

7 Which SSIS transformation is used to combine data from two sorted inputs based on a matching column?

A. Union All
B. Merge Join
C. Multicast
D. Derived Column

8 What is the purpose of the 'Conditional Split' transformation in SSIS?

A. To combine multiple data streams into one
B. To route data rows to different outputs based on expressions
C. To remove duplicate rows
D. To convert data types
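
To make the routing idea concrete, here is a minimal Python sketch of what a Conditional Split does, assuming rows are plain dicts and conditions are Python predicates rather than SSIS expressions. The function name `conditional_split` and the sample data are hypothetical. Conditions are checked in order, each row goes to the first output whose condition is true, and unmatched rows fall through to the default output.

```python
def conditional_split(rows, conditions, default="Default Output"):
    """Route each row to the first output whose predicate is true.

    conditions: ordered list of (output_name, predicate) pairs.
    Rows that match no condition go to the default output.
    """
    outputs = {name: [] for name, _ in conditions}
    outputs[default] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break  # first matching condition wins
        else:
            outputs[default].append(row)
    return outputs


rows = [{"id": 1, "age": 17}, {"id": 2, "age": 30}, {"id": 3, "age": 70}]
result = conditional_split(
    rows,
    [("Minors", lambda r: r["age"] < 18), ("Seniors", lambda r: r["age"] >= 65)],
)
print({name: [r["id"] for r in out] for name, out in result.items()})
```

Unlike Multicast, every row lands in exactly one output here.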

9 If you need to perform a lookup against a reference table to retrieve related columns, which SSIS transformation should you use?

A. Lookup Transformation
B. Fuzzy Grouping
C. Sort Transformation
D. Row Count
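
A minimal Python sketch of the lookup idea, assuming the reference table fits in memory as a list of dicts. The helper name `lookup` and the sample tables are hypothetical. It caches reference rows by key, copies the requested columns onto matching input rows, and sends unmatched rows to a separate list, echoing the transformation's match and no-match outputs.

```python
def lookup(rows, reference, key, columns):
    """Enrich rows with `columns` from `reference`, matching on `key`.

    Returns (matched, no_match). If the reference has duplicate keys,
    the first reference row is kept.
    """
    index = {}
    for ref in reference:
        index.setdefault(ref[key], ref)  # keep first occurrence of each key
    matched, no_match = [], []
    for row in rows:
        ref = index.get(row[key])
        if ref is None:
            no_match.append(row)
        else:
            matched.append({**row, **{c: ref[c] for c in columns}})
    return matched, no_match


orders = [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 9}]
customers = [{"cust_id": 1, "name": "Asha", "city": "Pune"}]
matched, no_match = lookup(orders, customers, "cust_id", ["name", "city"])
print(matched, no_match)
```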

10 Which SSIS transformation creates new column values by applying expressions to existing columns?

A. Data Conversion
B. Derived Column
C. Aggregate
D. Multicast

11 In SSIS, what is required before using a 'Merge' or 'Merge Join' transformation?

A. The data must be sorted
B. The data must be normalized
C. The data must be in XML format
D. The data must be aggregated
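
Why sorting matters can be seen in a minimal Python sketch of an inner merge join, assuming both inputs are lists of dicts already sorted on the join key. The name `merge_join` and the data are hypothetical, and this is a textbook merge join rather than SSIS's internal code. Because both inputs are sorted, one forward pass with two cursors finds every match; on unsorted input that single pass would miss rows.

```python
def merge_join(left, right, key):
    """Inner join of two inputs that are both sorted on `key`."""
    for side in (left, right):
        if any(a[key] > b[key] for a, b in zip(side, side[1:])):
            raise ValueError("both inputs must be sorted on the join key")
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "z"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]
joined = merge_join(left, right, "id")
print(joined)
```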

12 What is the purpose of the 'Multicast' transformation in SSIS?

A. To filter data based on criteria
B. To send identical copies of the data to multiple output paths
C. To merge multiple inputs into one
D. To perform an inner join

13 Which transformation would you use to change the data type of a column (e.g., from String to Integer) in SSIS?

A. Data Conversion
B. Copy Column
C. Character Map
D. Percentage Sampling

14 In the SSIS ecosystem, where are project-level parameters and connection managers usually managed?

A. In the Windows Registry
B. In the Project.params file and Solution Explorer
C. In the destination database
D. In a text file on the desktop

15 What is the function of the 'Aggregate' transformation in SSIS?

A. To sort data alphabetically
B. To perform calculations like Sum, Average, or Count on grouped data
C. To split data into training and testing sets
D. To encrypt sensitive data
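
A minimal Python sketch of group-by aggregation, assuming rows are dicts and a single grouping column. The function name `aggregate` and the sales data are hypothetical. Rows are grouped by the key column, then Sum, Average, and Count are computed per group.

```python
from collections import defaultdict


def aggregate(rows, group_by, column):
    """Return {group: {"sum", "avg", "count"}} for `column` grouped by `group_by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return {
        key: {"sum": sum(vals), "avg": sum(vals) / len(vals), "count": len(vals)}
        for key, vals in groups.items()
    }


sales = [
    {"region": "North", "amount": 100},
    {"region": "South", "amount": 50},
    {"region": "North", "amount": 300},
]
result = aggregate(sales, "region", "amount")
print(result)
```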

16 RapidMiner is primarily known as a platform for which of the following?

A. Web Development
B. Data Science and Machine Learning
C. Operating System Management
D. Network Security

17 In RapidMiner, the central workspace where you store data, processes, and results is called a:

A. Database
B. Repository
C. Warehouse
D. Registry

18 The building blocks of a RapidMiner process that perform specific actions (like loading data or training a model) are called:

A. Nodes
B. Operators
C. Tasks
D. Functions

19 In the RapidMiner GUI, which view is used to build and edit analysis processes?

A. Results View
B. Design View
C. Turbo Prep
D. Auto Model

20 In RapidMiner terminology, what are the rows of a dataset called?

A. Attributes
B. Examples
C. Factors
D. Dimensions

21 In RapidMiner terminology, what are the columns of a dataset called?

A. Attributes
B. Examples
C. Keys
D. Indices

22 What is the purpose of the 'Store' operator in RapidMiner?

A. To save a process result or dataset into the Repository
B. To export data to a CSV file
C. To cache data in RAM only
D. To pause the process

23 Which port on an operator typically provides the output data to be passed to the next operator?

A. inp (input)
B. out (output)
C. exa (example set)
D. mod (model)

24 To connect the final output of a process to the 'Results' view in RapidMiner, where must the wire be connected?

A. To the 'res' (result) port on the right edge of the process panel
B. To the 'inp' port of the process
C. To the 'log' port
D. No connection is needed

25 Which feature in RapidMiner allows you to inspect the data flowing through a connection without finishing the whole process?

A. Breakpoints
B. Turbo Prep
C. Validation
D. Macros

26 When loading data for exploratory data analysis (EDA) in RapidMiner, which view provides immediate summary statistics (min, max, average) for all attributes?

A. Design View
B. Results View -> Statistics Tab
C. XML View
D. Log View

27 Which chart type in RapidMiner is best suited for visualizing the distribution of a single numerical attribute?

A. Scatter Plot
B. Histogram
C. Network Graph
D. Pie Chart

28 In RapidMiner visualization, what is a Scatter Plot primarily used for?

A. To see the correlation or relationship between two numerical attributes
B. To show the hierarchy of data
C. To view the summary statistics
D. To count missing values

29 If you want to identify 'Outliers' visually in RapidMiner, which plot is most effective?

A. Box Plot
B. Pie Chart
C. Area Chart
D. Venn Diagram
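
A box plot's whiskers commonly follow Tukey's rule: points more than 1.5 × IQR below the first quartile or above the third are drawn as outliers. The minimal Python sketch below applies that rule, assuming the standard library's `statistics.quantiles` (Python 3.8+). Quartile methods vary between tools, so the exact cut-offs may differ slightly from what RapidMiner draws. The function name and data are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]
outliers = iqr_outliers(data)
print(outliers)
```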

30 What does the 'Correlation Matrix' in RapidMiner help a user identify?

A. The number of rows in the data
B. The linear relationship strength between pairs of numerical attributes
C. The causal relationship between attributes
D. The missing values in the dataset
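
A correlation matrix typically holds the Pearson coefficient for each pair of numerical attributes. The minimal Python sketch below computes it by hand; the names and data are hypothetical. The last example shows why the coefficient measures only *linear* strength (and says nothing about causation): y = x² depends perfectly on x, yet the coefficient is 0.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (undefined for constant columns)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(table):
    """table maps attribute name -> list of numbers (equal lengths)."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}


m = correlation_matrix({"price": [1, 2, 3, 4], "qty": [8, 6, 4, 2]})
print(m["price"]["qty"])  # strong negative linear relationship

xs = [-2, -1, 0, 1, 2]
print(pearson(xs, [x * x for x in xs]))  # nonlinear dependence, zero correlation
```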

31 Which operator in RapidMiner is used to select specific columns (attributes) to keep or remove from the dataset?

A. Filter Examples
B. Select Attributes
C. Sort
D. Replace Missing Values

32 Which operator is used to filter rows based on specific conditions (e.g., Age > 25)?

A. Select Attributes
B. Filter Examples
C. Normalize
D. Append

33 In Data Preparation, what does 'Normalization' typically achieve?

A. It removes all missing values
B. It scales numeric attributes to a specific range (e.g., 0 to 1)
C. It converts text to numbers
D. It deletes duplicate rows
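
Min-max normalization maps each value v to new_min + (v - min) × (new_max - new_min) / (max - min). A minimal Python sketch, with a hypothetical function name; mapping a constant column to `new_min` is an assumption made here to avoid dividing by zero, not documented RapidMiner behavior.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # assumption: constant column -> new_min
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


print(min_max_normalize([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```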

34 What is 'Standardization' (Z-transformation) in the context of RapidMiner data prep?

A. Scaling data to have a mean of 0 and a standard deviation of 1
B. Removing all outliers
C. Sorting data alphabetically
D. Rounding numbers to the nearest integer
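
The Z-transformation replaces each value x with (x - mean) / std. A minimal Python sketch using the standard library's `statistics` module; it uses the population standard deviation (`pstdev`), and whether a given tool uses the population or sample version is an assumption worth checking. The function name and data are hypothetical.

```python
import statistics


def z_transform(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std; some tools use the sample std
    return [(v - mean) / std for v in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
z = z_transform(scores)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```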

35 How does the 'Replace Missing Values' operator handle data gaps?

A. It deletes the row containing the missing value
B. It replaces the missing value with a specified value (like the average) or a constant
C. It stops the process with an error
D. It leaves the value empty
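
A minimal Python sketch of replacing missing values with either the average of the present values or a constant, assuming `None` marks a missing value. The function name and `strategy` parameter are hypothetical and only cover the two options mentioned in answer B.

```python
def replace_missing(values, strategy="average", constant=None):
    """Fill None entries with the mean of the present values or a constant."""
    present = [v for v in values if v is not None]
    if strategy == "average":
        if not present:
            raise ValueError("cannot average a column with no values")
        fill = sum(present) / len(present)
    elif strategy == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


print(replace_missing([4, None, 8, None]))  # [4, 6.0, 8, 6.0]
print(replace_missing([None, 1], strategy="constant", constant=0))  # [0, 1]
```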

36 If a dataset in RapidMiner contains a column 'Gender' with values 'M' and 'F', what is the data type of this attribute?

A. Real
B. Integer
C. Polynominal (Nominal)
D. Date_Time

37 Which operator converts numerical attributes (Integer or Real) into polynominal (nominal) attributes?

A. Numerical to Polynominal
B. Guess Types
C. Select Attributes
D. Rename

38 What is the purpose of the 'Remove Duplicates' operator?

A. To remove columns with similar names
B. To remove rows that are identical across all (or selected) attributes
C. To remove attributes with constant values
D. To remove outliers
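
A minimal Python sketch of duplicate removal, assuming rows are dicts. Two rows count as duplicates when they agree on all attributes, or only on the chosen `attributes` if given; the first occurrence is kept. The function name and data are hypothetical.

```python
def remove_duplicates(rows, attributes=None):
    """Keep the first row for each distinct combination of attribute values."""
    seen, kept = set(), []
    for row in rows:
        cols = attributes or sorted(row)  # default: compare on every attribute
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [{"id": 1, "city": "Pune"}, {"id": 1, "city": "Pune"}, {"id": 2, "city": "Pune"}]
print(remove_duplicates(rows))            # two rows remain
print(remove_duplicates(rows, ["city"]))  # one row remains
```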

39 Which RapidMiner operator is used to merge two datasets horizontally based on a key attribute?

A. Append
B. Join
C. Union
D. Aggregate
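
A horizontal merge places attributes from both datasets side by side in rows that share a key. The minimal Python sketch below performs an inner hash join; it is not RapidMiner's implementation, which also offers other join types. The function name and data are hypothetical.

```python
def inner_join(left, right, key):
    """Combine rows from left and right that share the same `key` value."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)  # group right rows by key
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


customers = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ravi"}]
orders = [
    {"cust_id": 1, "order_id": 10},
    {"cust_id": 1, "order_id": 11},
    {"cust_id": 3, "order_id": 12},
]
joined = inner_join(customers, orders, "cust_id")
print(joined)  # rows widen; unmatched keys (2 and 3) are dropped
```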

40 If you want to create a new attribute calculated from existing attributes (e.g., Revenue = Price * Quantity), which operator should you use?

A. Generate Attributes
B. Select Attributes
C. Filter Examples
D. Discretize
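
The Revenue = Price * Quantity example can be sketched in Python as a function that adds a new attribute computed from existing ones, assuming rows are dicts. The helper `generate_attribute` is hypothetical. It only illustrates the idea and does not reproduce RapidMiner's expression syntax.

```python
def generate_attribute(rows, name, fn):
    """Return copies of rows with a new attribute `name` = fn(row)."""
    return [{**row, name: fn(row)} for row in rows]


rows = [{"Price": 2.5, "Quantity": 4}, {"Price": 10.0, "Quantity": 3}]
out = generate_attribute(rows, "Revenue", lambda r: r["Price"] * r["Quantity"])
print([r["Revenue"] for r in out])  # [10.0, 30.0]
```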

41 What does the 'Discretize' operator do in RapidMiner?

A. Converts nominal data to numerical data
B. Converts continuous numerical data into bins/ranges (categorical values)
C. Removes discrete values
D. Encrypts the data
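
A minimal Python sketch of equal-width discretization, one common way to turn a continuous attribute into ranges. RapidMiner offers several discretization operators; this sketch makes no claim to match any one of them exactly. The function name, labels, and ages are hypothetical. The value range is split into `bins` intervals of equal width, and the maximum is placed in the last bin.

```python
def discretize_equal_width(values, bins, labels=None):
    """Map each number to a label for its equal-width range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = labels or [f"range{i + 1}" for i in range(bins)]
    out = []
    for v in values:
        idx = int((v - lo) / width) if width else 0
        out.append(labels[min(idx, bins - 1)])  # the max value falls in the last bin
    return out


ages = [18, 25, 40, 59, 60]
print(discretize_equal_width(ages, 3, ["young", "middle", "senior"]))
```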

42 In RapidMiner, what does the color 'Red' typically indicate in the Statistics view next to an attribute?

A. The data is sorted
B. The attribute is the label (target variable)
C. The attribute contains missing values
D. The attribute is highly correlated

43 Which SSIS control flow task is used to run a snippet of C# or VB.NET code?

A. Execute SQL Task
B. Script Task
C. Expression Task
D. Analysis Services Task

44 In RapidMiner, if you want to combine two datasets vertically (stacking them), which operator do you use?

A. Join
B. Append
C. Merge
D. Group By
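
A minimal Python sketch of vertical stacking, assuming each example set is a non-empty list of dicts with the same attributes. The function name and data are hypothetical. Rows from later sets are added below earlier ones. Compare this with a join, which adds columns instead of rows.

```python
def append(*example_sets):
    """Stack example sets vertically; all sets must share the same attributes."""
    expected = set(example_sets[0][0])  # assumes the first set is non-empty
    result = []
    for rows in example_sets:
        for row in rows:
            if set(row) != expected:
                raise ValueError("all example sets must share the same attributes")
            result.append(dict(row))
    return result


jan = [{"month": "Jan", "sales": 120}]
feb = [{"month": "Feb", "sales": 95}, {"month": "Feb", "sales": 40}]
combined = append(jan, feb)
print(combined)  # 3 rows, same 2 attributes
```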

45 What is the primary usage of the 'Sample' operator in RapidMiner?

A. To reduce the dataset size by selecting a subset of rows
B. To test the chemical properties of data
C. To generate synthetic data
D. To sort the data randomly

46 In SSIS, what is a 'Variable' used for?

A. To store temporary values that can be used across tasks and containers
B. To define the database schema
C. To create a primary key
D. To visualize data

47 Which RapidMiner view is specifically designed for quick, interactive data cleaning without building a complex process manually?

A. Turbo Prep
B. Auto Model
C. Design View
D. Background Process

48 When defining an SSIS connection manager for a flat file (CSV), what must be defined?

A. The primary key constraint
B. The column delimiter (e.g., comma, tab)
C. The SQL dialect
D. The server IP address

49 In RapidMiner, what does the 'Map' operator do?

A. Visualizes data on a geographical map
B. Replaces specific values in an attribute with new values based on a defined mapping
C. Joins tables together
D. Calculates the mean average
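
A minimal Python sketch of value mapping on one attribute, assuming rows are dicts. The function name and data are hypothetical. Leaving unmapped values unchanged is an assumption made here; the real operator may offer a default replacement for those values.

```python
def map_values(rows, attribute, mapping):
    """Replace values of `attribute` per `mapping`; unmapped values stay as-is."""
    return [
        {**row, attribute: mapping.get(row[attribute], row[attribute])}
        for row in rows
    ]


rows = [{"gender": "M"}, {"gender": "F"}, {"gender": "X"}]
mapped = map_values(rows, "gender", {"M": "Male", "F": "Female"})
print(mapped)
```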

50 Which of the following represents the correct flow of an ETL process?

A. Load -> Transform -> Extract
B. Transform -> Extract -> Load
C. Extract -> Transform -> Load
D. Extract -> Load -> Transform
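
The three ETL stages can be sketched end to end with Python's standard library: extract rows from CSV text, transform them by cleaning names and dropping rows with a missing amount, and load them into an in-memory SQLite table. The CSV content, table name, and cleaning rules are all hypothetical illustrations, not an SSIS package.

```python
import csv
import io
import sqlite3

RAW_CSV = "id,name,amount\n1, alice ,100\n2,bob,\n3, carol,250\n"  # hypothetical source


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, tidy names, cast types."""
    clean = []
    for r in rows:
        if not r["amount"].strip():
            continue  # illustrative rule: skip rows with a missing amount
        clean.append((int(r["id"]), r["name"].strip().title(), float(r["amount"])))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT * FROM sales ORDER BY id").fetchall()
print(loaded)  # [(1, 'Alice', 100.0), (3, 'Carol', 250.0)]
```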