Unit 5 - Practice Quiz

INT323 50 Questions

1 What is the primary file extension for an SSIS package?

A. .ssis
B. .dtproj
C. .dtsx
D. .sln

2 Which component in the SSIS architecture is responsible for defining the logical flow of tasks such as sending emails or executing SQL scripts?

A. Data Flow
B. Control Flow
C. Event Handlers
D. Connection Managers

3 In SSIS, which IDE is typically used for developing Integration Services projects?

A. RapidMiner Studio
B. Eclipse
C. Visual Studio with SSDT
D. SQL Server Management Studio (SSMS)

4 What is the primary function of a 'Connection Manager' in SSIS?

A. To transform data types
B. To debug the data flow
C. To establish and maintain links to data sources and destinations
D. To schedule the package execution

5 Which SSIS task is specifically designed to move data from a source to a destination while allowing for transformations?

A. Execute SQL Task
B. Script Task
C. File System Task
D. Data Flow Task

6 In an SSIS package, what does a 'Precedence Constraint' define?

A. The speed of data transfer
B. The error handling mechanism inside a transformation
C. The data type of a column
D. The condition under which the next task in the Control Flow is executed

7 Which SSIS transformation is used to combine data from two sorted inputs based on a matching column?

A. Merge Join
B. Multicast
C. Derived Column
D. Union All

8 What is the purpose of the 'Conditional Split' transformation in SSIS?

A. To convert data types
B. To combine multiple data streams into one
C. To remove duplicate rows
D. To route data rows to different outputs based on expressions

9 If you need to perform a lookup against a reference table to retrieve related columns, which SSIS transformation should you use?

A. Fuzzy Grouping
B. Sort Transformation
C. Lookup Transformation
D. Row Count

10 Which SSIS transformation creates new column values by applying expressions to existing columns?

A. Multicast
B. Aggregate
C. Derived Column
D. Data Conversion

11 In SSIS, what is required before using a 'Merge' or 'Merge Join' transformation?

A. The data must be in XML format
B. The data must be aggregated
C. The data must be sorted
D. The data must be normalized

12 What is the purpose of the 'Multicast' transformation in SSIS?

A. To merge multiple inputs into one
B. To filter data based on criteria
C. To send identical copies of the data to multiple output paths
D. To perform an inner join

13 Which transformation would you use to change the data type of a column (e.g., from String to Integer) in SSIS?

A. Copy Column
B. Data Conversion
C. Percentage Sampling
D. Character Map

14 In the SSIS ecosystem, where are project-level parameters and connection managers usually managed?

A. In the Project.params file and Solution Explorer
B. In a text file on the desktop
C. In the Windows Registry
D. In the destination database

15 What is the function of the 'Aggregate' transformation in SSIS?

A. To encrypt sensitive data
B. To split data into training and testing sets
C. To perform calculations like Sum, Average, or Count on grouped data
D. To sort data alphabetically

16 RapidMiner is primarily known as a platform for which of the following?

A. Operating System Management
B. Network Security
C. Web Development
D. Data Science and Machine Learning

17 In RapidMiner, the central workspace where you store data, processes, and results is called a:

A. Warehouse
B. Registry
C. Database
D. Repository

18 The building blocks of a RapidMiner process that perform specific actions (like loading data or training a model) are called:

A. Operators
B. Nodes
C. Functions
D. Tasks

19 In the RapidMiner GUI, which view is used to build and edit analysis processes?

A. Turbo Prep
B. Results View
C. Design View
D. Auto Model

20 In RapidMiner terminology, what are the rows of a dataset called?

A. Factors
B. Attributes
C. Dimensions
D. Examples

21 In RapidMiner terminology, what are the columns of a dataset called?

A. Indices
B. Examples
C. Attributes
D. Keys

22 What is the purpose of the 'Store' operator in RapidMiner?

A. To export data to a CSV file
B. To pause the process
C. To cache data in RAM only
D. To save a process result or dataset into the Repository

23 Which port on an operator typically provides the output data to be passed to the next operator?

A. exa (example set)
B. mod (model)
C. inp (input)
D. out (output)

24 To connect the final output of a process to the 'Results' view in RapidMiner, where must the wire be connected?

A. To the 'inp' port of the process
B. To the 'res' (result) port on the process panel wall
C. No connection is needed
D. To the 'log' port

25 Which feature in RapidMiner allows you to inspect the data flowing through a connection without finishing the whole process?

A. Turbo Prep
B. Breakpoints
C. Macros
D. Validation

26 When loading data for EDA in RapidMiner, which view provides immediate summary statistics (min, max, average) for all attributes?

A. Log View
B. XML View
C. Results View -> Statistics Tab
D. Design View

27 Which chart type in RapidMiner is best suited for visualizing the distribution of a single numerical attribute?

A. Pie Chart
B. Histogram
C. Network Graph
D. Scatter Plot

28 In RapidMiner visualization, what is a Scatter Plot primarily used for?

A. To see the correlation or relationship between two numerical attributes
B. To show the hierarchy of data
C. To view the summary statistics
D. To count missing values

29 If you want to identify 'Outliers' visually in RapidMiner, which plot is most effective?

A. Area Chart
B. Venn Diagram
C. Pie Chart
D. Box Plot

30 What does the 'Correlation Matrix' in RapidMiner help a user identify?

A. The causal relationship between attributes
B. The number of rows in the data
C. The linear relationship strength between pairs of numerical attributes
D. The missing values in the dataset

31 Which operator in RapidMiner is used to select specific columns (attributes) to keep or remove from the dataset?

A. Replace Missing Values
B. Sort
C. Filter Examples
D. Select Attributes

32 Which operator is used to filter rows based on specific conditions (e.g., Age > 25)?

A. Append
B. Select Attributes
C. Normalize
D. Filter Examples

33 In Data Preparation, what does 'Normalization' typically achieve?

A. It deletes duplicate rows
B. It converts text to numbers
C. It scales numeric attributes to a specific range (e.g., 0 to 1)
D. It removes all missing values

34 What is 'Standardization' (Z-transformation) in the context of RapidMiner data prep?

A. Removing all outliers
B. Sorting data alphabetically
C. Scaling data to have a mean of 0 and a standard deviation of 1
D. Rounding numbers to the nearest integer

35 How does the 'Replace Missing Values' operator handle data gaps?

A. It leaves the value empty
B. It deletes the row containing the missing value
C. It stops the process with an error
D. It replaces the missing value with a specified value (like the average) or a constant

36 If a dataset in RapidMiner contains a column 'Gender' with values 'M' and 'F', what is the data type of this attribute?

A. Real
B. Polynominal (Nominal)
C. Integer
D. Date_Time

37 Which operator allows you to change the type of an attribute, for example, from Integer to Real or Nominal to Text?

A. Guess Types
B. Rename
C. Numerical to Polynominal
D. Select Attributes

38 What is the purpose of the 'Remove Duplicates' operator?

A. To remove rows that are identical across all (or selected) attributes
B. To remove columns with similar names
C. To remove attributes with constant values
D. To remove outliers

39 Which RapidMiner operator is used to merge two datasets horizontally based on a key attribute?

A. Append
B. Union
C. Join
D. Aggregate

40 If you want to create a new attribute calculated from existing attributes (e.g., Revenue = Price * Quantity), which operator should you use?

A. Discretize
B. Generate Attributes
C. Select Attributes
D. Filter Examples

41 What does the 'Discretize' operator do in RapidMiner?

A. Removes discrete values
B. Converts continuous numerical data into bins or ranges (categorical)
C. Encrypts the data
D. Converts nominal data to numerical data

42 In RapidMiner, what does the color 'Red' typically indicate in the Statistics view next to an attribute?

A. The attribute is highly correlated
B. The data is sorted
C. The attribute contains missing values
D. The attribute is the label (target variable)

43 Which SSIS control flow task is used to run a snippet of C# or VB.NET code?

A. Expression Task
B. Analysis Services Task
C. Execute SQL Task
D. Script Task

44 In RapidMiner, if you want to combine two datasets vertically (stacking them), which operator do you use?

A. Join
B. Append
C. Group By
D. Merge

45 What is the primary use of the 'Sample' operator in RapidMiner?

A. To reduce the dataset size by selecting a subset of rows
B. To test the chemical properties of data
C. To generate synthetic data
D. To sort the data randomly

46 In SSIS, what is a 'Variable' used for?

A. To define the database schema
B. To create a primary key
C. To visualize data
D. To store temporary values that can be used across tasks and containers

47 Which RapidMiner view is specifically designed for quick, interactive data cleaning without building a complex process manually?

A. Auto Model
B. Background Process
C. Turbo Prep
D. Design View

48 When defining an SSIS connection manager for a flat file (CSV), what must be defined?

A. The primary key constraint
B. The column delimiter (e.g., comma, tab)
C. The SQL dialect
D. The server IP address

49 In RapidMiner, what does the 'Map' operator do?

A. Visualizes data on a geographical map
B. Calculates the mean
C. Replaces specific values in an attribute with new values based on a defined mapping
D. Joins tables together

50 Which of the following represents the correct flow of an ETL process?

A. Extract -> Transform -> Load
B. Load -> Transform -> Extract
C. Extract -> Load -> Transform
D. Transform -> Extract -> Load