Unit 5 - Practice Quiz

INT323 50 Questions

1 What is the primary file extension for an SSIS package?

A. .dtproj
B. .ssis
C. .sln
D. .dtsx

2 Which component in the SSIS architecture is responsible for defining the logical flow of tasks such as sending emails or executing SQL scripts?

A. Connection Managers
B. Data Flow
C. Control Flow
D. Event Handlers

3 In SSIS, which IDE is typically used for developing Integration Services projects?

A. Eclipse
B. SQL Server Management Studio (SSMS)
C. RapidMiner Studio
D. Visual Studio with SSDT

4 What is the primary function of a 'Connection Manager' in SSIS?

A. To debug the data flow
B. To establish and maintain links to data sources and destinations
C. To transform data types
D. To schedule the package execution

5 Which SSIS task is specifically designed to move data from a source to a destination while allowing for transformations?

A. Execute SQL Task
B. File System Task
C. Script Task
D. Data Flow Task

6 In an SSIS Control Flow, what does a 'Precedence Constraint' define?

A. The condition under which the next task in the Control Flow is executed
B. The speed of data transfer
C. The error handling mechanism inside a transformation
D. The data type of a column

7 Which SSIS transformation is used to combine data from two sorted inputs based on a matching column?

A. Multicast
B. Derived Column
C. Union All
D. Merge Join

8 What is the purpose of the 'Conditional Split' transformation in SSIS?

A. To combine multiple data streams into one
B. To remove duplicate rows
C. To route data rows to different outputs based on expressions
D. To convert data types

9 If you need to perform a lookup against a reference table to retrieve related columns, which SSIS transformation should you use?

A. Sort Transformation
B. Fuzzy Grouping
C. Row Count
D. Lookup Transformation

10 Which SSIS transformation creates new column values by applying expressions to existing columns?

A. Aggregate
B. Multicast
C. Derived Column
D. Data Conversion

11 In SSIS, what is required before using a 'Merge' or 'Merge Join' transformation?

A. The data must be aggregated
B. The data must be in XML format
C. The data must be sorted
D. The data must be normalized

12 What is the purpose of the 'Multicast' transformation in SSIS?

A. To perform an inner join
B. To merge multiple inputs into one
C. To filter data based on criteria
D. To send identical copies of the data to multiple output paths

13 Which transformation would you use to change the data type of a column (e.g., from String to Integer) in SSIS?

A. Data Conversion
B. Copy Column
C. Percentage Sampling
D. Character Map

14 In the SSIS ecosystem, where are project-level parameters and connection managers usually managed?

A. In the Windows Registry
B. In the destination database
C. In a text file on the desktop
D. In the Project.params file and Solution Explorer

15 What is the function of the 'Aggregate' transformation in SSIS?

A. To encrypt sensitive data
B. To split data into training and testing sets
C. To sort data alphabetically
D. To perform calculations like Sum, Average, or Count on grouped data

16 RapidMiner is primarily known as a platform for which of the following?

A. Operating System Management
B. Network Security
C. Web Development
D. Data Science and Machine Learning

17 In RapidMiner, the central workspace where you store data, processes, and results is called a:

A. Registry
B. Repository
C. Database
D. Warehouse

18 The building blocks of a RapidMiner process that perform specific actions (like loading data or training a model) are called:

A. Nodes
B. Functions
C. Operators
D. Tasks

19 In the RapidMiner GUI, which view is used to build and edit analysis processes?

A. Design View
B. Auto Model
C. Results View
D. Turbo Prep

20 In RapidMiner terminology, what are the rows of a dataset called?

A. Factors
B. Attributes
C. Dimensions
D. Examples

21 In RapidMiner terminology, what are the columns of a dataset called?

A. Keys
B. Examples
C. Attributes
D. Indices

22 What is the purpose of the 'Store' operator in RapidMiner?

A. To pause the process
B. To save a process result or dataset into the Repository
C. To cache data in RAM only
D. To export data to a CSV file

23 Which port on an operator typically provides the output data to be passed to the next operator?

A. inp (input)
B. exa (example set)
C. out (output)
D. mod (model)

24 To connect the final output of a process to the 'Results' view in RapidMiner, where must the wire be connected?

A. To the 'res' (result) port on the process panel wall
B. To the 'inp' port of the process
C. No connection is needed
D. To the 'log' port

25 Which feature in RapidMiner allows you to inspect the data flowing through a connection without finishing the whole process?

A. Breakpoints
B. Validation
C. Macros
D. Turbo Prep

26 When loading data for EDA in RapidMiner, which view provides immediate summary statistics (min, max, average) for all attributes?

A. XML View
B. Design View
C. Log View
D. Results View -> Statistics Tab

27 Which chart type in RapidMiner is best suited for visualizing the distribution of a single numerical attribute?

A. Network Graph
B. Scatter Plot
C. Pie Chart
D. Histogram

28 In RapidMiner visualization, what is a Scatter Plot primarily used for?

A. To show the hierarchy of data
B. To view the summary statistics
C. To count missing values
D. To see the correlation or relationship between two numerical attributes

29 If you want to identify 'Outliers' visually in RapidMiner, which plot is most effective?

A. Box Plot
B. Area Chart
C. Venn Diagram
D. Pie Chart

30 What does the 'Correlation Matrix' in RapidMiner help a user identify?

A. The causal relationship between attributes
B. The linear relationship strength between pairs of numerical attributes
C. The missing values in the dataset
D. The number of rows in the data

31 Which operator in RapidMiner is used to select specific columns (attributes) to keep or remove from the dataset?

A. Sort
B. Filter Examples
C. Replace Missing Values
D. Select Attributes

32 Which operator is used to filter rows based on specific conditions (e.g., Age > 25)?

A. Filter Examples
B. Normalize
C. Select Attributes
D. Append

33 In Data Preparation, what does 'Normalization' typically achieve?

A. It scales numeric attributes to a specific range (e.g., 0 to 1)
B. It removes all missing values
C. It deletes duplicate rows
D. It converts text to numbers

34 What is 'Standardization' (Z-transformation) in the context of RapidMiner data prep?

A. Scaling data to have a mean of 0 and a standard deviation of 1
B. Rounding numbers to the nearest integer
C. Sorting data alphabetically
D. Removing all outliers

35 How does the 'Replace Missing Values' operator handle data gaps?

A. It deletes the row containing the missing value
B. It replaces the missing value with a specified value (like the average) or a constant
C. It stops the process with an error
D. It leaves the value empty

36 If a dataset in RapidMiner contains a column 'Gender' with values 'M' and 'F', what is the data type of this attribute?

A. Real
B. Date_Time
C. Polynominal (Nominal)
D. Integer

37 Which operator allows you to change the type of an attribute, for example, from Integer to Real or Nominal to Text?

A. Numerical to Polynominal
B. Guess Types
C. Rename
D. Select Attributes

38 What is the purpose of the 'Remove Duplicates' operator?

A. To remove columns with similar names
B. To remove rows that are identical across all (or selected) attributes
C. To remove attributes with constant values
D. To remove outliers

39 Which RapidMiner operator is used to merge two datasets horizontally based on a key attribute?

A. Aggregate
B. Union
C. Append
D. Join

40 If you want to create a new attribute calculated from existing attributes (e.g., Revenue = Price * Quantity), which operator should you use?

A. Filter Examples
B. Select Attributes
C. Discretize
D. Generate Attributes

41 What does the 'Discretize' operator do in RapidMiner?

A. Converts continuous numerical data into bins or ranges (categorical)
B. Encrypts the data
C. Converts nominal data to numerical data
D. Removes discrete values

42 In RapidMiner, what does the color 'Red' typically indicate in the Statistics view next to an attribute?

A. The data is sorted
B. The attribute is the label (target variable)
C. The attribute contains missing values
D. The attribute is highly correlated

43 Which SSIS control flow task is used to run a snippet of C# or VB.NET code?

A. Analysis Services Task
B. Script Task
C. Execute SQL Task
D. Expression Task

44 In RapidMiner, if you want to combine two datasets vertically (stacking them), which operator do you use?

A. Append
B. Group By
C. Merge
D. Join

45 What is the primary usage of the 'Sample' operator in RapidMiner?

A. To sort the data randomly
B. To test the chemical properties of data
C. To generate synthetic data
D. To reduce the dataset size by selecting a subset of rows

46 In SSIS, what is a 'Variable' used for?

A. To create a primary key
B. To visualize data
C. To store temporary values that can be used across tasks and containers
D. To define the database schema

47 Which RapidMiner view is specifically designed for quick, interactive data cleaning without building a complex process manually?

A. Background Process
B. Turbo Prep
C. Design View
D. Auto Model

48 When defining an SSIS connection manager for a flat file (CSV), what must be defined?

A. The column delimiter (e.g., comma, tab)
B. The primary key constraint
C. The server IP address
D. The SQL dialect

49 In RapidMiner, what does the 'Map' operator do?

A. Joins tables together
B. Calculates the mean
C. Visualizes data on a geographical map
D. Replaces specific values in an attribute with new values based on a defined mapping

50 Which of the following represents the correct flow of an ETL process?

A. Transform -> Extract -> Load
B. Extract -> Load -> Transform
C. Extract -> Transform -> Load
D. Load -> Transform -> Extract