Unit 3 - Notes

INT374

Unit 3: Building and Structuring Data Models

1. Data Modeling Fundamentals

Definition

Data modeling in Power BI is the process of defining relationships between different data tables to create a unified semantic model. It bridges the gap between raw data ingestion (Power Query) and data visualization (Reports).

The Role of the Data Model

  • Performance: A well-optimized model utilizes the VertiPaq engine efficiently, compressing data and speeding up calculations.
  • Accuracy: Ensures calculations (measures) respect the correct context and granularity.
  • Usability: Simplifies report creation by organizing fields logically for the end-user.

Conceptual Workflow

  1. Get Data: Connect to sources.
  2. Transform: Cleanse in Power Query Editor.
  3. Model: Define relationships, hierarchies, and formatting in the Model View.
  4. Visualize: Build charts based on the model.

2. Database Normalization

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. While crucial for transactional databases (OLTP), Power BI often requires a balance between normalized and denormalized structures (OLAP).

Key Normal Forms

  1. First Normal Form (1NF): Eliminate repeating groups in individual tables; create a separate table for each set of related data.
  2. Second Normal Form (2NF): Eliminate redundant data; ensure all non-key columns depend on the primary key.
  3. Third Normal Form (3NF): Eliminate columns not dependent on the key (transitive dependencies).

Normalization in Power BI Context

  • Highly Normalized: Good for storage efficiency but requires complex joins, which can slow down analytical queries.
  • Denormalized: Contains redundant data (e.g., repeating customer names in a sales table) but reads faster.
  • The Power BI Approach: We typically aim for a Dimensional Model (Star Schema), which is partially denormalized to optimize read performance.

3. Fact & Dimension Tables

Data modeling in Power BI relies on distinguishing between two types of tables.

Dimension Tables (The "Lookup" Tables)

  • Purpose: Provide context (Who, What, Where, When).
  • Content: Descriptive attributes (text, dates, locations).
  • Structure: "Wide" (many columns) and "Short" (fewer rows unique to the entity).
  • Primary Key: Must have a unique identifier column.
  • Examples: Dim_Customer, Dim_Product, Dim_Date, Dim_Region.

Fact Tables (The "Data" Tables)

  • Purpose: Record events or transactions (How much, How many).
  • Content: Quantitative data (numbers, metrics) and Foreign Keys.
  • Structure: "Narrow" (fewer columns) and "Long" (millions/billions of rows).
  • Foreign Keys: Connects to Dimension tables.
  • Examples: Fact_Sales, Fact_Returns, Fact_Budget.

4. Relationships and Keys

Relationships enable the propagation of filters from one table to another. They rely on matching columns between tables.

Primary Keys (PK)

  • A column that uniquely identifies each row in a table.
  • Location: Found in Dimension tables.
  • Requirement: Must be unique (no duplicates) and non-null.

Foreign Keys (FK)

  • A column that points to a Primary Key in another table.
  • Location: Found in Fact tables.
  • Requirement: Can contain duplicate values (e.g., the same Customer ID appears multiple times in a Sales table).

Surrogate vs. Natural Keys

  • Natural Key: A key derived from the data (e.g., Email Address, Social Security Number).
  • Surrogate Key: An artificial, unique identifier assigned by the database (e.g., Integer 1, 2, 3...).
  • Best Practice: Prefer surrogate keys (Integers) for relationships in Power BI as they are faster for the VertiPaq engine to process than text strings.

5. Managing Relationships

Relationships are managed in the Model View tab in Power BI Desktop.

Creating and Editing Relationships

  • Auto-Detect: Power BI attempts to detect relationships upon data load based on identical column names. (Use with caution).
  • Drag-and-Drop: Drag a field (Key) from a Dimension table and drop it onto the matching field in the Fact table.
  • Manage Relationships Dialog: A menu-driven interface to view, edit, create, or delete relationships explicitly.

Cardinality Types

Cardinality defines the numerical relationship between rows in one table and rows in another.

  1. One-to-Many (1:) / Many-to-One (:1):

    • Description: The most common and ideal relationship. One record in the Dimension table (e.g., Product A) relates to many records in the Fact table (Sales of Product A).
    • Icon: The relationship line shows a 1 on one side and an * on the other.
  2. One-to-One (1:1):

    • Description: Both tables share a unique key.
    • Use Case: Often indicates the tables should be merged into a single table, unless splitting is necessary for security or performance (vertical partitioning).
  3. Many-to-Many (:):

    • Description: Neither column is unique. (e.g., Sales targets by team, where teams and targets both have duplicates).
    • Risk: Can produce unexpected results.
    • Solution: Usually resolved by creating a Bridge Table containing unique keys between the two tables.

6. Schemas: Star vs. Snowflake

Star Schema (The Gold Standard)

  • Structure: A central Fact table surrounded by Dimension tables.
  • Visual: Looks like a star.
  • Key Feature: Dimensions are denormalized (flattened).
    • Example: Dim_Product contains Product Name, Category, and Subcategory all in one table.
  • Benefits:
    • Simplest DAX code.
    • Fastest performance.
    • Fewer relationships for the engine to traverse.

Snowflake Schema

  • Structure: Dimension tables are normalized and split into sub-dimensions.
  • Visual: Looks like a snowflake branching out.
  • Key Feature: Hierarchies are stored in separate tables.
    • Example: Fact_Sales connects to Dim_Product, which connects to Dim_SubCategory, which connects to Dim_Category.
  • Drawbacks:
    • Requires traversing multiple relationships (slower).
    • Filters must propagate through a chain of tables.
  • Recommendation: "Snowflake" only when necessary; otherwise, collapse dimensions into a Star Schema using Power Query.

7. Filter Context and Flow

Understanding filter propagation is critical for accurate reporting.

Filter Propagation

  • Filters flow downhill from the One (1) side to the *Many ()** side.
  • Example: If you filter Dim_Date to "2023", the filter flows down the relationship to Fact_Sales, restricting the view to only 2023 sales rows.

Visualizing Flow

In Model View, the relationship line has an arrow/chevron (>) indicating the direction of the filter flow.

  • Standard: Dimension --(Filter Flows)--> Fact.

8. Bi-Directional Filters & Ambiguity

Bi-Directional Filtering (Cross-filter direction: Both)

  • Definition: Allows the filter to flow in both directions. The Fact table can filter the Dimension table, and vice versa.
  • Use Case: Sometimes used when filtering a slicer based on data present in another table (e.g., showing only Customers who actually bought a specific Product).
  • Performance Impact: High. It forces the engine to perform complex calculations.
  • Best Practice: Avoid "Both" direction unless absolutely necessary. Use DAX CROSSFILTER function for specific measure requirements instead of changing the model permanently.

Ambiguity

  • Definition: Occurs when there are multiple paths for a filter to propagate from Table A to Table B.
  • Result: Power BI may disable one relationship (make it inactive, represented by a dotted line) to prevent circular logic.
  • UserActiveRelationship: In DAX, you can invoke inactive relationships explicitly using USERELATIONSHIP().

9. Data Formats & Categories

Proper formatting ensures data is displayed correctly and recognized by Power BI's internal engines (like Q&A and Maps).

Data Formatting

  • Percentages: Ensure decimals (0.5) are formatted as % (50%).
  • Currency: Apply currency symbols and decimal precision.
  • Custom Formats: Used for specific business needs (e.g., #,##0;(#,##0) for accounting style).

Data Categorization

Assists the geospatial engine and Q&A features. Configured in the "Data Category" dropdown.

  • Geospatial: Categorize columns as Country, State, City, Postal Code, or Latitude/Longitude. This ensures Bing Maps plots the data correctly.
  • URLs:
    • Web URL: Makes the text a clickable link.
    • Image URL: Allows images hosted online to render inside visuals (like tables or cards).
  • Barcode: For mobile app scanning integration.

10. Creating Hierarchies

Hierarchies provide a predetermined path for data drill-down, enhancing user experience.

Definition

A group of columns arranged in levels of granularity, from highest (broadest) to lowest (most detailed).

Common Examples

  1. Date Hierarchy: Year Quarter Month Day.
  2. Geography Hierarchy: Country State/Province City Zip Code.
  3. Product Hierarchy: Category Subcategory SKU.

Implementation

  1. In Model View or Data View, right-click the highest-level field (e.g., Year).
  2. Select Create Hierarchy.
  3. Right-click the next level fields (e.g., Quarter) and select Add to hierarchy.
  4. Benefits: Enables the "Drill Down" buttons on charts automatically, allowing users to explore data depth without adding multiple fields manually to every visual.