Unit 4 - Notes

INT374

Unit 4: Advanced Calculations with DAX

1. Introduction to DAX

What is DAX?

DAX (Data Analysis Expressions) is a formula expression language used in Analysis Services, Power BI, and Power Pivot in Excel. It is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values.

Key Characteristics

  • Functional Language: DAX formulas resemble Excel formulas but are more powerful and designed for relational data.
  • Context-Aware: The result of a DAX query is dynamic and changes based on the "Context" (Filter Context and Row Context) applied in the report.
  • Not a Programming Language: It does not support loops or traditional flow control (like for or while) in the way Python or C# do; it relies on table scanning and filtering.

2. Calculated Columns & Measures

Understanding the difference between these two is fundamental to mastering Power BI.

Calculated Columns

  • Definition: A new column added to an existing table in the data model. The formula is evaluated row-by-row.
  • Storage: Values are calculated immediately upon creation or data refresh and stored in RAM (part of the model file size).
  • Calculation Time: Calculated during data load/refresh.
  • Context: Uses Row Context (current row).
  • Use Case: When you need the data in an axis, slicer, or filter (e.g., creating a "Full Name" column from "First" and "Last").

Measures

  • Definition: A calculation used for aggregation. It does not add data to the table.
  • Storage: Does not store values; uses CPU power.
  • Calculation Time: Calculated on the fly (at query time) when a user interacts with a visual.
  • Context: Uses Filter Context (selections made in the report).
  • Use Case: Numerical analysis (sums, averages, ratios, year-over-year growth).
Feature Calculated Column Measure
Evaluation Row-by-row Based on filter context (aggregates)
Storage Stored in memory (Disk/RAM) Not stored (CPU intensive)
Recalculation On data refresh On visual interaction
Best For Slicers, Rows/Columns in Matrix Values area in charts/tables

3. Implicit vs. Explicit Measures

Implicit Measures

These are automatic aggregations created when you drag a numeric column (e.g., Sales Amount) into the "Values" field of a visual. Power BI automatically applies a default aggregation (Sum, Count, Average).

  • Pros: Fast and easy for ad-hoc analysis.
  • Cons: Limited functionality; cannot be referenced by other measures; difficult to maintain in large models.

Explicit Measures

These are measures manually created using DAX code.

  • Syntax: Total Sales = SUM(Sales[Amount])
  • Pros:
    • Reusable: Can be nested inside other measures.
    • Logic: Can handle complex logic (e.g., Time Intelligence).
    • Formatting: Formatting stays with the measure regardless of where it is used.
  • Best Practice: Always use explicit measures for production reports.

4. DAX Syntax & Operators

Standard Syntax

DAX
Measure Name = FUNCTION('Table Name'[Column Name])

  • Formula Bar: Where code is written.
  • Measure/Column Name: Brackets [] are used for column names; Single quotes ' ' are used for table names (optional if table name has no spaces, but recommended).

Operators

Arithmetic Operators

  • + (Addition)
  • - (Subtraction)
  • * (Multiplication)
  • / (Division)
  • ^ (Exponentiation)

Comparison Operators

  • = (Equal to)
  • <> (Not equal to)
  • > (Greater than)
  • < (Less than)
  • >= (Greater than or equal to)
  • <= (Less than or equal to)

Text Operator

  • & (Concatenation - joins two text strings)

Logical Operators

  • && (AND condition)
  • || (OR condition)
  • IN (Checks if a value exists in a list: 'Table'[Col] IN { "A", "B" })

5. DAX Functions: Basic Math & Statistical

These functions perform aggregations over columns.

Mathematical Functions

  • SUM: Adds all numbers in a column.
    DAX
        Total Revenue = SUM(Sales[Revenue])
        
  • DIVIDE: Performs division and handles divide-by-zero errors gracefully (returns BLANK or an alternate result instead of "Infinity").
    DAX
        Profit Margin = DIVIDE([Total Profit], [Total Sales], 0)
        
  • ABS: Returns the absolute value of a number.

Statistical Functions

  • AVERAGE: Returns the arithmetic mean of all numbers in a column.
  • MAX / MIN: Returns the largest or smallest value in a column (works on numeric, date, and text).
  • COUNT: Counts the number of cells in a column that contain numbers.
  • COUNTA: Counts the number of cells in a column that are not empty (works on any data type).
  • COUNTROWS: Counts the number of rows in the specified table.
    DAX
        Total Transactions = COUNTROWS(Sales)
        
  • DISTINCTCOUNT: Counts the number of distinct (unique) values in a column.

6. Conditional & Logical Functions

Used to test conditions and return different results.

IF

Checks a condition, and returns one value if TRUE, and another if FALSE.

DAX
Price Category = IF(Product[Price] > 500, "High", "Low")

SWITCH

Ideally used when there are multiple conditions (replaces nested IF statements).

DAX
Month Name = SWITCH(
    TRUE(),
    'Date'[MonthNum] = 1, "January",
    'Date'[MonthNum] = 2, "February",
    'Date'[MonthNum] = 3, "March",
    "Other"
)

AND / OR / NOT

  • AND: Checks if both arguments are TRUE.
  • OR: Checks if one of the arguments is TRUE.
  • NOT: Changes FALSE to TRUE, or TRUE to FALSE.

7. Text Functions

Used to manipulate text strings, usually within Calculated Columns.

  • CONCATENATE: Joins two text strings into one (similar to &).
  • LEFT / RIGHT: Returns a specified number of characters from the start (Left) or end (Right) of a string.
  • MID: Returns a specific number of characters from the middle of a string.
  • LEN: Returns the number of characters in a string.
  • UPPER / LOWER: Converts text to uppercase or lowercase.
  • TRIM: Removes all spaces from text except for single spaces between words.
  • FORMAT: Converts a value to text in a specified format.
    DAX
        Formatted Date = FORMAT('Date'[Date], "MM-YYYY")
        

8. Date & Time Functions

Used for calculations involving dates and time. Note: These are distinct from "Time Intelligence" functions (like TOTALYTD) which modify filter context.

  • TODAY(): Returns the current date.
  • NOW(): Returns the current date and time.
  • YEAR / MONTH / DAY: Extracts the respective component from a date column.
    DAX
        YearColumn = YEAR('Sales'[OrderDate])
        
  • DATEDIFF: Calculates the difference between two dates in specified intervals (Day, Month, Year, Hour, etc.).
    DAX
        Days to Ship = DATEDIFF(Sales[OrderDate], Sales[ShipDate], DAY)
        
  • CALENDAR / CALENDARAUTO: Returns a table with a single column named "Date". Used to create date tables dynamically.

9. Advanced DAX Functions

This section covers the most powerful engines of DAX: Context modification and iteration.

CALCULATE

The most important function in DAX. It evaluates an expression in a context that is modified by specific filters.

  • Syntax: CALCULATE(Expression, [Filter1], [Filter2], ...)
  • Functionality: It can add, remove, or update the filters currently applied to the report.
  • Example: Calculate sales only for the color "Red", ignoring whatever other filters are on the page regarding color.
    DAX
        Red Sales = CALCULATE(SUM(Sales[Amount]), Product[Color] = "Red")
        

FILTER

FILTER is an iterator that returns a table, not a single value. It is rarely used alone; it is almost always used inside CALCULATE to handle complex filtering logic that simple boolean expressions cannot handle (e.g., comparing a column against a measure).

  • Syntax: FILTER(Table, FilterExpression)
  • Example: Calculate sales for products where Total Sales > 1000.
    DAX
        High Vol Sales = CALCULATE(
            [Total Sales], 
            FILTER(Product, [Total Sales] > 1000)
        )
        

RELATED

Used in Calculated Columns or Row Context. It retrieves a related value from another table, similar to VLOOKUP in Excel.

  • Prerequisite: An active relationship must exist between the tables.
  • Direction: Works from the "Many" side to the "One" side of a relationship.
  • Example: In the Sales table (Many), you want to bring in the Product Name from the Product table (One).
    DAX
        Product Name In Sales = RELATED('Product'[ProductName])
        

Iterator Functions (X-Functions)

Standard aggregation functions (SUM, AVERAGE) aggregate a single column. Iterators (SUMX, AVERAGEX, MINX, MAXX) iterate through a table row-by-row, perform a calculation, and then aggregate the result.

  • Syntax: SUMX(Table, Expression)
  • Use Case: When math must occur before the summation (e.g., Price * Quantity).
  • Example (Total Sales):
    If you do not have a "Sales Amount" column, but you have "Price" and "Quantity":
    • Wrong: SUM(Price) * SUM(Quantity) (This sums all prices and multiplies by sum of all quantities).
    • Correct: SUMX(Sales, Sales[Price] * Sales[Quantity]) (This multiplies Price * Quantity for every row, then sums the results).