Unit 4 - Notes
INT374
Unit 4: Advanced Calculations with DAX
1. Introduction to DAX
What is DAX?
DAX (Data Analysis Expressions) is a formula expression language used in Analysis Services, Power BI, and Power Pivot in Excel. It is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values.
Key Characteristics
- Functional Language: DAX formulas resemble Excel formulas but are more powerful and designed for relational data.
- Context-Aware: The result of a DAX query is dynamic and changes based on the "Context" (Filter Context and Row Context) applied in the report.
- Not a Programming Language: It does not support loops or traditional flow control (like
fororwhile) in the way Python or C# do; it relies on table scanning and filtering.
2. Calculated Columns & Measures
Understanding the difference between these two is fundamental to mastering Power BI.
Calculated Columns
- Definition: A new column added to an existing table in the data model. The formula is evaluated row-by-row.
- Storage: Values are calculated immediately upon creation or data refresh and stored in RAM (part of the model file size).
- Calculation Time: Calculated during data load/refresh.
- Context: Uses Row Context (current row).
- Use Case: When you need the data in an axis, slicer, or filter (e.g., creating a "Full Name" column from "First" and "Last").
Measures
- Definition: A calculation used for aggregation. It does not add data to the table.
- Storage: Does not store values; uses CPU power.
- Calculation Time: Calculated on the fly (at query time) when a user interacts with a visual.
- Context: Uses Filter Context (selections made in the report).
- Use Case: Numerical analysis (sums, averages, ratios, year-over-year growth).
| Feature | Calculated Column | Measure |
|---|---|---|
| Evaluation | Row-by-row | Based on filter context (aggregates) |
| Storage | Stored in memory (Disk/RAM) | Not stored (CPU intensive) |
| Recalculation | On data refresh | On visual interaction |
| Best For | Slicers, Rows/Columns in Matrix | Values area in charts/tables |
3. Implicit vs. Explicit Measures
Implicit Measures
These are automatic aggregations created when you drag a numeric column (e.g., Sales Amount) into the "Values" field of a visual. Power BI automatically applies a default aggregation (Sum, Count, Average).
- Pros: Fast and easy for ad-hoc analysis.
- Cons: Limited functionality; cannot be referenced by other measures; difficult to maintain in large models.
Explicit Measures
These are measures manually created using DAX code.
- Syntax:
Total Sales = SUM(Sales[Amount]) - Pros:
- Reusable: Can be nested inside other measures.
- Logic: Can handle complex logic (e.g., Time Intelligence).
- Formatting: Formatting stays with the measure regardless of where it is used.
- Best Practice: Always use explicit measures for production reports.
4. DAX Syntax & Operators
Standard Syntax
Measure Name = FUNCTION('Table Name'[Column Name])
- Formula Bar: Where code is written.
- Measure/Column Name: Brackets
[]are used for column names; Single quotes' 'are used for table names (optional if table name has no spaces, but recommended).
Operators
Arithmetic Operators
+(Addition)-(Subtraction)*(Multiplication)/(Division)^(Exponentiation)
Comparison Operators
=(Equal to)<>(Not equal to)>(Greater than)<(Less than)>=(Greater than or equal to)<=(Less than or equal to)
Text Operator
&(Concatenation - joins two text strings)
Logical Operators
&&(AND condition)||(OR condition)IN(Checks if a value exists in a list:'Table'[Col] IN { "A", "B" })
5. DAX Functions: Basic Math & Statistical
These functions perform aggregations over columns.
Mathematical Functions
- SUM: Adds all numbers in a column.
DAXTotal Revenue = SUM(Sales[Revenue]) - DIVIDE: Performs division and handles divide-by-zero errors gracefully (returns BLANK or an alternate result instead of "Infinity").
DAXProfit Margin = DIVIDE([Total Profit], [Total Sales], 0) - ABS: Returns the absolute value of a number.
Statistical Functions
- AVERAGE: Returns the arithmetic mean of all numbers in a column.
- MAX / MIN: Returns the largest or smallest value in a column (works on numeric, date, and text).
- COUNT: Counts the number of cells in a column that contain numbers.
- COUNTA: Counts the number of cells in a column that are not empty (works on any data type).
- COUNTROWS: Counts the number of rows in the specified table.
DAXTotal Transactions = COUNTROWS(Sales) - DISTINCTCOUNT: Counts the number of distinct (unique) values in a column.
6. Conditional & Logical Functions
Used to test conditions and return different results.
IF
Checks a condition, and returns one value if TRUE, and another if FALSE.
Price Category = IF(Product[Price] > 500, "High", "Low")
SWITCH
Ideally used when there are multiple conditions (replaces nested IF statements).
Month Name = SWITCH(
TRUE(),
'Date'[MonthNum] = 1, "January",
'Date'[MonthNum] = 2, "February",
'Date'[MonthNum] = 3, "March",
"Other"
)
AND / OR / NOT
- AND: Checks if both arguments are TRUE.
- OR: Checks if one of the arguments is TRUE.
- NOT: Changes FALSE to TRUE, or TRUE to FALSE.
7. Text Functions
Used to manipulate text strings, usually within Calculated Columns.
- CONCATENATE: Joins two text strings into one (similar to
&). - LEFT / RIGHT: Returns a specified number of characters from the start (Left) or end (Right) of a string.
- MID: Returns a specific number of characters from the middle of a string.
- LEN: Returns the number of characters in a string.
- UPPER / LOWER: Converts text to uppercase or lowercase.
- TRIM: Removes all spaces from text except for single spaces between words.
- FORMAT: Converts a value to text in a specified format.
DAXFormatted Date = FORMAT('Date'[Date], "MM-YYYY")
8. Date & Time Functions
Used for calculations involving dates and time. Note: These are distinct from "Time Intelligence" functions (like TOTALYTD) which modify filter context.
- TODAY(): Returns the current date.
- NOW(): Returns the current date and time.
- YEAR / MONTH / DAY: Extracts the respective component from a date column.
DAXYearColumn = YEAR('Sales'[OrderDate]) - DATEDIFF: Calculates the difference between two dates in specified intervals (Day, Month, Year, Hour, etc.).
DAXDays to Ship = DATEDIFF(Sales[OrderDate], Sales[ShipDate], DAY) - CALENDAR / CALENDARAUTO: Returns a table with a single column named "Date". Used to create date tables dynamically.
9. Advanced DAX Functions
This section covers the most powerful engines of DAX: Context modification and iteration.
CALCULATE
The most important function in DAX. It evaluates an expression in a context that is modified by specific filters.
- Syntax:
CALCULATE(Expression, [Filter1], [Filter2], ...) - Functionality: It can add, remove, or update the filters currently applied to the report.
- Example: Calculate sales only for the color "Red", ignoring whatever other filters are on the page regarding color.
DAXRed Sales = CALCULATE(SUM(Sales[Amount]), Product[Color] = "Red")
FILTER
FILTER is an iterator that returns a table, not a single value. It is rarely used alone; it is almost always used inside CALCULATE to handle complex filtering logic that simple boolean expressions cannot handle (e.g., comparing a column against a measure).
- Syntax:
FILTER(Table, FilterExpression) - Example: Calculate sales for products where Total Sales > 1000.
DAXHigh Vol Sales = CALCULATE( [Total Sales], FILTER(Product, [Total Sales] > 1000) )
RELATED
Used in Calculated Columns or Row Context. It retrieves a related value from another table, similar to VLOOKUP in Excel.
- Prerequisite: An active relationship must exist between the tables.
- Direction: Works from the "Many" side to the "One" side of a relationship.
- Example: In the Sales table (Many), you want to bring in the Product Name from the Product table (One).
DAXProduct Name In Sales = RELATED('Product'[ProductName])
Iterator Functions (X-Functions)
Standard aggregation functions (SUM, AVERAGE) aggregate a single column. Iterators (SUMX, AVERAGEX, MINX, MAXX) iterate through a table row-by-row, perform a calculation, and then aggregate the result.
- Syntax:
SUMX(Table, Expression) - Use Case: When math must occur before the summation (e.g., Price * Quantity).
- Example (Total Sales):
If you do not have a "Sales Amount" column, but you have "Price" and "Quantity":- Wrong:
SUM(Price) * SUM(Quantity)(This sums all prices and multiplies by sum of all quantities). - Correct:
SUMX(Sales, Sales[Price] * Sales[Quantity])(This multiplies Price * Quantity for every row, then sums the results).
- Wrong: