Unit 3 - Notes

INT306

Unit 3: Relational Operations

1. Aggregate Functions

Aggregate functions are mathematical functions used in SQL to perform a calculation on a set of values (a column or a group of rows) and return a single scalar value. They are essential for summarizing data.

Common Aggregate Functions

Function Description Null Handling
COUNT() Returns the number of rows that match a specified criterion. COUNT(*) includes NULLs. COUNT(column_name) ignores NULLs.
SUM() Returns the total sum of a numeric column. Ignores NULL values.
AVG() Returns the average value of a numeric column. Ignores NULL values.
MIN() Returns the smallest value of the selected column. Works on numbers, text (A-Z), and dates.
MAX() Returns the largest value of the selected column. Works on numbers, text, and dates.

The GROUP BY and HAVING Clauses

Aggregate functions are rarely used in isolation; they are usually paired with GROUP BY.

  • GROUP BY: Arranges identical data into groups. The aggregation is applied to each group independently.
  • HAVING: Filters records after the aggregation has occurred. (Unlike WHERE, which filters before aggregation).

Example:
Find the average salary of employees in each department, showing only departments with an average salary greater than 50,000.

SQL
SELECT DepartmentID, AVG(Salary) as AvgSalary
FROM Employees
GROUP BY DepartmentID
HAVING AVG(Salary) > 50000;


2. SQL Joins

Joins are used to combine rows from two or more tables based on a related column between them. This is the mechanism that allows Relational Databases to be "relational."

Types of Joins

1. Inner Join

Returns records that have matching values in both tables. If a row in Table A does not have a match in Table B, it is excluded.

SQL
SELECT Employees.Name, Departments.DeptName
FROM Employees
INNER JOIN Departments ON Employees.DeptID = Departments.DeptID;

2. Left (Outer) Join

Returns all records from the left table (Table A), and the matched records from the right table (Table B). If there is no match, the result is NULL on the right side.

SQL
-- Returns all employees, even those without a department
SELECT Employees.Name, Departments.DeptName
FROM Employees
LEFT JOIN Departments ON Employees.DeptID = Departments.DeptID;

3. Right (Outer) Join

Returns all records from the right table, and the matched records from the left table. Reverse of the Left Join.

SQL
-- Returns all departments, even those with no employees
SELECT Employees.Name, Departments.DeptName
FROM Employees
RIGHT JOIN Departments ON Employees.DeptID = Departments.DeptID;

4. Full (Outer) Join

Returns all records when there is a match in either left or right table. It combines the result of both Left and Right joins.

5. Cross Join (Cartesian Product)

Produces the Cartesian product of two tables. Every row in Table A is combined with every row in Table B.

  • Formula: If Table A has rows and Table B has rows, the result is rows.

SQL
SELECT * FROM Colors CROSS JOIN Sizes;

6. Self Join

A table is joined with itself. This is useful for hierarchical data (e.g., an Employee table containing both Staff and Managers). You must use aliases to distinguish the two instances of the table.

SQL
-- E1 represents Employee, E2 represents Manager
SELECT E1.Name as Employee, E2.Name as Manager
FROM Employees E1
INNER JOIN Employees E2 ON E1.ManagerID = E2.EmployeeID;


3. Set Operators

Set operators combine the result sets of two or more SELECT statements. Unlike Joins (which combine columns horizontally), Set Operators combine rows vertically.

Prerequisites for Set Operations:

  1. The number of columns in both queries must be the same.
  2. The data types of the corresponding columns must be compatible.

Types of Set Operators

1. UNION vs. UNION ALL

  • UNION: Combines the result-set of two or more SELECT statements and removes duplicate rows.
  • UNION ALL: Combines the result-set but keeps duplicate rows. It is faster than UNION because it skips the duplicate check.

SQL
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers;

2. INTERSECT

Returns only the distinct rows that are present in both SELECT statements. Equivalent to the mathematical intersection ().

SQL
-- Finds IDs present in both tables
SELECT ID FROM TableA
INTERSECT
SELECT ID FROM TableB;

3. MINUS (or EXCEPT)

Returns distinct rows from the first SELECT statement that are not present in the second SELECT statement. Equivalent to mathematical set difference ().

  • Note: Oracle uses MINUS; SQL Server/PostgreSQL use EXCEPT.

SQL
-- IDs in TableA but NOT in TableB
SELECT ID FROM TableA
EXCEPT
SELECT ID FROM TableB;


4. Subqueries (Nested Queries)

A subquery is a query nested inside another query (SELECT, INSERT, UPDATE, or DELETE). The inner query executes first, and its result is passed to the outer query.

Types of Subqueries

1. Scalar Subquery

Returns a single value (one row, one column).

SQL
-- Find employees earning more than the average
SELECT Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

2. Multi-row Subquery

Returns multiple rows (one column). Used with operators like IN, ANY, or ALL.

SQL
-- Find employees in IT or HR (assuming we don't know the IDs)
SELECT Name
FROM Employees
WHERE DeptID IN (SELECT DeptID FROM Departments WHERE Name = 'IT' OR Name = 'HR');

3. Correlated Subquery

A subquery that depends on the outer query for its values. It executes repeatedly, once for each row selected by the outer query. This is often less efficient than a non-correlated subquery (Join is usually preferred).

SQL
-- Find employees whose salary is higher than the average of THEIR department
SELECT e1.Name, e1.Salary
FROM Employees e1
WHERE e1.Salary > (
    SELECT AVG(e2.Salary)
    FROM Employees e2
    WHERE e2.DeptID = e1.DeptID
);


5. Views

A View is a virtual table based on the result-set of an SQL statement. It contains rows and columns just like a real table, but the fields are from one or more real tables in the database.

Purpose and Benefits

  1. Security: Restrict user access to specific rows or columns (e.g., hiding the Salary column).
  2. Complexity Abstraction: Hides complex Joins or aggregations behind a simple table name.
  3. Data Independence: Changes to the underlying table structure can often be mitigated by updating the view, without changing the application code.

Syntax

Creating a View:

SQL
CREATE VIEW HighValueCustomers AS
SELECT CustomerName, ContactName
FROM Customers
WHERE OrderTotal > 10000;

Querying a View:

SQL
SELECT * FROM HighValueCustomers;

Dropping a View:

SQL
DROP VIEW HighValueCustomers;

Updating Views:
Views are generally updatable (INSERT/UPDATE/DELETE) only if they are simple (i.e., contain no aggregates, no DISTINCT, no GROUP BY, and map 1:1 to a base table). Complex views are read-only.


6. Relational Algebra

Relational Algebra is a procedural query language. It provides the theoretical foundation for relational databases and SQL. It describes how to retrieve data (step-by-step procedure).

Fundamental Operations

1. Selection ()

  • Type: Unary (operates on one relation).
  • Purpose: Selects rows (tuples) that satisfy a specific predicate (condition). Equivalent to SQL WHERE.
  • Notation:
  • Example:

2. Projection ()

  • Type: Unary.
  • Purpose: Selects specific columns (attributes) and discards the rest. Removes duplicates. Equivalent to SQL SELECT DISTINCT column.
  • Notation:
  • Example:

3. Union ()

  • Type: Binary.
  • Purpose: Returns tuples present in relation A OR relation B.
  • Notation:
  • Constraint: A and B must be union-compatible (same degree and domains).

4. Set Difference ()

  • Type: Binary.
  • Purpose: Returns tuples present in relation A but NOT in relation B.
  • Notation:

5. Cartesian Product ()

  • Type: Binary.
  • Purpose: Combines every tuple of relation A with every tuple of relation B.
  • Notation:

6. Rename ()

  • Type: Unary.
  • Purpose: Renames a relation or attributes (useful for self-joins).
  • Notation:

Derived Operations (Composite)

1. Intersection ()

Returns tuples present in both A and B.

  • Derivation:

2. Join ()

Combines related tuples from two relations.

  • Natural Join: Links tables by matching names of columns automatically.
  • Theta Join (): A Cartesian product followed by a Selection condition ().
  • Notation:

3. Division ()

Used for queries involving "for all" or "every."

  • Example: Find students who have enrolled in all courses listed in the RequiredCourses table.
  • Notation:
  • Logic: If and , then returns all such that for every in , the pair exists in .