Unit 5 - Notes

INT306 9 min read

Unit 5: Programming Constructs in Databases

1. Programming Constructs in Databases

Traditional SQL is a declarative language, meaning it specifies what to do but not how to do it. Procedural extensions (like PL/SQL in Oracle, T-SQL in SQL Server, or PL/pgSQL in PostgreSQL) add programming constructs to SQL, allowing developers to write complex, procedural logic directly within the database.

1.1 Flow Control Statements

Flow control statements dictate the execution path of a program based on specific conditions or loops.

  • Conditional Statements:
    • IF-THEN-ELSE: Executes specific blocks of code based on a boolean condition.
      SQL
              IF condition THEN
                  -- statements executed if true
              ELSIF condition2 THEN
                  -- statements executed if condition2 is true
              ELSE
                  -- statements executed if all conditions are false
              END IF;
              
    • CASE Statement: Evaluates a single expression against multiple potential values.
      SQL
              CASE expression
                  WHEN value1 THEN -- statements
                  WHEN value2 THEN -- statements
                  ELSE -- statements
              END CASE;
              
  • Iterative Statements (Loops):
    • Simple LOOP: Loops indefinitely until an EXIT condition is met.
    • WHILE LOOP: Executes a block as long as a specified condition remains true.
      SQL
              WHILE condition LOOP
                  -- statements
              END LOOP;
              
    • FOR LOOP: Iterates over a specific range of numbers or a cursor result set.
      SQL
              FOR i IN 1..10 LOOP
                  -- statements executed 10 times
              END LOOP;
              

1.2 Functions

A function is a subprogram that computes and returns a single value. Functions are primarily used for computations and can be called directly within SQL statements (e.g., in SELECT or WHERE clauses).

  • Characteristics: Must return a value. Can only take IN parameters. Cannot perform DML operations (INSERT, UPDATE, DELETE) on tables that are being queried.
  • Syntax (PL/SQL Example):
    SQL
        CREATE OR REPLACE FUNCTION calculate_bonus (salary IN NUMBER) 
        RETURN NUMBER IS
            bonus NUMBER;
        BEGIN
            bonus := salary * 0.10;
            RETURN bonus;
        END;
        

1.3 Stored Procedures

A stored procedure is a prepared SQL code that you can save and reuse. Unlike functions, procedures are executed using a CALL or EXECUTE statement and do not have to return a value.

  • Characteristics: Can return multiple values using OUT parameters. Can perform DML operations. Supports transaction management (COMMIT, ROLLBACK).
  • Parameter Modes:
    • IN: Passes a value into the procedure (read-only).
    • OUT: Returns a value to the caller.
    • INOUT: Passes a value in, and can return an updated value out.
  • Syntax:
    SQL
        CREATE OR REPLACE PROCEDURE update_salary (emp_id IN NUMBER, increase IN NUMBER) IS
        BEGIN
            UPDATE employees SET salary = salary + increase WHERE id = emp_id;
            COMMIT;
        END;
        

1.4 Cursors

A cursor is a temporary work area created in system memory used to handle multi-row queries. SQL is inherently set-based, but procedural logic often requires row-by-row processing. Cursors bridge this gap.

  • Implicit Cursors: Automatically created by the DBMS for any DML statement or single-row SELECT INTO.
  • Explicit Cursors: User-defined cursors for queries returning multiple rows.
  • Steps to Use an Explicit Cursor:
    1. DECLARE: Define the cursor and its query.
    2. OPEN: Execute the query and allocate memory.
    3. FETCH: Retrieve data row-by-row into variables.
    4. CLOSE: Release the allocated memory.
  • Syntax:
    SQL
        DECLARE
            CURSOR emp_cursor IS SELECT id, name FROM employees;
            v_id employees.id%TYPE;
            v_name employees.name%TYPE;
        BEGIN
            OPEN emp_cursor;
            LOOP
                FETCH emp_cursor INTO v_id, v_name;
                EXIT WHEN emp_cursor%NOTFOUND;
                -- Process row
            END LOOP;
            CLOSE emp_cursor;
        END;
        

1.5 Triggers

A trigger is a special type of stored procedure that automatically executes (fires) when a specific event occurs in the database server.

  • Types of Events: DML (INSERT, UPDATE, DELETE), DDL (CREATE, ALTER, DROP), or System events (logon, startup).
  • Timing:
    • BEFORE: Executes before the triggering event (useful for validation/modification of data).
    • AFTER: Executes after the event (useful for logging/auditing).
    • INSTEAD OF: Replaces the triggering event (often used on views).
  • Level:
    • Row-Level: Executes once for each row affected (FOR EACH ROW). Uses :NEW and :OLD pseudo-records to access row data.
    • Statement-Level: Executes once for the entire SQL statement, regardless of how many rows are affected.
  • Syntax:
    SQL
        CREATE OR REPLACE TRIGGER audit_emp_update
        AFTER UPDATE ON employees
        FOR EACH ROW
        BEGIN
            INSERT INTO audit_log (emp_id, old_salary, new_salary)
            VALUES (:OLD.id, :OLD.salary, :NEW.salary);
        END;
        

1.6 Exception Handling

Exception handling allows programs to intercept and gracefully resolve run-time errors, preventing program crashes.

  • System-Defined Exceptions: Predefined by the DBMS (e.g., NO_DATA_FOUND, TOO_MANY_ROWS, ZERO_DIVIDE).
  • User-Defined Exceptions: Custom errors defined and raised by the programmer using the RAISE statement.
  • Syntax:
    SQL
        BEGIN
            -- Normal execution block
            SELECT salary INTO v_salary FROM employees WHERE id = 999;
        EXCEPTION
            WHEN NO_DATA_FOUND THEN
                DBMS_OUTPUT.PUT_LINE('Employee not found.');
            WHEN OTHERS THEN
                DBMS_OUTPUT.PUT_LINE('An unexpected error occurred.');
        END;
        

2. Database Transaction Processing

2.1 Transaction System Concepts

A transaction is a logical unit of work that consists of one or more database operations (usually SQL statements). It represents a sequence of actions that must be executed as a whole.

  • States of a Transaction:
    1. Active: The initial state; the transaction stays here while it executes.
    2. Partially Committed: After the final statement has been executed, but before the actual commit.
    3. Failed: If normal execution can no longer proceed due to hardware/software failure or logical errors.
    4. Aborted: After the transaction has been rolled back and the database is restored to its state prior to the start of the transaction.
    5. Committed: After successful completion and changes are made permanent.

2.2 Desirable Properties of Transactions (ACID Properties)

To ensure database integrity, transactions must possess the following four properties:

  • Atomicity (All or Nothing): The entire transaction takes place at once or doesn't happen at all. If any part of a transaction fails, the entire transaction is rolled back. (Managed by the Transaction Management component).
  • Consistency: A transaction must transform the database from one valid consistent state to another. It must adhere to all constraints and rules defined in the database.
  • Isolation: Transactions executing concurrently must not interfere with each other. Intermediate states of a transaction should be invisible to other transactions. (Managed by the Concurrency Control subsystem).
  • Durability: Once a transaction is committed, its effects must be permanent in the database, surviving any subsequent system failures (power loss, crashes). (Managed by the Recovery subsystem).

3. Schedules and Serializability

3.1 Schedules

A schedule is the chronological execution sequence of operations from multiple concurrent transactions.

  • Serial Schedule: Transactions are executed one after the other, with no interleaving of operations. They are inherently consistent but offer poor performance and resource utilization.
  • Non-Serial (Concurrent) Schedule: Operations from multiple transactions are interleaved. This improves performance but risks data inconsistency if not controlled.

3.2 Serializability of Schedules

Serializability ensures that a concurrent (non-serial) schedule produces the exact same results and database state as some serial execution of those same transactions.

  • Conflict Serializability:

    • Two operations conflict if they belong to different transactions, access the same data item, and at least one is a Write (W) operation. (e.g., Read-Write, Write-Read, Write-Write conflicts).
    • A schedule is conflict-serializable if it can be transformed into a serial schedule by swapping non-conflicting adjacent operations.
    • Tested using a Precedence Graph (Serialization Graph). If the graph contains a cycle, the schedule is not conflict-serializable.
  • View Serializability:

    • A less restrictive form of serializability. Two schedules are view equivalent if they have the same initial reads, the same write-reads (reading values written by specific transactions), and the same final writes.
    • Every conflict-serializable schedule is view-serializable, but not vice versa. View serializability handles "Blind Writes" (writing a value without reading it first).

4. Concurrency Control

Concurrency control is the process of managing simultaneous operations without conflicting with one another, preserving Isolation.

4.1 Need for Concurrency Control

Without concurrency control, several problems arise:

  • Lost Update Problem: Two transactions update the same data simultaneously, and one update overwrites the other.
  • Dirty Read (Uncommitted Dependency): A transaction reads data written by a concurrent uncommitted transaction, which later rolls back.
  • Unrepeatable Read: A transaction reads a data item twice and gets different values because another transaction modified the data in between.
  • Phantom Read: A transaction reads a set of rows satisfying a condition, but another transaction inserts/deletes rows, causing subsequent reads in the first transaction to yield a different set of rows.

4.2 Concurrency Control Protocols

  1. Lock-Based Protocols: Data items are locked by a transaction to prevent concurrent access by others.

    • Shared Lock (S): Allows reading, but not writing. Multiple transactions can hold shared locks on the same item.
    • Exclusive Lock (X): Allows both reading and writing. Only one transaction can hold an exclusive lock on an item.
    • Two-Phase Locking (2PL): Guarantees conflict serializability. A transaction is divided into two phases:
      • Growing Phase: Transaction can acquire locks but cannot release any.
      • Shrinking Phase: Transaction can release locks but cannot acquire any new ones.
      • (Strict 2PL prevents dirty reads by holding all exclusive locks until commit/rollback).
  2. Timestamp-Ordering Protocols: Every transaction is assigned a unique timestamp based on when it started. Read/Write operations are strictly ordered based on these timestamps to prevent conflicts, ensuring serializability without locks (thus avoiding deadlocks).

  3. Validation-Based (Optimistic) Protocols: Assumes conflicts are rare. Transactions execute in a private workspace without locks. Before committing, a validation phase checks if conflicts occurred. If they did, the transaction is rolled back; otherwise, changes are written to the database.


5. Recoverability

While serializability handles isolation, recoverability ensures that the database can recover from failures and aborts without violating consistency.

5.1 Types of Schedules Based on Recoverability

When transactions read uncommitted data, the failure of one transaction can compromise others.

  • Recoverable Schedule: If Transaction T2 reads a value written by T1, T2 must commit after T1 commits. If T1 fails, T2 can also be rolled back.
  • Cascadeless Schedule: A schedule where no transaction reads data from an uncommitted transaction. This prevents Cascading Rollbacks (where the failure of one transaction forces the rollback of many dependent transactions). Every cascadeless schedule is also recoverable.
  • Strict Schedule: A transaction can neither read nor write a data item until the last transaction that wrote to it has committed or aborted. This is the most rigid form and simplifies recovery greatly.

5.2 Deadlocks

A deadlock occurs when two or more transactions are waiting for locks held by each other, resulting in a continuous standstill.

  • Deadlock Prevention: Ensures deadlocks never happen.
    • Wait-Die / Wound-Wait schemes: Use timestamps to determine if a transaction should wait or abort when requesting a lock held by another.
  • Deadlock Detection: Allows deadlocks to occur but detects them periodically using a Wait-For Graph. If a cycle exists, a deadlock has occurred.
  • Deadlock Recovery: Once detected, the system breaks the deadlock by:
    1. Selecting a Victim: Choosing a transaction to rollback (usually the one causing minimum cost).
    2. Rollback: Rolling back the victim completely or partially.
    3. Handling Starvation: Ensuring the same transaction is not repeatedly chosen as the victim (e.g., by factoring its age or number of rollbacks into the victim selection logic).