Unit 6 - Notes
INT306
Unit 6: Database Transaction Processing
1. Transaction System Concepts
1.1 Definition of a Transaction
A transaction is a unit of program execution that accesses and possibly updates various data items. It is considered a logical unit of work that takes the database from one consistent state to another.
A transaction consists of all operations executed between the BEGIN TRANSACTION and END TRANSACTION statements.
1.2 Basic Operations
The two main operations performed by a transaction on data items are:
- Read(X): Transfers the data item from the database to a local buffer belonging to the transaction executing the read operation.
- Write(X): Transfers the data item from the local buffer of the transaction back to the database.
1.3 Transaction States
A transaction must be in one of the following states:
- Active: The initial state; the transaction stays in this state while it is executing.
- Partially Committed: After the final statement has been executed, but data may still reside in main memory (not yet flushed to disk).
- Failed: After the discovery that normal execution can no longer proceed (due to hardware failure or logical error).
- Aborted: After the transaction has been rolled back and the database restored to its state prior to the start of the transaction.
- Committed: After successful completion. The changes are permanently written to the database.
State Transition Diagram:
Active -> Partially Committed -> Committed
| |
v v
Failed <-------+
|
v
Aborted
2. Desirable Properties of Transactions (ACID)
To ensure data integrity, the database system maintains the ACID properties.
2.1 Atomicity
- Concept: "All or nothing." A transaction is an indivisible unit of processing. Either all its operations are executed, or none of them are.
- Mechanism: If a transaction fails halfway, the Recovery Manager uses logs to undo (rollback) any partial changes.
- Example: In a bank transfer from Account A to B, if money is debited from A but the system crashes before crediting B, the debit from A must be undone.
2.2 Consistency
- Concept: A transaction must transform the database from one valid state to another valid state. The database must satisfy all defined integrity constraints (e.g., primary keys, constraints, data types) before and after the transaction.
- Responsibility: The programmer/user ensures the transaction logic is correct; the DBMS ensures constraints are checked.
- Example: In a bank transfer, the sum of balances must be the same before and after the transaction.
2.3 Isolation
- Concept: Multiple transactions executing concurrently must not interfere with each other. The intermediate state of a transaction should be invisible to other transactions.
- Mechanism: Handled by the Concurrency Control Manager (usually via locking).
- Example: If is updating a row, should not read that row until is finished.
2.4 Durability
- Concept: Once a transaction commits, its changes are permanent, even in the event of a system crash or power failure.
- Mechanism: Handled by the Recovery Manager (usually via Write-Ahead Logging to non-volatile storage).
3. Schedules
A schedule is a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed.
3.1 Serial Schedules
A schedule is serial if the transactions are executed one after the other, with no interleaving of instructions.
- Format: or .
- Pros: Always ensures consistency (ACID).
- Cons: Poor system resource utilization; low throughput.
3.2 Concurrent (Non-Serial) Schedules
A schedule where the instructions of multiple transactions are interleaved.
- Pros: Improved CPU/Disk utilization (while one transaction waits for I/O, the CPU processes another).
- Risk: Can lead to data inconsistency if not properly managed.
| Example of Concurrent Schedule: | Transaction | Transaction |
|---|---|---|
| Read(A) | ||
| A = A - 50 | ||
| Read(A) | ||
| A = A * 1.1 | ||
| Write(A) | ||
| Write(A) |
Note: In the example above, reads a value of A that is being modified by but hasn't been written yet (if local buffers are involved) or overwrites 's work. This is a potential conflict.
4. Serializability of Schedules
A concurrent schedule is serializable if its outcome (the final state of the database) is equivalent to the outcome of some serial schedule of the same transactions.
4.1 Conflicting Operations
Two operations (from ) and (from ) conflict if:
- They belong to different transactions ().
- They access the same data item ().
- At least one of the operations is a
Write.
Conflict Types:
- Read-Write (RW): reads , writes .
- Write-Read (WR): writes , reads .
- Write-Write (WW): writes , writes .
(Note: Read-Read does not conflict.)
4.2 Conflict Serializability
A schedule is conflict serializable if it is conflict-equivalent to a serial schedule. This implies that the order of conflicting operations is the same as in some serial schedule.
Testing for Conflict Serializability (Precedence Graph):
- Create a node for each transaction.
- Draw a directed edge if there is a conflicting operation where acts before .
- Result: If the graph contains a cycle, the schedule is NOT conflict serializable. If there is no cycle, it is serializable.
4.3 View Serializability
View serializability is strictly weaker than conflict serializability (every conflict serializable schedule is view serializable, but not vice versa). It handles "blind writes" (writing without reading).
Two schedules and are view equivalent if:
- Initial Read: If reads the initial value of in , it must also read the initial value of in .
- Read from Write: If reads produced by in , it must do the same in .
- Final Write: The transaction that performs the final write on in must perform the final write in .
5. Recoverability
Recoverability ensures that if a transaction fails, the database can return to a consistent state without affecting committed transactions.
5.1 Recoverable Schedules
A schedule is recoverable if, for every pair of transactions and such that reads a data item previously written by , the commit operation of appears before the commit operation of .
- Rule: If reads from , must commit first.
- Why: If commits before , and then aborts, has committed data based on a value that "never happened." This violates Atomicity.
5.2 Cascadeless Schedules
A schedule involves cascading rollback if the failure of one transaction causes the rollback of dependent transactions.
- Cascadeless Schedule: A schedule where, for every pair of transactions and such that reads a data item previously written by , the read operation of occurs after the commit operation of .
- Benefit: Prevents cascading rollbacks. Every cascadeless schedule is also recoverable.
5.3 Strict Schedules
A schedule is strict if a value written by is neither read nor overwritten by any other transaction until either commits or aborts.
- Relation: Strict Cascadeless Recoverable.
6. Concurrency Control
Concurrency control protocols ensure that concurrent schedules are serializable and recoverable.
6.1 Problems in Concurrency (Without Control)
- Lost Update Problem: Two transactions update the same item; the second update overwrites the first.
- Dirty Read (Temporary Update): Transaction reads a value written by a transaction that later fails.
- Unrepeatable Read: A transaction reads the same item twice and gets different values because another transaction modified it in between.
- Phantom Read: A transaction executes a query returning a set of rows; a second transaction inserts a new row that matches the query conditions, changing the result set for the first transaction.
6.2 Lock-Based Protocols
A lock is a variable associated with a data item that describes its status with respect to operations.
Lock Modes:
- Shared (S): Read-only lock. Multiple transactions can hold an S-lock on the same item.
- Exclusive (X): Read/Write lock. Only one transaction can hold an X-lock.
Two-Phase Locking Protocol (2PL):
This protocol guarantees serializability. A transaction handles locks in two distinct phases:
- Growing Phase: A transaction may obtain locks but may not release any lock.
- Shrinking Phase: A transaction may release locks but may not obtain any new lock.
- Strict 2PL: All Exclusive locks are held until the transaction commits. Prevents cascading rollbacks.
- Rigorous 2PL: All locks (Shared and Exclusive) are held until the transaction commits.
6.3 Timestamp-Based Protocols
Each transaction is issued a unique timestamp when it enters the system.
- Timestamp Ordering (TO): The protocol ensures that conflicting operations are executed in timestamp order.
- Rules:
- If wants to Read(X):
- If , needs a value already overwritten. Reject and rollback.
- Otherwise, execute Read and update .
- If wants to Write(X):
- If , produces a value needed by an "older" read that already happened. Reject.
- If , is writing an obsolete value. Reject.
- If wants to Read(X):
6.4 Deadlock Handling
A deadlock occurs when a set of transactions are stalled because each is waiting for a lock held by another in the set.
Handling Methods:
- Deadlock Prevention: Ensure the system never enters a deadlock state (e.g., via "Wait-Die" or "Wound-Wait" schemes based on timestamps).
- Deadlock Detection and Recovery: Allow deadlocks to occur, detect them periodically, and recover.
- Wait-For Graph: Nodes are transactions. Edge implies is waiting for a lock held by .
- Detection: If the Wait-For Graph has a cycle, there is a deadlock.
- Recovery: Select a victim transaction to roll back to break the cycle. Criteria for victim selection include transaction age, progress, and number of data items used.