Unit 3 - Notes

CSE357 7 min read

Unit 3: Database Management Systems (DBMS)

1. Introduction to Databases and RDBMS

1.1 Fundamental Concepts

  • Data: Raw facts and figures (e.g., "John", 25, "A").
  • Information: Processed data that conveys meaning (e.g., "John is 25 years old and has a grade of A").
  • Database: An organized collection of structured information, or data, typically stored electronically in a computer system.
  • DBMS (Database Management System): Software that interacts with end-users, applications, and the database itself to capture and analyze the data. Examples: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server.

1.2 File Processing System vs. DBMS

Before DBMS, data was stored in flat files. DBMS resolves the limitations of file systems:

  • Data Redundancy: File systems often duplicate data in multiple files. DBMS creates a central repository, minimizing duplication.
  • Data Inconsistency: In file systems, if data is changed in one file but not another, it becomes inconsistent. DBMS ensures changes are propagated.
  • Data Isolation: DBMS allows easy retrieval of data from multiple tables; file systems make cross-referencing difficult.
  • Security: DBMS provides granular access control (users/roles).

1.3 Relational Database Management System (RDBMS)

An RDBMS is a DBMS based on the Relational Model introduced by E.F. Codd.

  • Structure: Data is stored in tables (relations).
  • Relationships: Logical connections between tables are established via keys.
  • Standardization: Uses SQL (Structured Query Language) for data access.

2. Data Definitions: Tables, Fields, Records

In the Relational Model, specific terminologies correspond to physical storage concepts:

Relational Term Common Term Definition
Relation Table A two-dimensional structure composed of rows and columns.
Tuple Record / Row A single entry in a table representing a specific entity instance.
Attribute Field / Column A characteristic or property of the entity (e.g., Name, ID).
Domain Data Type The set of permitted values for an attribute (e.g., Integer, Varchar).

2.1 The Schema

The logical structure of the database is called the Schema. It defines the tables, the fields in each table, and the relationships between them.

  • Instance: The actual content of the database at a specific point in time.

3. SQL and Data Manipulation

SQL (Structured Query Language) is the standard language for interacting with RDBMS. It is categorized into sub-languages:

3.1 Data Definition Language (DDL)

Used to define or modify the structure (schema) of the database.

  • CREATE: Creates a new table or database.
  • ALTER: Modifies an existing database structure (add/remove columns).
  • DROP: Deletes a table or database.
  • TRUNCATE: Removes all records from a table but keeps the structure.

Example:

SQL
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    EnrollmentDate DATE
);

3.2 Data Manipulation Language (DML)

Used for managing data within schema objects.

  • INSERT: Adds new rows.
  • UPDATE: Modifies existing data.
  • DELETE: Removes rows.
  • SELECT: Retrieves data (often categorized separately as DQL - Data Query Language).

Examples:

SQL
-- Inserting Data
INSERT INTO Students (StudentID, FirstName, LastName) VALUES (101, 'Alice', 'Smith');

-- Updating Data
UPDATE Students SET LastName = 'Johnson' WHERE StudentID = 101;

-- Deleting Data
DELETE FROM Students WHERE StudentID = 101;

3.3 The SELECT Statement

The most frequently used command to query data.

  • Clause Order: SELECT -> FROM -> WHERE -> GROUP BY -> HAVING -> ORDER BY.

SQL
SELECT FirstName, LastName 
FROM Students 
WHERE EnrollmentDate > '2023-01-01' 
ORDER BY LastName ASC;


4. Database Keys and Data Integrity

Keys are used to establish and identify relationships between tables and to uniquely identify any record within a table.

4.1 Types of Keys

  1. Candidate Key: A minimal set of attributes that can uniquely identify a tuple. A table can have multiple candidate keys.
  2. Primary Key (PK): The specific candidate key chosen to uniquely identify rows in a table.
    • Constraint: Cannot contain NULL values and must be unique.
  3. Super Key: Any set of attributes that uniquely identifies a row. (A candidate key is a minimal super key).
  4. Foreign Key (FK): An attribute in one table that links to the Primary Key of another table. It enforces referential integrity.
  5. Composite Key: A key that consists of two or more attributes to create a unique identifier.

4.2 Data Integrity Constraints

Rules ensuring the accuracy and consistency of data.

  • Entity Integrity: Enforced by the Primary Key. No row can have a NULL value for the primary key.
  • Referential Integrity: Enforced by the Foreign Key. A value in the FK column must either match a value in the referenced PK column or be NULL. This prevents "orphaned" records.
  • Domain Integrity: Ensures entries in a column follow the defined data type and format (e.g., Age cannot be "XYZ").

5. Database Normalization

Normalization is the process of organizing data to minimize redundancy and dependency (preventing anomalies).

5.1 Anomalies

Without normalization, three types of anomalies occur:

  1. Insertion Anomaly: Unable to add data because distinct data is missing (e.g., cannot add a student without a course if the table requires both).
  2. Deletion Anomaly: Deleting data unintentionally causes the loss of other valid data.
  3. Update Anomaly: Updating a piece of data requires multiple updates across the database; failure to do so results in inconsistency.

5.2 Normal Forms (NF)

First Normal Form (1NF)

  • Rule: Every column must hold atomic (indivisible) values. No repeating groups or arrays.
  • Solution: Create a new row for each value or separate tables.

Second Normal Form (2NF)

  • Rule: Must be in 1NF AND have no Partial Dependency.
  • Partial Dependency: When a non-prime attribute depends on only a part of a composite primary key.
  • Solution: Move the partially dependent data to a separate table.

Third Normal Form (3NF)

  • Rule: Must be in 2NF AND have no Transitive Dependency.
  • Transitive Dependency: When a non-key attribute depends on another non-key attribute (e.g., and , therefore ).
  • Solution: Move the transitive data to a new table.

Boyce-Codd Normal Form (BCNF)

  • A stricter version of 3NF.
  • Rule: For every functional dependency , must be a Super Key.

6. Transactions and Transaction Management

6.1 Transaction Definition

A Transaction is a logical unit of work that contains one or more SQL statements. A transaction is atomic: it either happens completely or not at all.

Example: Transferring money from Account A to Account B.

  1. Read A balance.
  2. Deduct $100 from A.
  3. Read B balance.
  4. Add $100 to B.
    If step 3 fails, steps 1 and 2 must be rolled back.

6.2 ACID Properties

To ensure data integrity, transactions must adhere to ACID properties:

  1. Atomicity: The "All or Nothing" rule. If any part of the transaction fails, the entire transaction is aborted and the database remains unchanged.
  2. Consistency: The database must move from one valid state to another valid state. Constraints (like integrity rules) must be maintained.
  3. Isolation: Multiple transactions occurring simultaneously must not interfere with each other. Intermediate states of a transaction are invisible to other transactions.
  4. Durability: Once a transaction is committed, the changes are permanent, even in the event of a system failure.

6.3 Transaction States

  1. Active: The initial state; the transaction is executing.
  2. Partially Committed: Final statement executed, but not yet saved to stable storage.
  3. Failed: Normal execution cannot proceed (error or crash).
  4. Aborted: The transaction is rolled back, and the database is restored to the state before the transaction started.
  5. Committed: The transaction is successfully completed and changes are permanent.

6.4 Transaction Control Language (TCL)

Commands used to manage transactions:

  • COMMIT: Saves all changes made during the transaction.
  • ROLLBACK: Undoes changes if an error occurs.
  • SAVEPOINT: Sets a point within a transaction to which one can rollback without aborting the whole transaction.

6.5 Concurrency Control

When multiple transactions execute simultaneously, conflicts can arise (e.g., Lost Update, Dirty Read). DBMS uses Locking to manage this.

  • Shared Lock (Read Lock): Multiple transactions can read the data, but none can write.
  • Exclusive Lock (Write Lock): Only one transaction can hold this; it allows reading and writing. No other transaction can access the data until the lock is released.
  • Deadlock: A situation where two transactions are waiting for each other to release locks, causing a standstill. DBMS must detect and resolve deadlocks (usually by aborting one transaction).