Unit 4 - Notes

INT306

Unit 4: Relational Database Design

1. Pitfalls in Relational Database Design

Before understanding normalization, it is crucial to understand the problems (pitfalls) that arise from poor database design. These are primarily caused by redundancy (unnecessary repetition of data).

Database Anomalies

When a database is not properly designed, it suffers from modification anomalies:

  1. Insertion Anomaly:

    • The inability to add data to the database due to the absence of other data.
    • Example: In a table storing (Student_ID, Course_ID, Course_Fee), if a new course is designed but no student has enrolled yet, we cannot insert the Course_Fee if Student_ID is part of the primary key.
  2. Deletion Anomaly:

    • The unintended loss of data due to the deletion of other data.
    • Example: If a student drops a course and we delete the row (Student_ID, Course_ID, Course_Fee), and that was the only student in that course, we lose the information about the Course_Fee entirely.
  3. Update (Modification) Anomaly:

    • Data inconsistency that results from data redundancy and partial update.
    • Example: If a Course_Fee changes and is recorded in 100 rows (for 100 students), failing to update all 100 rows results in inconsistent data.

2. Functional Dependency (FD)

Functional dependency is the foundational concept behind normalization. It describes the relationship between attributes in a relation.

Definition:
Given a relation , attribute is functionally dependent on attribute (denoted as ) if and only if each value of is associated with precisely one value of .

  • Left Hand Side (LHS): Determinant.
  • Right Hand Side (RHS): Dependent.

Types of Functional Dependencies

  1. Trivial FD: is trivial if is a subset of . (e.g., {Student_ID, Name} -> Student_ID).
  2. Non-Trivial FD: is non-trivial if is not a subset of . (e.g., Student_ID -> Name).
  3. Fully Functional Dependency: is fully dependent on if it is dependent on , but not on any proper subset of . (Crucial for 2NF).
  4. Transitive Dependency: If and , then . (Crucial for 3NF).

3. Data Integrity Rules

Data integrity ensures the accuracy and consistency of data.

  1. Entity Integrity Rule:

    • The Primary Key cannot be NULL.
    • The Primary Key must be unique to identify each row uniquely.
  2. Referential Integrity Rule:

    • A Foreign Key value must match an existing Primary Key value in the parent table, or it must be NULL (unless constrained otherwise).
    • Prevents "orphan" records.
  3. Domain Integrity Rule:

    • All values in a column must fall within a defined domain (data type, format, range).

4. Normalization: Concepts and Need

Normalization is the process of organizing data in a database. This involves creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Need for Normalization:

  • Minimize Redundancy: Reduces storage space.
  • Eliminate Anomalies: Prevents Insert, Update, and Delete anomalies.
  • Data Consistency: Ensures that a piece of data is stored in only one place.
  • Query Efficiency: While highly normalized tables require joins, they are smaller and easier to index.

5. Normal Forms (1NF to BCNF)

First Normal Form (1NF)

Rule: A relation is in 1NF if it contains only atomic (indivisible) values.

  • No repeating groups.
  • No multi-valued attributes (e.g., multiple phone numbers in one cell).
  • All entries in a column must be of the same data type.
Violation Example: ID Name Courses
1 John Math, Science
Conversion to 1NF:
Split into separate rows.
ID Name Course
1 John Math
1 John Science

Second Normal Form (2NF)

Prerequisites:

  1. The relation must be in 1NF.
  2. There must be no Partial Dependencies.

Rule: All non-key attributes must be fully functionally dependent on the Primary Key.

  • This applies specifically when the Primary Key is a Composite Key (made of two or more columns).
  • If a non-key attribute depends on only part of the composite key, it violates 2NF.

Violation Example:
Schema: (Student_ID, Course_ID, Grade, Course_Name)
Key: (Student_ID, Course_ID)

  • Dependency: (Student_ID, Course_ID) -> Grade (Full dependency)
  • Dependency: Course_ID -> Course_Name (Partial dependency)

Conversion to 2NF:
Decompose into two tables:

  1. Table_Grades (Student_ID, Course_ID, Grade)
  2. Table_Courses (Course_ID, Course_Name)

Third Normal Form (3NF)

Prerequisites:

  1. The relation must be in 2NF.
  2. There must be no Transitive Dependencies.

Rule: No non-key attribute should depend on another non-key attribute.

  • Formally: For every non-trivial functional dependency , either:
    • is a Super Key, OR
    • is a Prime Attribute (part of a candidate key).

Violation Example:
Schema: (Student_ID, Name, Zip_Code, City)
Key: Student_ID

  • Dependency: Student_ID -> Zip_Code
  • Dependency: Zip_Code -> City (Transitive! City depends on Zip, not directly on ID).

Conversion to 3NF:
Decompose into:

  1. Student (Student_ID, Name, Zip_Code)
  2. Location (Zip_Code, City)

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF (often called 3.5 NF). It handles anomalies that 3NF misses, usually involving overlapping candidate keys.

Rule: For every non-trivial functional dependency , must be a Super Key.

Difference from 3NF:
3NF allows if is a prime attribute, even if is not a key. BCNF does not allow this.

Example Scenario:
Table: (Student, Course, Professor)
Constraints:

  • One student takes one course.
  • One professor teaches only one course.
  • Multiple professors can teach the same course topic.
  • FDs: (Student, Course) -> Professor and Professor -> Course.

In this case, Professor -> Course violates BCNF because Professor is not a super key (Professor cannot identify the Student).

Conversion:
Decompose into:

  1. (Professor, Course)
  2. (Student, Professor)

6. Higher Normal Forms (4NF and 5NF)

Multivalued Dependency (MVD)

Denoted as .
MVD occurs when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.

  • It signifies that determines a set of values for , and this set is independent of other attributes in the table.
  • It usually arises when two independent 1:N relationships are mixed in a single table.

Fourth Normal Form (4NF)

Prerequisite: Relation is in BCNF.

Rule: A relation is in 4NF if it has no non-trivial Multivalued Dependencies.

Violation Example:
Table: (Course, Textbooks, Instructor)

  • A course has many textbooks.
  • A course has many instructors.
  • Textbooks and Instructors are independent of each other.
Course Textbook Instructor
Math Algebra 101 Prof. A
Math Calculus I Prof. A
Math Algebra 101 Prof. B
Math Calculus I Prof. B

Here, Course ->> Textbook and Course ->> Instructor. The repetition creates redundancy (Cartesian product of books and instructors).

Conversion to 4NF:
Decompose independent multivalues:

  1. Table_Books (Course, Textbook)
  2. Table_Instructors (Course, Instructor)

Join Dependency (JD)

Join Dependency is a generalization of Multivalued Dependency.
A relation satisfies Join Dependency if can be reconstructed by joining without generating spurious (ghost) tuples or losing information.

Fifth Normal Form (5NF) / Project-Join Normal Form (PJNF)

Prerequisite: Relation is in 4NF.

Rule: A relation is in 5NF if every Join Dependency in it is implied by the candidate keys.

  • This deals with cases where information can be reconstructed from smaller pieces, but those pieces are not based on MVDs.
  • It specifically addresses Cyclic Dependencies.

Example Scenario:
Consider three entities: Supplier, Product, Consumer.
Rules:

  1. A Supplier sells a Product.
  2. A Consumer buys a Product.
  3. A Supplier deals with a Consumer.

If we store (Supplier, Product, Consumer) in one table, we might encounter a scenario where a valid combination is not explicitly stored but implied. If we cannot decompose the table into two tables without losing information, but we can decompose it into three (Supplier, Product), (Product, Consumer), (Supplier, Consumer) and rejoin them perfectly, the original table violates 5NF.

Conversion:
Decompose into the three constituent binary relationship tables.


Summary of Normal Forms

Normal Form Key Requirement Issue Addressed
1NF Atomic values Repeating groups
2NF Full Functional Dependency Partial dependency
3NF No Transitive Dependency Transitive dependency
BCNF Determinants must be Super Keys Anomalies with overlapping keys
4NF No Multivalued Dependency Independent 1:N relationships
5NF Join Dependency implied by Keys Cyclic dependencies