Unit 4 - Notes

INT306 6 min read

Unit 4: Relational Database Design

Relational Database Design is the process of organizing data to minimize redundancy and dependency. The goal is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via defined relationships.


1. Data Integrity Rules

Data integrity ensures the accuracy, consistency, and reliability of data in the database. In the relational model, there are three primary types of integrity rules.

A. Entity Integrity

  • Definition: No part of a Primary Key can be NULL.
  • Reasoning: The primary key is used to uniquely identify a specific row in a table. If the primary key were null, the system could not identify the row, leading to ambiguity.
  • Example: In a Student table with StudentID as the PK, no student entry can have a missing ID.

B. Referential Integrity

  • Definition: If a Foreign Key exists in one table, the value of that key must match a Primary Key value in the referenced table, or it must be NULL (if allowed).
  • Reasoning: This ensures consistency between tables. You cannot refer to a record that does not exist.
  • Example: If Enrollment table has a CourseID foreign key pointing to the Course table, you cannot enter a CourseID in Enrollment that doesn't exist in Course.

C. Domain Integrity

  • Definition: All entries in a specific column must be of the same data type and fall within a defined set of valid values (domain).
  • Example: A column GPA must be a numeric value between 0.0 and 4.0. It cannot contain text like "Four point zero".

2. Functional Dependency (FD)

Functional dependency is a constraint between two sets of attributes in a relation from a database. It is the fundamental concept underlying normalization.

Definition

Given a relation , attribute set is functionally dependent on attribute set (denoted as ) if each value of determines exactly one value of .

  • Determinant (): The attribute(s) on the left side.
  • Dependent (): The attribute(s) on the right side.

Types of Functional Dependencies

  1. Trivial FD: is trivial if is a subset of .
    • Example: .
  2. Non-Trivial FD: is non-trivial if is not a subset of .
    • Example: .
  3. Fully Functional Dependency: is fully dependent on if it is dependent on and not on any proper subset of . (Crucial for 2NF).

A conceptual diagram illustrating Functional Dependency. The image should feature a database table a...
AI-generated image — may contain inaccuracies


3. The Need for Normalization

Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

Database Anomalies

Without normalization, databases suffer from anomalies which cause data inconsistency:

  1. Update Anomaly: If data is redundant (stored in multiple places), updating it in one place requires updating it everywhere. Failure to do so leads to inconsistent data.
  2. Insertion Anomaly: Being unable to insert valid data because other data is missing.
    • Example: Cannot add a new professor if they aren't assigned to a course yet (if the table mixes Course and Professor details).
  3. Deletion Anomaly: Deleting a record causes unintended loss of other data.
    • Example: Deleting a course might inadvertently delete the only record of the professor teaching it.

4. Normal Forms

Normalization is performed in stages known as Normal Forms (NF).

A nested Venn diagram or concentric circles illustrating the hierarchy of Normal Forms. The largest ...
AI-generated image — may contain inaccuracies

First Normal Form (1NF)

Rule: A relation is in 1NF if:

  1. All attributes contain atomic (indivisible) values.
  2. There are no repeating groups or arrays.
  • Violation: A cell containing "Phone Numbers: 555-1234, 555-9876".
  • Solution: Create a separate row for each phone number or a separate table.

Second Normal Form (2NF)

Rule: A relation is in 2NF if:

  1. It is already in 1NF.
  2. It has No Partial Dependency.

Partial Dependency: Occurs when a non-prime attribute is dependent on only part of a composite candidate key.

  • Example Scenario:
    • Table: Score(StudentID, CourseID, StudentName, Marks)
    • Key: {StudentID, CourseID}
    • Issue: StudentName depends only on StudentID, not CourseID.
  • Solution: Decompose into two tables:
    1. Student(StudentID, StudentName)
    2. Score(StudentID, CourseID, Marks)

Third Normal Form (3NF)

Rule: A relation is in 3NF if:

  1. It is already in 2NF.
  2. It has No Transitive Dependency.

Transitive Dependency: Occurs when a non-prime attribute depends on another non-prime attribute.

  • Formula: If and , then is a transitive dependency.
  • Example Scenario:
    • Table: Exam(ExamID, TotalMarks, Grade)
    • Dependency: ExamID determines TotalMarks, but TotalMarks determines Grade.
    • Issue: Grade depends on ExamID transitively via TotalMarks.
  • Solution: Decompose into:
    1. Exam_Marks(ExamID, TotalMarks)
    2. Grade_Scheme(TotalMarks, Grade)

A flowchart diagram visualizing the decomposition process from 1NF to 3NF. The top level shows a "Un...
AI-generated image — may contain inaccuracies

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF. It deals with anomalies that 3NF cannot handle, usually involving overlapping candidate keys.

Rule: A relation is in BCNF if for every non-trivial functional dependency , is a Super Key.

  • Difference from 3NF: 3NF allows if is a super key OR if is a prime attribute (part of a candidate key). BCNF only allows if is a super key.
  • Implication: In BCNF, the left side of every arrow (determinant) must be a candidate key.

5. Higher Normal Forms

Multivalued Dependencies (MVD)

A multivalued dependency occurs when one attribute determines a set of values for another attribute, independently of other attributes in the table.
Notation: ( multidetermines ).

  • Scenario: Consider a table Course_Info(Course, Book, Lecturer).
    • One course has many recommended books.
    • One course has many lecturers.
    • Books and Lecturers are independent of each other.
    • This creates redundancy because every book must be listed for every lecturer for a specific course.

Fourth Normal Form (4NF)

Rule: A relation is in 4NF if:

  1. It is in BCNF.
  2. It contains No Multivalued Dependencies.

If a relation has a multivalued dependency , it must be decomposed so that the MVDs are separated.

Example of 4NF Decomposition:
Instead of one table:
Course_Info(Course, Book, Lecturer)

Create two tables:

  1. Course_Books(Course, Book)
  2. Course_Lecturers(Course, Lecturer)

This eliminates the Cartesian product redundancy created by storing independent multivalued facts in a single table.

A diagram illustrating Multivalued Dependency and 4NF resolution. The left side shows a single table...
AI-generated image — may contain inaccuracies