Unit 4 - Notes
Unit 4: Relational Database Design
Relational Database Design is the process of organizing data to minimize redundancy and dependency. The goal is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via defined relationships.
1. Data Integrity Rules
Data integrity ensures the accuracy, consistency, and reliability of data in the database. In the relational model, there are three primary types of integrity rules.
A. Entity Integrity
- Definition: No part of a Primary Key can be
NULL. - Reasoning: The primary key is used to uniquely identify a specific row in a table. If the primary key were null, the system could not identify the row, leading to ambiguity.
- Example: In a
Studenttable withStudentIDas the PK, no student entry can have a missing ID.
B. Referential Integrity
- Definition: If a Foreign Key exists in one table, the value of that key must match a Primary Key value in the referenced table, or it must be
NULL(if allowed). - Reasoning: This ensures consistency between tables. You cannot refer to a record that does not exist.
- Example: If
Enrollmenttable has aCourseIDforeign key pointing to theCoursetable, you cannot enter aCourseIDinEnrollmentthat doesn't exist inCourse.
C. Domain Integrity
- Definition: All entries in a specific column must be of the same data type and fall within a defined set of valid values (domain).
- Example: A column
GPAmust be a numeric value between 0.0 and 4.0. It cannot contain text like "Four point zero".
2. Functional Dependency (FD)
Functional dependency is a constraint between two sets of attributes in a relation from a database. It is the fundamental concept underlying normalization.
Definition
Given a relation , attribute set is functionally dependent on attribute set (denoted as ) if each value of determines exactly one value of .
- Determinant (): The attribute(s) on the left side.
- Dependent (): The attribute(s) on the right side.
Types of Functional Dependencies
- Trivial FD: is trivial if is a subset of .
- Example: .
- Non-Trivial FD: is non-trivial if is not a subset of .
- Example: .
- Fully Functional Dependency: is fully dependent on if it is dependent on and not on any proper subset of . (Crucial for 2NF).

3. The Need for Normalization
Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.
Database Anomalies
Without normalization, databases suffer from anomalies which cause data inconsistency:
- Update Anomaly: If data is redundant (stored in multiple places), updating it in one place requires updating it everywhere. Failure to do so leads to inconsistent data.
- Insertion Anomaly: Being unable to insert valid data because other data is missing.
- Example: Cannot add a new professor if they aren't assigned to a course yet (if the table mixes Course and Professor details).
- Deletion Anomaly: Deleting a record causes unintended loss of other data.
- Example: Deleting a course might inadvertently delete the only record of the professor teaching it.
4. Normal Forms
Normalization is performed in stages known as Normal Forms (NF).

First Normal Form (1NF)
Rule: A relation is in 1NF if:
- All attributes contain atomic (indivisible) values.
- There are no repeating groups or arrays.
- Violation: A cell containing "Phone Numbers: 555-1234, 555-9876".
- Solution: Create a separate row for each phone number or a separate table.
Second Normal Form (2NF)
Rule: A relation is in 2NF if:
- It is already in 1NF.
- It has No Partial Dependency.
Partial Dependency: Occurs when a non-prime attribute is dependent on only part of a composite candidate key.
- Example Scenario:
- Table:
Score(StudentID, CourseID, StudentName, Marks) - Key:
{StudentID, CourseID} - Issue:
StudentNamedepends only onStudentID, notCourseID.
- Table:
- Solution: Decompose into two tables:
Student(StudentID, StudentName)Score(StudentID, CourseID, Marks)
Third Normal Form (3NF)
Rule: A relation is in 3NF if:
- It is already in 2NF.
- It has No Transitive Dependency.
Transitive Dependency: Occurs when a non-prime attribute depends on another non-prime attribute.
- Formula: If and , then is a transitive dependency.
- Example Scenario:
- Table:
Exam(ExamID, TotalMarks, Grade) - Dependency:
ExamIDdeterminesTotalMarks, butTotalMarksdeterminesGrade. - Issue:
Gradedepends onExamIDtransitively viaTotalMarks.
- Table:
- Solution: Decompose into:
Exam_Marks(ExamID, TotalMarks)Grade_Scheme(TotalMarks, Grade)

Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF. It deals with anomalies that 3NF cannot handle, usually involving overlapping candidate keys.
Rule: A relation is in BCNF if for every non-trivial functional dependency , is a Super Key.
- Difference from 3NF: 3NF allows if is a super key OR if is a prime attribute (part of a candidate key). BCNF only allows if is a super key.
- Implication: In BCNF, the left side of every arrow (determinant) must be a candidate key.
5. Higher Normal Forms
Multivalued Dependencies (MVD)
A multivalued dependency occurs when one attribute determines a set of values for another attribute, independently of other attributes in the table.
Notation: ( multidetermines ).
- Scenario: Consider a table
Course_Info(Course, Book, Lecturer).- One course has many recommended books.
- One course has many lecturers.
- Books and Lecturers are independent of each other.
- This creates redundancy because every book must be listed for every lecturer for a specific course.
Fourth Normal Form (4NF)
Rule: A relation is in 4NF if:
- It is in BCNF.
- It contains No Multivalued Dependencies.
If a relation has a multivalued dependency , it must be decomposed so that the MVDs are separated.
Example of 4NF Decomposition:
Instead of one table:
Course_Info(Course, Book, Lecturer)
Create two tables:
Course_Books(Course, Book)Course_Lecturers(Course, Lecturer)
This eliminates the Cartesian product redundancy created by storing independent multivalued facts in a single table.
