Unit 4 - Notes

INT306 9 min read

Unit 4: Relational Database Design

Data Integrity Rules

Data integrity refers to the overall completeness, accuracy, and consistency of data in a relational database. To maintain this, databases enforce specific rules known as integrity constraints. These rules ensure that changes made to the database by authorized users do not result in a loss of data consistency.

There are four primary types of data integrity rules:

1. Domain Integrity

  • Definition: Ensures that all data items in a column fall within a defined set of valid values.
  • Enforcement: Achieved through data types (e.g., integer, varchar, date), constraints (NOT NULL, CHECK, DEFAULT), and permitted value ranges.
  • Example: A column Age must be an integer, and a CHECK (Age > 0) constraint ensures no negative values are entered.

2. Entity Integrity

  • Definition: Ensures that each row in a table is uniquely identifiable.
  • Rule: The primary key of a table cannot be null, and it must be unique. If a primary key were null, the database would lose the ability to uniquely identify that specific record.
  • Example: In a Student table, the StudentID must have a unique value for every row and cannot be left blank.

3. Referential Integrity

  • Definition: Ensures that relationships between tables remain consistent.
  • Rule: A foreign key in one table must either match a valid primary key value in another table or be entirely null. This prevents "orphan records."
  • Example: If an Employee table has a DepartmentID foreign key linking to a Department table, you cannot insert an employee with a DepartmentID that does not exist in the Department table.

4. User-Defined Integrity

  • Definition: Specific business rules that do not fall into the categories of domain, entity, or referential integrity.
  • Enforcement: Typically handled through triggers, stored procedures, and complex assertions.
  • Example: "A customer's total outstanding credit cannot exceed $10,000."

Functional Dependency

Functional dependency (FD) is a relationship that exists when one attribute (or a set of attributes) uniquely determines another attribute (or set of attributes) within a relation. It is the cornerstone of relational database normalization.

Definition and Notation

If X and Y are sets of attributes in a relation R, then Y is functionally dependent on X (denoted as X → Y) if, for every valid instance of R, each value of X uniquely determines exactly one value of Y.

  • X is called the Determinant.
  • Y is called the Dependent.

Example:
In a table Employee(EmpID, EmpName, Salary):

  • EmpID → EmpName (Because knowing the EmpID allows you to uniquely identify the EmpName).

Types of Functional Dependencies

  1. Trivial Functional Dependency: is trivial if Y is a subset of X. (e.g., {EmpID, EmpName} → EmpID).
  2. Non-Trivial Functional Dependency: is non-trivial if Y is not a subset of X. (e.g., EmpID → EmpName).
  3. Fully Functional Dependency: If , and removing any attribute from X means the dependency no longer holds.

Armstrong's Axioms

These are inference rules used to deduce all possible functional dependencies in a database:

  1. Reflexivity: If , then .
  2. Augmentation: If , then for any .
  3. Transitivity: If and , then .

Need of Normalization

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. Unnormalized databases suffer from anomalies that make data management difficult and prone to errors.

Anomalies Addressed by Normalization

  1. Insertion Anomaly: Occurs when certain attributes cannot be inserted into the database without the presence of other attributes.
    • Example: If a table stores (StudentID, CourseID, ProfessorName), you cannot add a new professor who hasn't been assigned a course yet, unless StudentID and CourseID are allowed to be NULL.
  2. Update Anomaly: Occurs when duplicate data is updated in one place but not in all other places, leading to inconsistent data.
    • Example: If a student's address is stored multiple times for each course they take, updating their address requires updating multiple rows. If one row is missed, the data becomes inconsistent.
  3. Deletion Anomaly: Occurs when deleting a row of data inadvertently causes the loss of other unrelated data.
    • Example: If a student drops the only course a specific professor teaches, deleting the student's record might inadvertently delete the professor's details from the database.

Objectives of Normalization

  • Eliminate redundant data (storing the same data in more than one table).
  • Ensure data dependencies make logical sense (storing related data together).
  • Minimize the use of null values.
  • Prevent insertion, update, and deletion anomalies.

First Normal Form (1NF)

A relation is in First Normal Form if it meets the following rules:

  1. Atomicity: Each attribute must contain only atomic (indivisible) values. No multi-valued attributes or repeating groups are allowed.
  2. Each column must have a unique name.
  3. The order in which data is stored does not matter.

Example

Unnormalized Table:

StudentID StudentName Courses
101 Alice Math, Physics, Chem
102 Bob Biology, Chemistry

The Courses column contains multiple values, violating 1NF.

1NF Compliant Table:

StudentID StudentName Course
101 Alice Math
101 Alice Physics
101 Alice Chem
102 Bob Biology
102 Bob Chemistry

Second Normal Form (2NF)

A relation is in Second Normal Form if:

  1. It is already in 1NF.
  2. It has no partial dependencies.

Partial Dependency: Occurs when a non-prime attribute (an attribute not part of any candidate key) is dependent on only a part of a composite primary key, rather than the entire key.
(Note: If a table has a single-attribute primary key and is in 1NF, it is automatically in 2NF).

Example

1NF Table (Primary Key = {StudentID, CourseID}):

StudentID CourseID Professor StudentName
101 C1 Dr. Smith Alice
101 C2 Dr. Jones Alice
  • Candidate Key: {StudentID, CourseID}
  • Dependency: StudentID → StudentName
  • Violation: StudentName depends only on StudentID (part of the primary key), not on CourseID. This is a partial dependency.

2NF Compliant Tables:
Decompose into two tables:

Table 1: Student (Primary Key: StudentID) StudentID StudentName
101 Alice
Table 2: Enrollment (Primary Key: {StudentID, CourseID}) StudentID CourseID Professor
101 C1 Dr. Smith
101 C2 Dr. Jones

Third Normal Form (3NF)

A relation is in Third Normal Form if:

  1. It is in 2NF.
  2. It has no transitive dependencies.

Transitive Dependency: Occurs when a non-prime attribute functionally determines another non-prime attribute. (i.e., if A → B and B → C, then A → C).

Formally, a relation is in 3NF if for every non-trivial functional dependency , at least one of the following holds true:

  • is a superkey.
  • is a prime attribute (part of a candidate key).

Example

2NF Table (Primary Key: EmpID):

EmpID EmpName DeptID DeptLocation
1 John D1 New York
2 Jane D2 London
  • Dependency: EmpID → DeptID and DeptID → DeptLocation.
  • Violation: DeptLocation is dependent on DeptID, which is a non-prime attribute. This creates a transitive dependency (EmpID → DeptLocation).

3NF Compliant Tables:

Table 1: Employee EmpID EmpName DeptID
1 John D1
2 Jane D2
Table 2: Department DeptID DeptLocation
D1 New York
D2 London

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF, often referred to as 3.5NF. A relation is in BCNF if:

  1. It is in 3NF.
  2. For every non-trivial functional dependency , must be a superkey.

The 3NF rule allows to be a prime attribute even if is not a superkey. BCNF removes this leniency. BCNF deals with anomalies that can occur when a table has multiple overlapping candidate keys.

Example

Table: Booking (Primary Key: {StudentID, Subject}):

StudentID Subject Professor
101 Math Dr. A
102 Math Dr. B
101 Physics Dr. C

Rules for this database:

  • A student can have multiple subjects.

  • A subject can be taught by multiple professors.

  • A student is taught a specific subject by only one professor.

  • A professor teaches only one subject.

  • Candidate Keys: {StudentID, Subject} and {StudentID, Professor}

  • Dependency: Professor → Subject (Since a professor teaches only one subject).

  • 3NF Check: It is in 3NF because in Professor → Subject, Subject is a prime attribute.

  • BCNF Check: It is NOT in BCNF because Professor is a determinant but not a superkey.

BCNF Compliant Tables:

Table 1: Student_Prof (Primary Key: {StudentID, Professor}) StudentID Professor
101 Dr. A
102 Dr. B
Table 2: Prof_Subject (Primary Key: Professor) Professor Subject
Dr. A Math
Dr. B Math

Multivalued Dependencies (MVD)

A Multivalued Dependency exists when there are at least three attributes (e.g., A, B, C) in a relation, and for a single value of A, there are multiple values of B, and multiple values of C, but B and C are completely independent of each other.

  • Notation: (Read as "X multidetermines Y").
  • Condition: A relation has an MVD if, for a given value of , the set of values for is completely independent of the set of values for .

MVDs cause a massive multiplication of data rows (Cartesian product) to maintain data consistency.

Example Scenario

A Restaurant offers different Cuisines and provides different DeliveryAreas.
Cuisines and Delivery Areas are completely independent of each other. If a restaurant offers 3 cuisines and delivers to 3 areas, the database must store rows just to represent this accurately, causing extreme redundancy.


Fourth Normal Form (4NF)

A relation is in Fourth Normal Form if:

  1. It is in BCNF.
  2. It has no non-trivial multivalued dependencies.

Formally, for every non-trivial multivalued dependency , must be a superkey. If a table contains two or more independent multivalued attributes, it violates 4NF and must be decomposed.

Example

Table with MVD (Primary Key: {Restaurant, Cuisine, DeliveryArea}):

Restaurant Cuisine DeliveryArea
PizzaHut Italian North
PizzaHut Italian South
PizzaHut Vegan North
PizzaHut Vegan South
  • Dependencies: Restaurant ↠ Cuisine and Restaurant ↠ DeliveryArea.
  • Violation: Cuisine and DeliveryArea are independent of each other but depend on Restaurant. This is a non-trivial MVD, and Restaurant is not a superkey on its own.

4NF Compliant Tables:
Decompose the table to isolate the independent multivalued attributes.

Table 1: Restaurant_Cuisine Restaurant Cuisine
PizzaHut Italian
PizzaHut Vegan
Table 2: Restaurant_Delivery Restaurant DeliveryArea
PizzaHut North
PizzaHut South