Unit 4 - Notes
Unit 4: Relational Database Design
Data Integrity Rules
Data integrity refers to the overall completeness, accuracy, and consistency of data in a relational database. To maintain this, databases enforce specific rules known as integrity constraints. These rules ensure that changes made to the database by authorized users do not result in a loss of data consistency.
There are four primary types of data integrity rules:
1. Domain Integrity
- Definition: Ensures that all data items in a column fall within a defined set of valid values.
- Enforcement: Achieved through data types (e.g., integer, varchar, date), constraints (
NOT NULL,CHECK,DEFAULT), and permitted value ranges. - Example: A column
Agemust be an integer, and aCHECK (Age > 0)constraint ensures no negative values are entered.
2. Entity Integrity
- Definition: Ensures that each row in a table is uniquely identifiable.
- Rule: The primary key of a table cannot be null, and it must be unique. If a primary key were null, the database would lose the ability to uniquely identify that specific record.
- Example: In a
Studenttable, theStudentIDmust have a unique value for every row and cannot be left blank.
3. Referential Integrity
- Definition: Ensures that relationships between tables remain consistent.
- Rule: A foreign key in one table must either match a valid primary key value in another table or be entirely null. This prevents "orphan records."
- Example: If an
Employeetable has aDepartmentIDforeign key linking to aDepartmenttable, you cannot insert an employee with aDepartmentIDthat does not exist in theDepartmenttable.
4. User-Defined Integrity
- Definition: Specific business rules that do not fall into the categories of domain, entity, or referential integrity.
- Enforcement: Typically handled through triggers, stored procedures, and complex assertions.
- Example: "A customer's total outstanding credit cannot exceed $10,000."
Functional Dependency
Functional dependency (FD) is a relationship that exists when one attribute (or a set of attributes) uniquely determines another attribute (or set of attributes) within a relation. It is the cornerstone of relational database normalization.
Definition and Notation
If X and Y are sets of attributes in a relation R, then Y is functionally dependent on X (denoted as X → Y) if, for every valid instance of R, each value of X uniquely determines exactly one value of Y.
- X is called the Determinant.
- Y is called the Dependent.
Example:
In a table Employee(EmpID, EmpName, Salary):
EmpID → EmpName(Because knowing the EmpID allows you to uniquely identify the EmpName).
Types of Functional Dependencies
- Trivial Functional Dependency: is trivial if Y is a subset of X. (e.g.,
{EmpID, EmpName} → EmpID). - Non-Trivial Functional Dependency: is non-trivial if Y is not a subset of X. (e.g.,
EmpID → EmpName). - Fully Functional Dependency: If , and removing any attribute from X means the dependency no longer holds.
Armstrong's Axioms
These are inference rules used to deduce all possible functional dependencies in a database:
- Reflexivity: If , then .
- Augmentation: If , then for any .
- Transitivity: If and , then .
Need of Normalization
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. Unnormalized databases suffer from anomalies that make data management difficult and prone to errors.
Anomalies Addressed by Normalization
- Insertion Anomaly: Occurs when certain attributes cannot be inserted into the database without the presence of other attributes.
- Example: If a table stores
(StudentID, CourseID, ProfessorName), you cannot add a new professor who hasn't been assigned a course yet, unlessStudentIDandCourseIDare allowed to be NULL.
- Example: If a table stores
- Update Anomaly: Occurs when duplicate data is updated in one place but not in all other places, leading to inconsistent data.
- Example: If a student's address is stored multiple times for each course they take, updating their address requires updating multiple rows. If one row is missed, the data becomes inconsistent.
- Deletion Anomaly: Occurs when deleting a row of data inadvertently causes the loss of other unrelated data.
- Example: If a student drops the only course a specific professor teaches, deleting the student's record might inadvertently delete the professor's details from the database.
Objectives of Normalization
- Eliminate redundant data (storing the same data in more than one table).
- Ensure data dependencies make logical sense (storing related data together).
- Minimize the use of null values.
- Prevent insertion, update, and deletion anomalies.
First Normal Form (1NF)
A relation is in First Normal Form if it meets the following rules:
- Atomicity: Each attribute must contain only atomic (indivisible) values. No multi-valued attributes or repeating groups are allowed.
- Each column must have a unique name.
- The order in which data is stored does not matter.
Example
Unnormalized Table:
| StudentID | StudentName | Courses |
|---|---|---|
| 101 | Alice | Math, Physics, Chem |
| 102 | Bob | Biology, Chemistry |
The Courses column contains multiple values, violating 1NF.
1NF Compliant Table:
| StudentID | StudentName | Course |
|---|---|---|
| 101 | Alice | Math |
| 101 | Alice | Physics |
| 101 | Alice | Chem |
| 102 | Bob | Biology |
| 102 | Bob | Chemistry |
Second Normal Form (2NF)
A relation is in Second Normal Form if:
- It is already in 1NF.
- It has no partial dependencies.
Partial Dependency: Occurs when a non-prime attribute (an attribute not part of any candidate key) is dependent on only a part of a composite primary key, rather than the entire key.
(Note: If a table has a single-attribute primary key and is in 1NF, it is automatically in 2NF).
Example
1NF Table (Primary Key = {StudentID, CourseID}):
| StudentID | CourseID | Professor | StudentName |
|---|---|---|---|
| 101 | C1 | Dr. Smith | Alice |
| 101 | C2 | Dr. Jones | Alice |
- Candidate Key:
{StudentID, CourseID} - Dependency:
StudentID → StudentName - Violation:
StudentNamedepends only onStudentID(part of the primary key), not onCourseID. This is a partial dependency.
2NF Compliant Tables:
Decompose into two tables:
Table 1: Student (Primary Key: StudentID) |
StudentID | StudentName |
|---|---|---|
| 101 | Alice |
Table 2: Enrollment (Primary Key: {StudentID, CourseID}) |
StudentID | CourseID | Professor |
|---|---|---|---|
| 101 | C1 | Dr. Smith | |
| 101 | C2 | Dr. Jones |
Third Normal Form (3NF)
A relation is in Third Normal Form if:
- It is in 2NF.
- It has no transitive dependencies.
Transitive Dependency: Occurs when a non-prime attribute functionally determines another non-prime attribute. (i.e., if A → B and B → C, then A → C).
Formally, a relation is in 3NF if for every non-trivial functional dependency , at least one of the following holds true:
- is a superkey.
- is a prime attribute (part of a candidate key).
Example
2NF Table (Primary Key: EmpID):
| EmpID | EmpName | DeptID | DeptLocation |
|---|---|---|---|
| 1 | John | D1 | New York |
| 2 | Jane | D2 | London |
- Dependency:
EmpID → DeptIDandDeptID → DeptLocation. - Violation:
DeptLocationis dependent onDeptID, which is a non-prime attribute. This creates a transitive dependency (EmpID → DeptLocation).
3NF Compliant Tables:
| Table 1: Employee | EmpID | EmpName | DeptID |
|---|---|---|---|
| 1 | John | D1 | |
| 2 | Jane | D2 |
| Table 2: Department | DeptID | DeptLocation |
|---|---|---|
| D1 | New York | |
| D2 | London |
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF, often referred to as 3.5NF. A relation is in BCNF if:
- It is in 3NF.
- For every non-trivial functional dependency , must be a superkey.
The 3NF rule allows to be a prime attribute even if is not a superkey. BCNF removes this leniency. BCNF deals with anomalies that can occur when a table has multiple overlapping candidate keys.
Example
Table: Booking (Primary Key: {StudentID, Subject}):
| StudentID | Subject | Professor |
|---|---|---|
| 101 | Math | Dr. A |
| 102 | Math | Dr. B |
| 101 | Physics | Dr. C |
Rules for this database:
-
A student can have multiple subjects.
-
A subject can be taught by multiple professors.
-
A student is taught a specific subject by only one professor.
-
A professor teaches only one subject.
-
Candidate Keys:
{StudentID, Subject}and{StudentID, Professor} -
Dependency:
Professor → Subject(Since a professor teaches only one subject). -
3NF Check: It is in 3NF because in
Professor → Subject,Subjectis a prime attribute. -
BCNF Check: It is NOT in BCNF because
Professoris a determinant but not a superkey.
BCNF Compliant Tables:
Table 1: Student_Prof (Primary Key: {StudentID, Professor}) |
StudentID | Professor |
|---|---|---|
| 101 | Dr. A | |
| 102 | Dr. B |
Table 2: Prof_Subject (Primary Key: Professor) |
Professor | Subject |
|---|---|---|
| Dr. A | Math | |
| Dr. B | Math |
Multivalued Dependencies (MVD)
A Multivalued Dependency exists when there are at least three attributes (e.g., A, B, C) in a relation, and for a single value of A, there are multiple values of B, and multiple values of C, but B and C are completely independent of each other.
- Notation: (Read as "X multidetermines Y").
- Condition: A relation has an MVD if, for a given value of , the set of values for is completely independent of the set of values for .
MVDs cause a massive multiplication of data rows (Cartesian product) to maintain data consistency.
Example Scenario
A Restaurant offers different Cuisines and provides different DeliveryAreas.
Cuisines and Delivery Areas are completely independent of each other. If a restaurant offers 3 cuisines and delivers to 3 areas, the database must store rows just to represent this accurately, causing extreme redundancy.
Fourth Normal Form (4NF)
A relation is in Fourth Normal Form if:
- It is in BCNF.
- It has no non-trivial multivalued dependencies.
Formally, for every non-trivial multivalued dependency , must be a superkey. If a table contains two or more independent multivalued attributes, it violates 4NF and must be decomposed.
Example
Table with MVD (Primary Key: {Restaurant, Cuisine, DeliveryArea}):
| Restaurant | Cuisine | DeliveryArea |
|---|---|---|
| PizzaHut | Italian | North |
| PizzaHut | Italian | South |
| PizzaHut | Vegan | North |
| PizzaHut | Vegan | South |
- Dependencies:
Restaurant ↠ CuisineandRestaurant ↠ DeliveryArea. - Violation:
CuisineandDeliveryAreaare independent of each other but depend onRestaurant. This is a non-trivial MVD, andRestaurantis not a superkey on its own.
4NF Compliant Tables:
Decompose the table to isolate the independent multivalued attributes.
| Table 1: Restaurant_Cuisine | Restaurant | Cuisine |
|---|---|---|
| PizzaHut | Italian | |
| PizzaHut | Vegan |
| Table 2: Restaurant_Delivery | Restaurant | DeliveryArea |
|---|---|---|
| PizzaHut | North | |
| PizzaHut | South |