Unit4 - Subjective Questions
INT306 • Practice Questions with Detailed Answers
Define Data Integrity in DBMS. Explain the two principal integrity rules: Entity Integrity and Referential Integrity.
Data Integrity refers to the accuracy, consistency, and reliability of data stored in the database. It ensures that the data remains unaltered during operations like storage, transmission, and retrieval.
Principal Integrity Rules
-
Entity Integrity Rule:
- Statement: No primary key value can be NULL.
- Reason: The primary key is used to uniquely identify individual tuples (rows) in a relation. If the primary key were NULL, the tuple could not be uniquely identified.
- Example: In a
Studenttable withRoll_Noas the Primary Key,Roll_Nocannot be empty.
-
Referential Integrity Rule:
- Statement: A foreign key value must match an existing primary key value in the parent table, or the foreign key value must be completely NULL.
- Reason: It ensures that relationships between tables remain consistent. It prevents 'orphan records' in the child table.
- Example: In an
Enrolledtable referencing aCoursetable, you cannot enroll a student in aCourse_IDthat does not exist in theCoursetable.
What is a Functional Dependency (FD)? Explain Trivial and Non-Trivial Functional Dependencies with examples.
A Functional Dependency is a constraint between two sets of attributes in a relation. It is denoted as , where and are subsets of attributes. It signifies that the value of uniquely determines the value of .
Types of Functional Dependencies
-
Trivial Functional Dependency:
- A dependency is trivial if is a subset of ().
- Example: In a relation with attributes and , the dependency is trivial because is part of the determinant set.
-
Non-Trivial Functional Dependency:
- A dependency is non-trivial if is not a subset of ($Y \
ot\subseteq X$). - Ideally, the intersection of and is empty.
- Example: In a
Studentrelation, is non-trivial becauseNameis not contained withinRoll_No.
- A dependency is non-trivial if is not a subset of ($Y \
State and explain Armstrong’s Axioms for functional dependencies.
Armstrong’s Axioms are a set of inference rules used to infer all the functional dependencies on a relational database. They are sound and complete.
-
Reflexivity Rule:
- If is a subset of (), then .
- Example: .
-
Augmentation Rule:
- If , then for any set of attributes .
- Example: If , then .
-
Transitivity Rule:
- If and , then .
- Example: If and , then .
Explain the concept of Closure of an Attribute Set with an algorithm. Why is it used?
The Closure of an Attribute Set , denoted as , is the set of all attributes that are functionally determined by under a set of functional dependencies .
Algorithm to compute
- Initialize .
- Repeat until no more attributes can be added to :
- For each functional dependency in :
- If , then add to ().
- For each functional dependency in :
- Return .
Utility/Usage
- Finding Candidate Keys: If the closure of a set of attributes includes all attributes of the relation, that set is a superkey (and potentially a candidate key).
- Checking Dependencies: To check if a specific dependency holds, we compute and check if .
Discuss the need for Normalization. Explain the update, delete, and insert anomalies with a suitable example.
Normalization is the process of organizing data to minimize redundancy and dependency. The primary need is to avoid database anomalies that occur in poorly structured relations.
Consider a table Student_Course(RollNo, Name, CourseID, CourseName).
-
Insertion Anomaly:
- We cannot insert data about a new course until a student enrolls in it, because
RollNo(Primary Key) cannot be NULL. This prevents adding independent course data.
- We cannot insert data about a new course until a student enrolls in it, because
-
Deletion Anomaly:
- If the only student enrolled in a specific course (e.g., 'Robotics') drops the course or is deleted, the information about the 'Robotics' course is lost entirely.
-
Update Anomaly:
- If the
CourseNamechanges (e.g., 'CS101' changes from 'C++' to 'Java'), we must update every row where 'CS101' appears. If one row is missed, the data becomes inconsistent.
- If the
Normalization solves these by splitting the table into Student, Course, and Enrollment tables.
Define First Normal Form (1NF). Explain how to convert a relation into 1NF.
A relation is in First Normal Form (1NF) if and only if:
- All attributes contain atomic (indivisible) values.
- There are no repeating groups or arrays.
- All entries in a column are of the same data type.
Example and Conversion
| Un-normalized Relation: | ID | Name | Phone_Numbers |
|---|---|---|---|
| 1 | A. Smith | 1234, 5678 |
Here, Phone_Numbers is multivalued, violating 1NF.
Conversion to 1NF:
We flatten the table so that each cell has a single value.
| ID | Name | Phone_Numbers |
|---|---|---|
| 1 | A. Smith | 1234 |
| 1 | A. Smith | 5678 |
Now, the relation is in 1NF.
Explain Second Normal Form (2NF). What is Partial Dependency?
Second Normal Form (2NF) builds upon 1NF. A relation is in 2NF if:
- It is in 1NF.
- It contains no Partial Dependencies.
Partial Dependency
A partial dependency occurs when a non-prime attribute (an attribute not part of any candidate key) depends on only a part of a composite candidate key, rather than the whole key.
Example:
- Relation:
Sales(ProductID, CustomerID, ProductName, Quantity) - Candidate Key:
- Dependency:
Here, ProductName depends only on ProductID (part of the key), not the combination of ProductID and CustomerID. This is a partial dependency. To convert to 2NF, we must decompose the table to separate ProductName into a product-specific table.
Define Third Normal Form (3NF). Explain the concept of Transitive Dependency.
Third Normal Form (3NF) ensures that the database is free of transitive dependencies. A relation is in 3NF if:
- It is in 2NF.
- There is no Transitive Dependency for non-prime attributes.
Alternatively, for every non-trivial FD :
- is a superkey, OR
- is a prime attribute (part of a candidate key).
Transitive Dependency
A transitive dependency occurs when a non-prime attribute depends on another non-prime attribute.
Mathematically: If and , then . If is the key, and and are non-prime, then represents a transitive dependency.
Example:
Student(ID, ZipCode, City)
Here, City depends on ZipCode, which depends on ID. City is transitively dependent on ID. This violates 3NF.
Compare 3NF and BCNF (Boyce-Codd Normal Form). Why is BCNF considered stronger than 3NF?
Both 3NF and BCNF aim to reduce redundancy, but BCNF is more restrictive.
Comparison Table
| Feature | 3NF (Third Normal Form) | BCNF (Boyce-Codd Normal Form) |
|---|---|---|
| Requirement | For , is a superkey OR is a prime attribute. | For , must be a superkey. |
| Strictness | Less strict (allows dependency if RHS is prime). | More strict (does not allow exceptions for prime attributes). |
| Redundancy | May still contain some redundancy due to overlapping candidate keys. | Removes all redundancy based on functional dependencies. |
| Dependency Preservation | Always preserves dependencies. | May not preserve all functional dependencies upon decomposition. |
Why Stronger?
BCNF handles cases where a prime attribute depends on a non-prime attribute (which 3NF allows). Every relation in BCNF is in 3NF, but not every relation in 3NF is in BCNF.
Explain Boyce-Codd Normal Form (BCNF) with a detailed example where a table is in 3NF but not in BCNF.
A relation is in BCNF if for every non-trivial functional dependency , is a superkey of .
Example (3NF but not BCNF)
Consider a table Student_Project tracking students, projects, and guides.
- Attributes:
(Student, Project, Guide) - Rules:
- A student works on one project.
- A project can have multiple guides.
- A specific guide works on only one project.
Dependencies:
- (Candidate Key is )
- (Because a guide belongs to only one project)
Analysis:
- Candidate Keys: and .
- Attributes
Student,Project, andGuideare all prime attributes (parts of keys). - The dependency holds.
- Is it in 3NF? Yes, because on the RHS,
Projectis a prime attribute. 3NF allows this. - Is it in BCNF? No, because
Guide(the LHS) is not a superkey (it cannot identifyStudent).
- Is it in 3NF? Yes, because on the RHS,
To convert to BCNF, we must decompose the table.
What is a Multivalued Dependency (MVD)? How is it represented?
Multivalued Dependency (MVD) occurs when the presence of one or more rows in a table implies the presence of one or more other rows in that same table. It usually happens when two independent multi-valued attributes are stored in the same table along with a key.
Representation
It is denoted as (read as "X multidetermines Y").
Condition
In a relation , holds if, for any pair of tuples and such that , there exist tuples and such that:
- and
- and
Intuitively, determines a set of values for , and this set is independent of .
Define Fourth Normal Form (4NF) and explain why it is needed over BCNF.
A relation is in Fourth Normal Form (4NF) if:
- It is in BCNF.
- It contains no non-trivial Multivalued Dependencies (MVDs).
Specifically, for every non-trivial MVD , must be a superkey.
Need over BCNF
BCNF deals with functional dependencies () but cannot handle redundancies caused by Multivalued Dependencies.
Example:
Consider Course(CourseID, Instructor, Textbook).
- A course has multiple instructors.
- A course has multiple textbooks.
- Instructors and Textbooks are independent.
If we store (CS101, Prof. A, Book X) and (CS101, Prof. B, Book Y), to maintain consistency, we must also store (CS101, Prof. A, Book Y) and (CS101, Prof. B, Book X). This Cartesian product redundancy is not caught by BCNF because there are no functional dependencies violating BCNF rules. 4NF eliminates this by decoupling independent multivalued facts.
Given the relation and Functional Dependencies: .
- Find all Candidate Keys.
- Determine the highest Normal Form of the relation.
1. Finding Candidate Keys:
We check the closure of attributes.
- Attributes not on the RHS of any FD: . So must be part of any key.
- Try : . Not a key.
- Try : . Key found: ACD.
- Try : . Key found: BCD.
- Try : . Key found: ECD.
Candidate Keys: .
Prime Attributes: (All attributes are prime).
2. Determining Normal Form:
- 1NF: Assumed yes.
- 2NF: Are there partial dependencies?
- : LHS is a proper subset of key . This is a Partial Dependency.
Conclusion: The relation is in 1NF but not 2NF.
What are Prime and Non-prime attributes? How do they determine the violations of 2NF and 3NF?
- Prime Attribute: An attribute that is a member of any candidate key of the relation.
- Non-prime Attribute: An attribute that is not a member of any candidate key.
Role in Normal Forms
-
2NF Violation (Partial Dependency):
- Occurs when a Non-prime attribute is functionally dependent on a part of a candidate key.
- Condition: .
-
3NF Violation (Transitive Dependency):
- Occurs when a Non-prime attribute is functionally dependent on another Non-prime attribute.
- Condition: .
Describe the Lossless Join Decomposition property. Why is it essential during normalization?
Lossless Join Decomposition ensures that when a relation is decomposed into sub-relations and , the natural join of and returns exactly the original relation , without creating any spurious (fake) tuples.
Condition for Lossless Join
For a decomposition of into and , the decomposition is lossless if:
OR
This means the common attribute(s) between the two tables must be a superkey in at least one of the tables.
Importance
If a decomposition is lossy (not lossless), joining the tables back together results in more tuples than the original table (spurious tuples). This corrupts the database information, making the data unreliable.
Explain the concept of Dependency Preservation in decomposition.
Dependency Preservation ensures that constraints (Functional Dependencies) defined on the original relation can be checked efficiently in the decomposed relations without performing a join operation.
Given a relation with FD set , decomposed into with FD sets :
- The decomposition is dependency preserving if the union of all functional dependencies in the sub-relations () is equivalent to the closure of the original dependencies .
Significance:
If a decomposition is not dependency preserving, every time an insert or update occurs, the DBMS might have to join tables to check if a specific functional dependency is violated, which is computationally expensive.
What is the difference between Functional Dependency and Multivalued Dependency?
| Feature | Functional Dependency (FD) | Multivalued Dependency (MVD) |
|---|---|---|
| Notation | ||
| Definition | A single value of determines a single value of . | A single value of determines a set of values for , independent of other attributes. |
| Relationship | Represents a 1:1 or N:1 relationship. | Represents a 1:N relationship where independence exists between multiple 1:N relationships. |
| Normal Form | ADDRESSED BY BCNF (and 2NF, 3NF). | ADDRESSED BY 4NF. |
| Example | Student_ID determines Student_Name. |
Course determines a list of Instructors AND a list of Textbooks independently. |
Derive the Canonical Cover (Minimal Cover) for a given set of functional dependencies. Explain the steps involved.
A Canonical Cover () is a simplified set of functional dependencies that has the same closure as the original set , but is minimal (no redundant attributes or dependencies).
Steps to find Canonical Cover:
-
Decomposition of RHS:
Ensure every FD has a single attribute on the right-hand side.
(e.g., becomes and ). -
Remove Extraneous Attributes:
Check the LHS of each FD. If an attribute can be removed without changing the closure, remove it.
(e.g., if and exists, is extraneous). -
Remove Redundant Dependencies:
Check if an entire FD can be inferred from the remaining FDs.- Temporarily remove an FD ().
- Calculate using the remaining FDs.
- If , the removed FD is redundant.
-
Union:
Combine FDs with the same LHS back together (optional, but standard for final notation).
Provide a comprehensive summary of the progression from 1NF to 4NF, highlighting the specific problem each form solves.
The normalization process is a progressive tightening of rules to ensure data integrity:
-
First Normal Form (1NF):
- Problem Solved: Non-atomic values, repeating groups, and composite columns.
- Result: All values are atomic.
-
Second Normal Form (2NF):
- Problem Solved: Partial Dependency (attributes depending on only part of a composite key).
- Result: All non-prime attributes depend on the full candidate key.
-
Third Normal Form (3NF):
- Problem Solved: Transitive Dependency (attributes depending on other non-key attributes).
- Result: All non-prime attributes depend only on the candidate key.
-
Boyce-Codd Normal Form (BCNF):
- Problem Solved: Anomalies where a prime attribute depends on a non-prime attribute.
- Result: Every determinant is a superkey.
-
Fourth Normal Form (4NF):
- Problem Solved: Multivalued Dependencies (independent one-to-many relationships causing Cartesian products).
- Result: Separates independent multivalued facts.
Consider the relation with . Explain the inference rules for Multivalued Dependencies.
Similar to Armstrong's Axioms for FDs, there are inference rules for Multivalued Dependencies (MVDs):
-
Complementation Rule:
If , then .
Since determines , it implicitly determines the rest of the attributes in the relation independently. -
Augmentation Rule:
If and , then . -
Transitivity Rule:
If and , then . -
Replication Rule (FD to MVD):
If (Functional Dependency), then (Multivalued Dependency). Every FD is an MVD.