Unit 1 - Notes

INT306 9 min read

Unit 1: Introduction to database

1. Purpose of Database Systems

Before the advent of Database Management Systems (DBMS), data was stored using traditional File-Processing Systems. The primary purpose of a database system is to overcome the severe limitations of these file-based systems. A DBMS provides a systematic and centralized way of storing, managing, and retrieving data.

Drawbacks of File Systems (which DBMS solves):

  • Data Redundancy and Inconsistency: In file systems, the same data might be duplicated across multiple files (redundancy), leading to inconsistencies if one file is updated but another is not. DBMS centralizes data, minimizing redundancy and ensuring consistency.
  • Difficulty in Accessing Data: File systems require writing specific application programs to extract specific data. DBMS provides high-level query languages (like SQL) for flexible and rapid data retrieval.
  • Data Isolation: Data in file systems is scattered across various files with different formats, making it difficult to write new application programs to retrieve the appropriate data.
  • Integrity Problems: Data values must satisfy certain consistency constraints (e.g., an account balance cannot be negative). Enforcing these in file systems requires hardcoding them into applications. DBMS allows constraints to be declared directly in the database schema.
  • Atomicity Problems: A computer system failure can leave a transaction partially executed. DBMS ensures Atomicity (all-or-nothing execution) to keep the database in a consistent state.
  • Concurrent Access Anomalies: Multiple users updating data simultaneously in a file system can result in inconsistent data. DBMS provides concurrency control mechanisms to handle simultaneous access safely.
  • Security Problems: File systems make it difficult to enforce strict user-access rights. DBMS offers robust security and authorization subsystems.

2. Components of DBMS

A DBMS is a complex software system comprised of several interacting components that work together to manage data efficiently.

  1. Hardware: The physical electronic devices, such as servers, hard drives, I/O devices, and network infrastructure, where the database resides and operates.
  2. Software:
    • DBMS Software: The core program that manages the database (e.g., MySQL, Oracle).
    • Operating System: Manages hardware resources and acts as an interface between the hardware and DBMS.
    • Application Programs/Utilities: Software used to interact with the database for specific tasks.
  3. Data: The most critical component. It includes the actual operational data and the metadata (data about data, describing the structure).
  4. Procedures: The rules, instructions, and guidelines that govern the design and use of the database (e.g., backup procedures, setup protocols, login procedures).
  5. Users:
    • Database Administrator (DBA): Responsible for authorizing access, monitoring usage, and managing software/hardware resources.
    • Database Designers: Responsible for identifying the data to be stored and choosing the appropriate structures.
    • Application Programmers: Write application programs in languages like Java or Python to interact with the database.
    • End-Users: Naive or sophisticated users who interact with the system from a terminal or application interface.

3. Applications of DBMS

DBMS is ubiquitous in modern technology. Key applications include:

  • Banking and Finance: For customer information, accounts, loans, and banking transactions.
  • Airlines and Railways: For reservations, schedule information, and ticketing (requires highly concurrent access).
  • Universities: For student information, course registrations, and grading.
  • Telecommunications: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and routing communication networks.
  • E-commerce and Sales: For customer information, product tracking, inventory management, and order processing.
  • Human Resources: For employee records, salaries, payroll, and tax deductions.
  • Healthcare: For maintaining patient records, medical histories, and billing.

4. Three-Tier DBMS Architecture

The three-tier (or ANSI/SPARC) architecture separates the user applications from the physical database, providing data abstraction and independence.

  1. Internal Level (Physical Level):
    • The lowest level of data abstraction.
    • Describes how the data is actually stored on the physical storage media (e.g., disks).
    • Deals with complex low-level data structures, file organization, access paths (indexes), and space allocation.
  2. Conceptual Level (Logical Level):
    • The next higher level of abstraction.
    • Describes what data is stored in the entire database and the relationships among that data.
    • Used by Database Administrators (DBAs). It hides the complex physical storage details.
    • Defines entities, attributes, relationships, and constraints.
  3. External Level (View Level):
    • The highest level of data abstraction.
    • Describes only a part of the entire database that is relevant to a specific user or application.
    • Different users can have different views of the same database. This enhances security by hiding sensitive data from unauthorized users.

5. Data Independence

Data independence is the ability to modify the schema at one level of the database system without altering the schema at the next higher level. It is a direct benefit of the three-tier architecture.

  • Logical Data Independence:
    • The capacity to change the conceptual schema without having to change the external schemas or application programs.
    • Example: Adding a new column to a table or splitting a table into two should not break the existing application code that queries the original fields.
    • It is relatively difficult to achieve because application programs are heavily dependent on the logical structure of the data.
  • Physical Data Independence:
    • The capacity to change the internal (physical) schema without having to change the conceptual schema.
    • Example: Changing the file organization from sequential to indexed, or moving data from one hard drive to another, does not require changes to the logical structure of tables.
    • It is easier to achieve than logical data independence.

6. Database Schema and Instance

Database Schema

  • Definition: The overall logical design and structure of the database. It is the skeleton or blueprint.
  • Characteristics: It is defined during the database design phase and is expected to change very rarely.
  • Example: Defining a Student table with columns StudentID (Integer), Name (String), and Age (Integer).

Database Instance

  • Definition: The actual content (data) stored in a database at a particular moment in time. Also known as the database state.
  • Characteristics: It is highly dynamic and changes constantly as data is inserted, updated, or deleted.
  • Example: A specific row in the Student table: (101, 'Alice', 20). If Alice's age is updated to 21 tomorrow, the schema remains the same, but the instance has changed.

7. Data Modeling

Data modeling is the process of creating a conceptual representation of data objects, the associations between different data objects, and the rules. It is a crucial step in database design.

Phases of Data Modeling:

  1. Conceptual Data Model: High-level representation of business needs. Focuses on entities and relationships (e.g., ER Diagram). Independent of any specific DBMS.
  2. Logical Data Model: Translates the conceptual model into a structure like the Relational Model. Defines tables, columns, keys, and relationships, but is independent of physical storage.
  3. Physical Data Model: Details how the data will be stored in a specific DBMS (e.g., Oracle, MySQL). Includes data types, indexes, partitions, and storage allocation.

8. Entity-Relationship (ER) Model

The ER model is a high-level conceptual data model used to define the data elements and their relationships for a specified system. It represents real-world entities and the relationships between them.

Core Concepts:

  1. Entity: A real-world object or concept that exists independently (e.g., Employee, Student, Car). Represented by a Rectangle in ER diagrams.
    • Weak Entity: An entity that cannot be uniquely identified by its own attributes and relies on a relationship with a strong entity (represented by a Double Rectangle).
  2. Attribute: Properties or characteristics that describe an entity (e.g., Employee ID, Name). Represented by an Oval.
    • Key Attribute: Uniquely identifies an entity (Underlined oval).
    • Composite Attribute: Can be divided into smaller sub-parts (e.g., Name divided into First Name and Last Name).
    • Multivalued Attribute: Can have multiple values (e.g., Phone Number). Represented by a Double Oval.
    • Derived Attribute: Value is derived from another attribute (e.g., Age derived from Date of Birth). Represented by a Dashed Oval.
  3. Relationship: An association among several entities (e.g., a Student "enrolls in" a Course). Represented by a Diamond.
    • Cardinality Ratios:
      • One-to-One (1:1)
      • One-to-Many (1:N)
      • Many-to-One (N:1)
      • Many-to-Many (M:N)

9. Relational Model

Proposed by E.F. Codd in 1970, the relational model represents data as a collection of relations (tables). It is the foundation of most modern DBMS (RDBMS).

Terminology:

  • Relation: A table with columns and rows.
  • Tuple: A single row in a table (represents a single record/instance).
  • Attribute: A column in a table (represents a property of the relation).
  • Domain: The set of allowable values for one or more attributes.
  • Degree: The number of attributes (columns) in a relation.
  • Cardinality: The number of tuples (rows) in a relation.

Keys in Relational Model:

  • Super Key: A set of one or more attributes that can uniquely identify a tuple.
  • Candidate Key: A minimal super key; a super key with no redundant attributes.
  • Primary Key: The candidate key chosen by the database designer to uniquely identify tuples in a table. Must be unique and cannot be NULL.
  • Foreign Key: An attribute (or set of attributes) in one relation that refers to the primary key of another relation, used to establish a link between the two tables.

10. Comparison of Relational and Non-Relational Databases

Relational Databases (SQL) and Non-Relational Databases (NoSQL) serve different use cases based on data structure, scalability, and consistency requirements.

Feature Relational Databases (SQL) Non-Relational Databases (NoSQL)
Data Model Table-based (Rows and Columns). Document, Key-Value, Column-family, or Graph.
Schema Rigid, predefined schema. Data must fit the structure before insertion. Dynamic/Flexible schema. Unstructured or semi-structured data can be stored easily.
Query Language Structured Query Language (SQL) is standard. Varies by database (e.g., MongoDB uses JSON-like query objects). No standard language.
Scalability Generally Vertical Scaling (scaling up by adding more CPU/RAM to a single server). Highly suited for Horizontal Scaling (scaling out by adding more servers to a distributed network).
Transactions & Consistency Strictly adheres to ACID properties (Atomicity, Consistency, Isolation, Durability). Ensures absolute data integrity. Follows the BASE model (Basically Available, Soft state, Eventual consistency). Prioritizes availability and performance over immediate consistency.
Best Used For Complex queries, highly structured data, financial systems, ERPs, CRMs. Big Data, real-time web apps, rapid development, IoT data, content management systems.
Examples MySQL, PostgreSQL, Oracle, Microsoft SQL Server. MongoDB (Document), Redis (Key-Value), Cassandra (Column), Neo4j (Graph).