Unit 1 - Notes

INT306 8 min read

Unit 1: Introduction to database

1. Purpose of Database Systems

Before the advent of Database Management Systems (DBMS), organizations used File Processing Systems to store data. A DBMS was developed to address the specific limitations and disadvantages inherent in file processing systems.

Disadvantages of File Processing Systems (Why we need DBMS):

  • Data Redundancy and Inconsistency: In file systems, different programs often maintain their own files. This leads to the same data being stored in multiple places (redundancy), which can result in copies of the same data not matching (inconsistency).
  • Difficulty in Accessing Data: File systems require writing specific programs to extract data. If a request changes (e.g., "find all students" changes to "find all students with a GPA > 3.5"), a new program must be written.
  • Data Isolation: Data is scattered in various files, often in different formats, making it difficult to write new application programs to retrieve appropriate data.
  • Integrity Problems: Constraints (e.g., account balance > 0) are buried in program code. Adding new constraints is difficult.
  • Atomicity Problems: Systems are vulnerable to failure. If a fund transfer fails halfway through, the database must be restored to the state before the transaction started. File systems lack this mechanism.
  • Concurrent Access Anomalies: To improve performance, multiple users may access data simultaneously. Without supervision, this can lead to errors (e.g., two people reading a balance of 10 at the exact same time, resulting in a final balance of 80).
  • Security Problems: Not every user should have access to all data. In file systems, enforcing specific access rights is difficult.

2. Components of DBMS

A DBMS is a complex system involving several distinct components that interact to manage data effectively.

  1. Hardware: The physical electronic devices (computers, I/O devices, storage devices like hard drives) that offer the interface between computers and the real-world systems.
  2. Software: This implies the DBMS itself. It is the set of programs used to control and manage the overall database. It acts as an interface between the user and the database.
  3. Data: The most important component. It includes the actual operational data (students, products, etc.) and the metadata (schema, constraints).
  4. Procedures: These are the instructions and rules that govern the design and use of the database. This covers how to log in, how to start/stop the DBMS, and how to make backups.
  5. Users:
    • Database Administrators (DBA): Responsible for authorizing access, monitoring use, and acquiring software/hardware resources.
    • Database Designers: Identify the data to be stored and choose appropriate structures to represent and store this data.
    • End Users: People who interact with the system (Naive users, Application programmers, Sophisticated users).

3. Applications of DBMS

Databases are the core of almost all modern software applications. Key sectors include:

  • Banking: For customer information, accounts, loans, and banking transactions.
  • Airlines: For reservations and schedule information.
  • Universities: For student information, course registrations, and grades.
  • Telecommunications: To keep records of calls made, generating monthly bills, and maintaining balances.
  • Finance: For storing information about holdings, sales, and purchases of financial instruments.
  • Sales: For customer, product, and purchase information.
  • Human Resources: For information about employees, salaries, payroll taxes, and benefits.

4. Three-Tier DBMS Architecture

The architecture of a DBMS is divided into three levels of abstraction to separate the user applications from the physical database. This is known as the ANSI/SPARC Architecture.

A detailed block diagram illustrating the Three-Tier DBMS Architecture (ANSI/SPARC model). The diagr...
AI-generated image — may contain inaccuracies

The Three Levels:

  1. Physical Level (Internal Level):
    • The lowest level of abstraction.
    • Describes how the data is actually stored.
    • Deals with complex low-level data structures, file organizations, and access methods (indexes, hashing).
  2. Logical Level (Conceptual Level):
    • Describes what data is stored in the database and what relationships exist among those data.
    • It describes the entire database in terms of a small number of relatively simple structures (records).
    • Used by DBAs and developers.
  3. View Level (External Level):
    • The highest level of abstraction.
    • Describes only part of the entire database.
    • Exists to simplify user interaction; the system may provide many views for the same database.

5. Data Independence

Data independence is the ability to modify a schema definition in one level without affecting the schema definition in the next higher level.

1. Logical Data Independence

  • The capacity to change the conceptual schema without having to change the external schemas or application programs.
  • Example: Adding a new field (e.g., Date of Birth) to a student table should not break the application that only looks at the Student Name and ID.

2. Physical Data Independence

  • The capacity to change the internal schema without having to change the conceptual schema.
  • Example: Changing the storage device from a hard drive to an SSD, or changing the hashing algorithm used for storage, should not require changing the logical structure of the tables.

6. Database Schema and Instance

It is crucial to distinguish between the description of the database and the database itself.

  • Database Schema:
    • The logical design or blueprint of the database.
    • It is static; it changes very infrequently.
    • Analogy: The variable declaration in programming (e.g., int x;).
  • Database Instance:
    • The collection of information stored in the database at a particular moment.
    • It is dynamic; it changes every time data is inserted or deleted.
    • Analogy: The value of the variable at a specific time (e.g., x = 5; then later x = 10;).

7. Data Modeling

Data models are a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

Categories of Data Models:

  1. Relational Model: Uses tables to represent data and relationships. Most widely used.
  2. Entity-Relationship (ER) Model: Uses a collection of basic objects (entities) and relationships among these objects. Primarily used for database design.
  3. Object-Based Data Models: Extends the ER model with notions of encapsulation, methods, and object identity (Object-Oriented and Object-Relational).
  4. Semi-structured Data Model: Allows data specifications where individual data items of the same type may have different sets of attributes (e.g., XML, JSON).

8. Entity Relationship (ER) Model

The ER model is a high-level conceptual data model diagram. It helps in systematically analyzing data requirements to produce a well-designed database.

Core Concepts:

  • Entity: A "thing" or "object" in the real world that is distinguishable from other objects (e.g., Student, Car, Course).
  • Attribute: Descriptive properties possessed by each member of an entity set (e.g., Student Name, Car Color).
  • Relationship: An association among several entities (e.g., A Student enrolls in a Course).

A professional Entity-Relationship (ER) Diagram visualizing a University system. In the center, plac...
AI-generated image — may contain inaccuracies

Notation Guide:

  • Rectangle: Represents Entity Sets.
  • Ellipse: Represents Attributes.
  • Diamond: Represents Relationship Sets.
  • Line: Links attributes to entity sets and entity sets to relationship sets.
  • Double Ellipse: Multivalued attributes (e.g., Phone numbers).
  • Underlined Attribute: Primary Key (Unique identifier).

9. Relational Model

The relational model represents data as a collection of relations (tables). It is the theoretical basis for SQL (Structured Query Language).

Key Terminologies:

  • Relation: A table with columns and rows.
  • Tuple: A row in a table (represents a single record).
  • Attribute: A column header (represents a property of the data).
  • Domain: The set of permitted values for each attribute (e.g., Integer, Date, Varchar).
  • Degree: The number of attributes (columns) in a relation.
  • Cardinality: The number of tuples (rows) in a relation.

Keys in Relational Model:

  • Primary Key: A minimal set of attributes that uniquely identifies a tuple.
  • Foreign Key: An attribute in one table that refers to the Primary Key of another table, establishing a link between them.

10. Comparison: Relational vs. Non-Relational Databases

This is often framed as SQL vs. NoSQL.

A split-screen comparison diagram. On the left side, labeled "Relational (SQL)", draw a structured t...
AI-generated image — may contain inaccuracies
}. Between the two sides, add a lightning bolt icon or a 'VS' circle. Use blue tones for the SQL side to represent structure/rigidity, and orange tones for the NoSQL side to represent flexibility. Add text at the bottom: "Schema-based" under SQL and "Schema-less" under NoSQL.]

Comparison Table

Feature Relational Databases (SQL) Non-Relational Databases (NoSQL)
Structure Table-based (Rows/Columns). Structured data. Document, Key-Value, Graph, or Column-based. Unstructured or Semi-structured.
Schema Pre-defined schema. Must determine structure before adding data. Dynamic schema. Can add fields on the fly (Schema-less).
Scalability Vertical Scaling (Scale Up): Increasing RAM/CPU of a single server. Horizontal Scaling (Scale Out): Adding more servers to a cluster.
ACID vs BASE Follows ACID properties (Atomicity, Consistency, Isolation, Durability). Ensures strong consistency. Follows BASE model (Basically Available, Soft state, Eventual consistency). Prioritizes availability and speed over immediate consistency.
Query Language Uses SQL (Structured Query Language). Standardized. Uses UnQL (Unstructured Query Language) or API-based queries. Varies by database.
Examples MySQL, PostgreSQL, Oracle, Microsoft SQL Server. MongoDB, Cassandra, Redis, Amazon DynamoDB.
Best Used For Complex queries, multi-row transactions, legacy systems. Big data, real-time web apps, content management, rapid prototyping.