Unit 6 - Notes

CSE316

Unit 6: File Management, Device Management and Inter Process Communication


Part 1: File Management

1. File Concepts

A file is a named collection of related information that is recorded on secondary storage. From a user’s perspective, a file is the smallest allotment of logical secondary storage.

  • File Attributes:

    • Name: The symbolic name used by humans.
    • Identifier: A unique tag (number) usually non-human-readable, used by the file system.
    • Type: Needed for systems that support different types (e.g., .exe, .txt).
    • Location: Pointer to a device and to the location of the file on that device.
    • Size: Current size (in bytes, words, or blocks).
    • Protection: Access-control information (who can read, write, execute).
    • Time, date, and user identification: Data for protection, security, and usage monitoring.
  • File Operations:
    Create, Write, Read, Reposition (Seek), Delete, Truncate.

2. Access Methods

Access methods determine how the information within a file is retrieved.

  • Sequential Access:
    • Information is processed in order, one record after the other.
    • Most common method (e.g., compilers, text editors).
    • Operations: read_next(), write_next(), rewind().
  • Direct (Random) Access:
    • A file is made up of fixed-length logical records.
    • Programs can read/write records rapidly in no particular order.
    • Essential for databases.
    • Operations: read(n), write(n) (where n is the block number).
  • Index Access:
    • Built on top of direct access.
    • An index contains pointers to the various blocks. To find a record, the index is searched first, followed by direct access to the file.

3. Directory Structure

The directory records information such as name, location, size, and type for all files.

  • Single-Level Directory: All files are contained in the same directory.
    • Pro: Simple to implement.
    • Con: Naming collisions and grouping problems as the number of files increases.
  • Two-Level Directory: Creates a Master File Directory (MFD) and a User File Directory (UFD) for each user.
    • Pro: Solves name collision between different users.
    • Con: Isolates users; inhibits sharing.
  • Tree-Structured Directory: A directory can contain files and subdirectories.
    • Pro: Efficient grouping; users can create their own sub-hierarchies.
    • Con: Searching can be complex.
  • Acyclic-Graph Directory: Allows directories to share subdirectories and files (using links/aliases). No cycles allowed.
  • General Graph Directory: Allows cycles. Requires garbage collection algorithms to determine when to delete files and prevent infinite loops during search.

4. File System Mounting and Sharing

  • Mounting: Before a file system can be accessed, it must be mounted. The OS is given the name of the device and the mount point (the location within the file structure where the file system is to be attached).
  • File Sharing:
    • Local: Done via protection schemes (User, Group, World).
    • Remote: Uses networking (NFS - Network File System, CIFS - Common Internet File System).
    • Consistency Semantics: Defines when modifications by one user are observable by others.

5. Protection

Protection mechanisms control access to files.

  • Types of Access: Read, Write, Execute, Append, Delete, List.
  • Access Control:
    • Access Control Lists (ACL): Associate with each file a list of names and the types of access allowed.
    • User/Group/Other (Unix approach): Condensed version of ACL using three bits (rwx) for three categories: Owner, Group, and Universe.

6. Allocation Methods

Methods used to allocate space on the disk for files.

  • Contiguous Allocation:
    • Each file occupies a set of contiguous blocks on the disk.
    • Pros: Excellent performance for sequential and direct access; simple (only need starting location and length).
    • Cons: External fragmentation; difficult to grow files.
  • Linked Allocation:
    • Each file is a linked list of disk blocks; blocks may be scattered anywhere.
    • Pros: No external fragmentation; easy to grow files.
    • Cons: Slow direct access (must traverse list); pointer overhead; reliability issues (if a pointer is lost).
  • Indexed Allocation:
    • Brings all pointers together into one block called the Index Block.
    • Pros: Supports direct access; no external fragmentation.
    • Cons: Wasted space for index blocks (overhead) for small files.

7. Free-Space Management

The system must track free disk blocks.

  • Bit Vector (Bit Map): Each block is represented by 1 bit. (1 = free, 0 = allocated). Simple and efficient but requires memory.
  • Linked List: Links all free blocks together. No waste of space, but traversing is slow.
  • Grouping: Stores addresses of free blocks in the first free block.
  • Counting: Stores the address of the first free block and the count of following contiguous free blocks.

8. Directory Implementation

  • Linear List: A list of filenames with pointers to data blocks. Simple to program but time-consuming to execute (linear search).
  • Hash Table: A linear list with a hash data structure. The hash function computes a value from the filename which points to the entry. Greatly decreases search time but requires handling hash collisions.

Part 2: Device Management (I/O Systems)

1. Device Classifications

  • Dedicated Devices: Assigned to only one process at a time until the process releases it (e.g., Tape drives, Plotters).
  • Shared Devices: Can be assigned to multiple processes simultaneously. The OS manages interleaving (e.g., Hard disks).
  • Virtual Devices: Dedicated devices converted into shared devices through software techniques like Spooling (Simultaneous Peripheral Operation On-Line). For example, a printer is dedicated, but the OS spools print jobs to a disk buffer, allowing multiple processes to "print" simultaneously.

2. Access Types

  • Serial Access Devices: Transmit data as a stream of bytes, one bit at a time (e.g., Mice, Keyboards, Serial ports). No addressing is usually required.
  • Direct Access Devices: Transfer blocks of data. The device can access any block directly (e.g., Hard Drives, SSDs, Optical Disks).

3. Direct Access Storage Devices (DASD)

Devices where any specific data block can be accessed directly without traversing previous blocks.

  • Magnetic Disks:
    • Platters: Rigid disks coated with magnetic material.
    • Tracks: Concentric circles on platters.
    • Sectors: Tracks divided into smallest storage units.
    • Cylinders: Set of tracks at the same arm position across all platters.
  • Seek Time: Time to move the disk arm to the desired cylinder.
  • Rotational Latency: Time for the desired sector to rotate under the disk head.

4. Disk Scheduling Methods

The OS must minimize seek time to improve performance.

  • FCFS (First-Come, First-Served): Processes requests in the order received. Fair, but poor performance.
  • SSTF (Shortest Seek Time First): Selects the request with the least seek time from the current head position. Can cause starvation of distant requests.
  • SCAN (Elevator Algorithm): The disk arm starts at one end and moves toward the other, servicing requests, then reverses direction.
  • C-SCAN (Circular SCAN): Like SCAN, but when it reaches the end, it immediately returns to the beginning without servicing requests on the return trip. Provides more uniform wait time.
  • LOOK / C-LOOK: Variations of SCAN/C-SCAN where the arm only goes as far as the last request in each direction, rather than the full width of the disk.

5. Channels and Control Units

To improve efficiency, the CPU offloads I/O tasks.

  • I/O Channel: A specialized processor with local memory and logic to control I/O devices. It executes "Channel Programs."
  • Control Unit (CU): Hardware that interfaces the I/O device with the I/O channel. It controls the specific mechanics of the device (e.g., moving the disk arm).
  • Hierarchy: CPU Channel Control Unit Device.

Part 3: Inter Process Communication (IPC)

1. Introduction to IPC Methods

IPC allows processes to exchange data and synchronize actions. It is essential for cooperating processes.

  • Need for IPC: Information sharing, computation speedup (parallelism), modularity.
  • Race Condition: A situation where several processes access and manipulate the same data concurrently, and the outcome depends on the order of execution.
  • Critical Section: Code segment where shared variables are accessed. IPC mechanisms often provide synchronization to protect critical sections.

2. Pipes

A unidirectional data channel that can be used for inter-process communication.

  • Concept: Data written to the write-end of the pipe is buffered by the kernel until it is read from the read-end.
  • popen and pclose functions (Standard Library High-Level Pipes):

    • Used to run a command as a subprocess and open a stream to/from it.

    popen syntax:

    C
        FILE *popen(const char *command, const char *type);
        

    • command: The shell command to execute.
    • type: "r" for reading (output of command), "w" for writing (input to command).
    • Returns: A standard file pointer.

    pclose syntax:

    C
        int pclose(FILE *stream);
        

    • Closes the stream and waits for the child process to terminate.

3. FIFOs (Named Pipes)

Unlike standard pipes (which are anonymous and only persist while processes are running), FIFOs exist as a device special file in the file system.

  • Characteristics:
    • Bi-directional (usually, though typically used uni-directionally).
    • Persistence: Exists on the disk until explicitly deleted.
    • Access: Can be accessed by unrelated processes (not just parent/child) provided they know the filename.
    • Created using mkfifo() system call.

4. Shared Memory

The fastest form of IPC.

  • Concept: A region of memory is set aside that is mapped into the address space of multiple processes.
  • Mechanism:
    1. One process creates the shared memory segment.
    2. Other processes "attach" this segment to their address space.
    3. Processes read/write directly to this memory location like standard variables.
  • Synchronization: The OS provides the memory, but the processes are responsible for synchronization (e.g., using semaphores) to prevent race conditions.

5. Message Queues

A linked list of messages stored within the kernel and identified by a message queue identifier.

  • Concept: Processes send discrete messages to the queue and receive messages from the queue.
  • Structure: Messages usually have a type (integer) and a payload (data).
  • Advantages:
    • Asynchronous: The sender can continue executing after sending; the receiver can read later.
    • Prioritization: Messages can be read based on type/priority rather than strictly FIFO.
  • Operations: msgsnd() (send) and msgrcv() (receive).