Unit 6 - Notes

CSE316

Unit 6: File Management

1. File Concepts

A file is a named collection of related information that is recorded on secondary storage. From a user's perspective, a file is the smallest allotment of logical secondary storage.

File Attributes

A file consists of two parts: the data and the metadata (attributes).

Name: The symbolic file name is the only information kept in human-readable form.
Identifier: A unique tag (usually a number) identifies the file within the file system; it is the non-human-readable name for the file.
Type: Needed for systems that support different types of files (e.g., .txt, .exe).
Location: Pointer to a device and to the location of the file on that device.
Size: The current size of the file (in bytes, words, or blocks).
Protection: Access-control information determines who can do reading, writing, executing, and so on.
Time, date, and user identification: Data for protection, security, and usage monitoring.

File Operations

The operating system provides system calls to create, write, read, reposition (seek), delete, and truncate files.

Access Methods

Files store information, and this information must be accessed and read into computer memory.

Sequential Access: Information in the file is processed in order, one record after the other. (e.g., Magnetic tapes, compilers).
- Read next
- Write next
- Rewind
Direct Access (Relative Access): A file is made up of fixed-length logical records that allow programs to read and write records rapidly in no particular order. (e.g., Databases).
- Read n
- Write n
- Position to n
Index Access Methods: Involves constructing an index for the file. The index contains pointers to the various blocks. To find a record, we first search the index and then use the pointer to access the file directly.

2. Directory Structure

The directory is a symbol table that translates file names into their directory entries.

Types of Directory Structures

Single-Level Directory: All files are contained in the same directory.
- Pros: Simple to implement.
- Cons: Naming collisions (two files cannot have the same name) and grouping difficulties.
Two-Level Directory: Creates a separate directory for each user (Master File Directory -> User File Directory).
- Pros: Solves naming collision between users.
- Cons: Isolates users; hinders sharing.
Tree-Structured Directory: Generalized directory structure allowing users to create their own subdirectories. A bit indicates if an entry is a file (0) or a subdirectory (1).
- Path names can be Absolute (from root) or Relative (from current working directory).
Acyclic-Graph Directory: Allows directories to share subdirectories and files (no cycles allowed). Implemented via links or symbolic links.
General Graph Directory: Allows cycles. Requires garbage collection to determine when to delete files (reference counting).

A detailed schematic diagram comparing Directory Structures. Panel A shows a "Two-Level Directory" w... — AI-generated image — may contain inaccuracies

File System Mounting and Sharing

Mounting: Before a file system can be accessed, it must be mounted. The OS is given the name of the device and the mount point (the location within the file structure where the file system is to be attached).
Sharing: Done through a protection scheme. On distributed systems, files are shared via Network File Systems (NFS).
- User IDs (UID): Identify users.
- Group IDs (GID): Allow subsets of users to share files.

Protection

Mechanisms to ensure files are accessed only by authorized processes.

Types of Access: Read, Write, Execute, Append, Delete, List.
Access Control List (ACL): Associate with each file a list of users and types of access allowed.
RWX Bits (UNIX): Three bits (Read, Write, Execute) for three categories: Owner, Group, Universe (Others). Example: rwxr-x--x (751).

3. File System Implementation

Allocation Methods

How disk blocks are allocated to files.

Contiguous Allocation:
- Each file occupies a set of contiguous blocks on the disk.
- Pros: Simple (start location, length), excellent for sequential and direct access.
- Cons: External fragmentation (free space usually broken into small chunks), requires knowing file size beforehand.
Linked Allocation:
- Each file is a linked list of disk blocks; blocks may be scattered anywhere on the disk. Each block contains a pointer to the next block.
- Pros: No external fragmentation.
- Cons: Slow for direct access (must traverse list), space overhead for pointers.
Indexed Allocation:
- Brings all pointers together into one location: the index block.
- Each file has its own index block, which is an array of disk-block addresses.
- Pros: Supports direct access, no external fragmentation.
- Cons: Wasted space for index blocks on small files.

A technical diagram illustrating three File Allocation Methods side-by-side. Left section: "Contiguo... — AI-generated image — may contain inaccuracies

Free-Space Management

The system tracks free disk blocks to allocate space for new files.

Bit Vector (Bitmap): A string of bits where 0 represents allocated and 1 represents free. Efficient to find the first free block but requires memory.
Linked List: Link all free disk blocks together. No space waste, but traversing is slow.
Grouping: Store the addresses of $n$ free blocks in the first free block.
Counting: Keep address of first free block and the count of following contiguous free blocks.

Directory Implementation

Linear List: A list of file names with pointers to the data blocks. Simple to program but time-consuming to search.
Hash Table: A linear list with a hash data structure. Takes a value computed from the file name and returns a pointer. Faster search but creates potential collisions.

4. Device Management

Device Classifications

Dedicated Devices: Assigned to only one job at a time for the entire duration of job execution (e.g., Tape drives, Plotters).
Shared Devices: Can be shared concurrently by several processes (e.g., Hard Disks, Optical drives).
Virtual Devices: A combination of dedicated and shared techniques. For example, a printer is a dedicated device, but Spooling allows it to appear as a shared device by queuing print jobs on the disk.

Access Types

Serial Access Devices: Data must be accessed in a specific linear sequence (e.g., Magnetic Tape). Access time depends on the current location.
Direct Access Devices: Data can be accessed directly at any location (e.g., Hard Disks, SSDs).

Disk Scheduling Methods

The OS must manage disk access time efficiently by minimizing Seek Time (time to move head to cylinder) and Rotational Latency.

FCFS (First-Come, First-Served): Fair, but generally provides poor service.
SSTF (Shortest Seek Time First): Selects the request with the minimum seek time from the current head position. Can cause starvation for distant requests.
SCAN (Elevator Algorithm): The disk arm starts at one end and moves toward the other, servicing requests, then reverses direction.
C-SCAN (Circular SCAN): Like SCAN, but when it reaches the end, it immediately returns to the beginning without servicing requests on the return trip. Provides more uniform wait time.
LOOK / C-LOOK: Similar to SCAN/C-SCAN, but the arm only goes as far as the last request in each direction, then reverses immediately (it doesn't go to the physical end of the disk).

A graphical plot comparing Disk Scheduling Algorithms. The diagram should show a horizontal axis lab... — AI-generated image — may contain inaccuracies

Direct Access Storage Devices (DASD) Structure

Channels: Specialized processors (I/O processors) that handle I/O operations independently of the main CPU. They execute channel programs.
Control Units (CU): The hardware interface between the channel and the specific I/O devices (like disk drives). It interprets commands from the channel and controls the physical device mechanisms.

5. Inter-Process Communication (IPC)

Processes executing concurrently may be independent or cooperating. Cooperating processes need IPC to exchange data and information.

Introduction to IPC Methods

IPC allows processes to communicate and synchronize their actions without sharing the same address space.

Message Passing: Useful for distributed environments (exchange messages).
Shared Memory: Useful for high-speed local communication.

1. Pipes

Pipes act as a conduit allowing two processes to communicate.

Characteristics: Usually unidirectional (half-duplex). Data written to one end is read from the other.
Ordinary Pipes (Anonymous): Exist only while the creating process exists. Usually used between parent and child processes.

popen and pclose Functions (Standard C Library):

FILE *popen(const char *command, const char *type);
- Creates a pipe, forks a child process, and invokes the shell to run command.
- If type is "r", the calling process reads the command's standard output.
- If type is "w", the calling process writes to the command's standard input.
int pclose(FILE *stream);
- Closes the stream associated with the pipe and waits for the child command to terminate.

2. Shared Memory

A region of memory that is shared by cooperating processes.
Speed: Fastest form of IPC because no kernel intervention is required once memory is mapped (data copying is avoided).
Synchronization: Processes are responsible for ensuring they are not writing to the same location simultaneously (requires semaphores or mutexes).

3. FIFOs (Named Pipes)

Unlike ordinary pipes, FIFOs exist as a special device file in the file system.
Persistence: They exist even after the process that created them has terminated.
Usage: Can be used by unrelated processes (do not need parent-child relationship).
Data is strictly First-In-First-Out.

4. Message Queues

A linked list of messages stored within the kernel and identified by a message queue identifier.
Flexibility: Messages can be retrieved in an order dictated by the message type, not necessarily FIFO.
Independence: The sending process can write a message to the queue and terminate; the receiving process can read it later.

A conceptual block diagram illustrating Inter-Process Communication (IPC) models. Split the image in... — AI-generated image — may contain inaccuracies