Unit 6 - Notes

INT108

Unit 6: Files and Exceptions; Regular Expressions

Part 1: File Handling

File handling is a crucial part of programming that lets a program read from and write to permanent storage, so data survives after the program ends. In Python, files are opened with the built-in open() function, which returns a file object.

1. Text Files

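A text file stores data as a sequence of characters (strings). When reading a text file, Python decodes the bytes on disk into str objects; when writing, it encodes str objects back into bytes. The encoding (commonly UTF-8) can be set with the encoding parameter of open(); if it is omitted, a platform-dependent default is used.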

  • Extension: Typically .txt, .py, .csv, etc.
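  • Line Endings: Lines are terminated by the newline character \n. In text mode, Python translates the platform's line ending (for example \r\n on Windows) to \n when reading, and back again when writing.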
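To make the encoding point concrete, here is a minimal sketch that writes a word containing a non-ASCII character with an explicit encoding='utf-8', then reads it back in text mode and in binary mode ('rb', covered under Pickling). The temporary directory and the file name cafe.txt are assumptions made only so the sketch cleans up after itself.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cafe.txt")  # hypothetical file name

    # Text mode: pass the encoding explicitly instead of relying on the default.
    with open(path, "w", encoding="utf-8") as f:
        f.write("café")

    # Reading in text mode decodes the bytes back into a str.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # Binary mode returns the raw bytes without decoding them.
    with open(path, "rb") as f:
        raw = f.read()

print(text, len(text))  # café 4
print(raw, len(raw))    # b'caf\xc3\xa9' 5  ('é' takes two bytes in UTF-8)
```

The four characters occupy five bytes on disk, which is exactly the gap between "characters" and "bytes" that the encoding bridges.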

2. Opening a File

To perform any operation on a file, it must first be opened.
Syntax: file_object = open("filename", "mode")

Common Modes:

  • 'r': Read (default). Opens for reading. Raises FileNotFoundError if the file does not exist.
  • 'w': Write. Opens for writing. Creates a new file or truncates (deletes content of) an existing file.
  • 'a': Append. Opens for writing. Every write goes to the end of the file, so existing content is kept. Creates a new file if it does not exist.
  • 'r+': Read and Write. The file must already exist; its content is not truncated and the position starts at the beginning.

Best Practice (The with statement):
Using the with statement automatically closes the file, even if exceptions occur.

PYTHON
with open('example.txt', 'w') as file:
    file.write("Hello World")
# File is automatically closed here
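The differences between the modes are easiest to see by opening the same file several times. This is a small sketch (the temporary folder is an assumption to keep it self-contained) that tracks what the file holds after each step.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    with open(path, "w") as f:   # 'w' creates the file
        f.write("first\n")

    with open(path, "w") as f:   # opening with 'w' again discards "first"
        f.write("second\n")

    with open(path, "a") as f:   # 'a' adds to the end and keeps existing content
        f.write("third\n")

    with open(path, "r") as f:   # 'r' is the default, so open(path) is equivalent
        contents = f.read()

    with open(path, "r+") as f:  # 'r+' reads and writes without truncating
        f.write("SECOND")        # overwrites the first 6 characters in place
        f.seek(0)                # move back to the start before reading
        updated = f.read()

print(repr(contents))  # 'second\nthird\n'
print(repr(updated))   # 'SECOND\nthird\n'
```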

3. Writing to a File

To write to a text file, we use the write() or writelines() methods.

  • write(string): Writes a single string to the file.
  • writelines(list_of_strings): Writes each string from a list (or any iterable) of strings. Note: It does not add newlines between strings; include \n yourself.

PYTHON
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open('data.txt', 'w') as f:
    f.write("Header Line\n")
    f.writelines(lines)
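The note that writelines() adds no separators is a common source of surprise. This sketch uses io.StringIO, an in-memory text file that supports the same write()/writelines() methods as a file opened in 'w' mode, so nothing is written to disk.

```python
import io

words = ["apple", "banana", "cherry"]

# Without newlines, the strings run together.
buffer = io.StringIO()
buffer.writelines(words)
joined = buffer.getvalue()
print(joined)  # applebananacherry

# Adding "\n" to each item produces one string per line.
fixed = io.StringIO()
fixed.writelines(word + "\n" for word in words)  # any iterable of strings works
print(fixed.getvalue())
```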

4. Writing Variables

The write() method only accepts strings. To write integers, floats, or other objects, you must convert them to strings first using str() or f-strings.

PYTHON
score = 95
name = "Alice"

with open('results.txt', 'w') as f:
    # Incorrect: f.write(score) -> TypeError
    
    # Correct
    f.write(name + " scored " + str(score))
    # Or using f-string
    f.write(f"\n{name}: {score}")
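An alternative to converting values by hand is print() with its file parameter, which calls str() on each argument, separates them with spaces, and ends with a newline. A sketch follows; io.StringIO stands in for a file opened with open('results.txt', 'w'), and the average variable is a hypothetical addition to show number formatting.

```python
import io

score = 95
name = "Alice"
average = 87.5  # hypothetical extra value

out = io.StringIO()  # stands in for a real file object

# print() converts score to a string automatically and appends "\n".
print(name, "scored", score, file=out)

# An f-string format spec controls how a float is written (2 decimal places).
out.write(f"Average: {average:.2f}\n")

print(out.getvalue())
```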

5. Reading from a File

Python provides three main methods to read data:

  • read(size): Reads the entire file as a single string. If size is given, it reads at most that many characters (bytes in binary mode).
  • readline(): Reads a single line (up to and including the \n). Returns an empty string at the end of the file.
  • readlines(): Reads all remaining lines and returns them as a list of strings.

PYTHON
with open('data.txt', 'r') as f:
    content = f.read()  # Reads whole file
    print(content)

with open('data.txt', 'r') as f:
    for line in f:      # Memory efficient iteration
        print(line.strip())
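The three methods share a single file position, so each call continues where the previous one stopped. This sketch writes a small hypothetical data.txt into a temporary folder and then mixes the methods on one open file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Header Line\nLine 1\nLine 2\n")

    with open(path, "r", encoding="utf-8") as f:
        first_six = f.read(6)        # 'Header' (size counts characters in text mode)
        rest_of_line = f.readline()  # ' Line\n' continues from the current position
        remaining = f.readlines()    # ['Line 1\n', 'Line 2\n']
        at_end = f.readline()        # '' signals the end of the file

print(repr(first_six), repr(rest_of_line), remaining, repr(at_end))
```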

6. Directories

To manage files, we often need to interact with directories (folders). This is handled by the os module.

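  • os.getcwd(): Get Current Working Directory.
  • os.mkdir('folder_name'): Create a new directory (raises FileExistsError if it already exists).
  • os.listdir(path): List all files and folders in path (the current directory if path is omitted).
  • os.path.exists(path): Check whether a file or directory exists.
  • os.path.join(): Join path components using the correct separator for the operating system.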

PYTHON
import os

current_dir = os.getcwd()
print(f"Current directory: {current_dir}")

# Create a directory if it doesn't exist
if not os.path.exists("new_folder"):
    os.mkdir("new_folder")
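Beyond the single-folder case, os.makedirs() creates a whole chain of folders and can skip the existence check through its exist_ok flag. This is a sketch under the assumption that a temporary base folder stands in for the current directory; the folder names reports/2024 and the file summary.txt are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # os.path.join inserts the right separator ('/' or '\\') for the OS.
    nested = os.path.join(base, "reports", "2024")

    # makedirs also creates the missing parent "reports";
    # exist_ok=True means an existing folder is not an error.
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)  # second call does nothing

    # os.mkdir, by contrast, raises FileExistsError for an existing folder.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

    with open(os.path.join(nested, "summary.txt"), "w") as f:
        f.write("done")

    entries = sorted(os.listdir(nested))  # listdir order is arbitrary, so sort
    print(entries)  # ['summary.txt']
```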

7. Pickling

Pickling is the process of converting a Python object hierarchy (like a dictionary or list) into a byte stream (serialization). Unpickling is the inverse operation. This is used to save complex data structures to a file. Only unpickle data from a source you trust, because loading a pickle can execute arbitrary code.

  • Module: pickle
  • Mode: Must use binary modes ('wb' for write binary, 'rb' for read binary).

PYTHON
import pickle

data = {'name': 'John', 'age': 30, 'scores': [80, 90, 100]}

# Pickling (Saving)
with open('data.pickle', 'wb') as f:
    pickle.dump(data, f)

# Unpickling (Loading)
with open('data.pickle', 'rb') as f:
    loaded_data = pickle.load(f)

print(loaded_data['scores']) # Output: [80, 90, 100]
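A file can hold more than one pickled object; each load() call returns the next object in the order it was dumped. The sketch below uses io.BytesIO as an in-memory stand-in for a file opened with 'wb'/'rb' (an assumption to avoid touching disk), and also shows dumps()/loads(), which work with a bytes object directly. The settings tuple is a hypothetical second object.

```python
import io
import pickle

data = {'name': 'John', 'age': 30, 'scores': [80, 90, 100]}
settings = ("dark_mode", True)  # hypothetical second object

buffer = io.BytesIO()  # binary, like a file opened with 'wb'
pickle.dump(data, buffer)
pickle.dump(settings, buffer)

buffer.seek(0)                # rewind, like reopening the file with 'rb'
first = pickle.load(buffer)   # objects come back in the order they were dumped
second = pickle.load(buffer)

# dumps()/loads() perform the same round trip with bytes instead of a file.
blob = pickle.dumps(data)
copy = pickle.loads(blob)
print(type(blob).__name__, copy == data, copy is data)  # bytes True False
```

The final line shows that unpickling builds a new object that is equal to the original but not the same object.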


Part 2: Exception Handling

Exceptions are events that disrupt the normal flow of the program's execution. If an exception is not handled, Python stops the program and prints a traceback describing the error.

1. The try-except Block

The code that might fail is placed inside the try block. If an exception is raised there, the rest of the try block is skipped and control transfers to the first except block whose type matches the exception.

Syntax:

PYTHON
try:
    # Code that might raise an exception
    pass
except ExceptionType:
    # Code to run if the exception occurs
    pass
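A try statement can have several except clauses, and one clause can list several exception types in a tuple; "as err" binds the exception object so its details can be used. This sketch uses a hypothetical helper, to_ratio, to trigger each case.

```python
def to_ratio(text, total):
    """Convert text to an int and divide it by total (hypothetical helper)."""
    try:
        return int(text) / total
    except ValueError:
        # raised by int() when text is not a valid integer
        return "not a number"
    except (ZeroDivisionError, TypeError) as err:
        # one clause handling two types; err is the exception object
        return f"cannot divide: {type(err).__name__}"

print(to_ratio("50", 200))   # 0.25
print(to_ratio("abc", 200))  # not a number
print(to_ratio("50", 0))     # cannot divide: ZeroDivisionError
print(to_ratio("50", None))  # cannot divide: TypeError
```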

2. Handling ZeroDivisionError

ZeroDivisionError is raised when code divides a number by zero, whether with true division (/), floor division (//) or the modulo operator (%).

PYTHON
try:
    numerator = 10
    denominator = 0
    result = numerator / denominator
    print(result)
except ZeroDivisionError:
    print("Error: You cannot divide by zero.")
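The same exception covers every division operator, and in plain Python (unlike some numeric libraries) it is raised for floats as well as integers. This small sketch runs each operation inside the same try/except pattern and records the outcome.

```python
# Each lambda performs one kind of division by zero when called.
operations = {
    "10 / 0": lambda: 10 / 0,
    "10 // 0": lambda: 10 // 0,
    "10 % 0": lambda: 10 % 0,
    "10.0 / 0.0": lambda: 10.0 / 0.0,
}

results = {}
for label, operation in operations.items():
    try:
        results[label] = operation()
    except ZeroDivisionError:
        results[label] = "ZeroDivisionError"

print(results)  # every entry is 'ZeroDivisionError'
```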

3. Handling FileNotFoundError

FileNotFoundError is raised when open() tries to read a file that does not exist.

PYTHON
filename = "non_existent_file.txt"

try:
    with open(filename, 'r') as f:
        content = f.read()
except FileNotFoundError:
    print(f"Sorry, the file {filename} does not exist.")
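A common pattern is to wrap the read in a function that falls back to a default value when the file is missing. The helper read_or_default below is hypothetical, and the temporary folder only guarantees that the file really does not exist.

```python
import os
import tempfile

def read_or_default(path, default=""):
    """Return the file's text, or default if the file is missing (hypothetical helper)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return default

with tempfile.TemporaryDirectory() as folder:
    missing = os.path.join(folder, "non_existent_file.txt")
    result = read_or_default(missing, default="(no data)")

print(result)  # (no data)

# FileNotFoundError is a subclass of OSError, so "except OSError" would also catch it.
print(issubclass(FileNotFoundError, OSError))  # True
```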

4. The else Block

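The else block is optional. It runs only if no exceptions were raised in the try block. It is useful for code that should only execute if the try block succeeded; keeping that code in else rather than in try also means its own errors are not accidentally caught by the except clause.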

PYTHON
try:
    num = int(input("Enter a number: "))
except ValueError:
    print("That is not a number!")
else:
    # This runs only if the input was successfully converted to int
    print(f"You entered {num}. Great job!")
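Because the example above waits for keyboard input, here is a non-interactive sketch of the same logic wrapped in a hypothetical function, parse_number, so that both branches can be exercised directly.

```python
def parse_number(text):
    """Describe whether text holds an integer (hypothetical helper)."""
    try:
        num = int(text)
    except ValueError:
        return "That is not a number!"
    else:
        # Runs only when int() succeeded.
        return f"You entered {num}. Great job!"

print(parse_number("42"))   # You entered 42. Great job!
print(parse_number("4x2"))  # That is not a number!
```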


Part 3: Regular Expressions (Regex)

1. Concept of Regular Expressions

A Regular Expression (RegEx) is a sequence of characters that forms a search pattern. It is used for string searching and manipulation (validation, finding substrings, replacing text).

  • Module: re
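The three uses listed above (validation, finding substrings, replacing text) map onto three functions of the re module. This sketch uses made-up sample text; {4} and {2} are quantifiers meaning "exactly four" and "exactly two" occurrences, which the metacharacter list below does not cover.

```python
import re

text = "Order 66 shipped on 2024-05-01; order 67 is pending."

# Validation: fullmatch() succeeds only if the WHOLE string fits the pattern.
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2024-05-01") is not None
not_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2024-5-1") is not None

# Finding: findall() returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", text)

# Replacing: sub() replaces each match with the given string.
masked = re.sub(r"\d", "#", "Card 1234")

print(is_date, not_date)  # True False
print(numbers)            # ['66', '2024', '05', '01', '67']
print(masked)             # Card ####
```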

2. Various Types of Regular Expressions

Regex relies on Metacharacters (characters with special meaning) and Special Sequences.

Common Metacharacters:

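  • .: Matches any character except newline.
  • ^: Matches the start of the string ("starts with").
  • $: Matches the end of the string ("ends with").
  • *: Zero or more occurrences of the preceding element.
  • +: One or more occurrences of the preceding element.
  • ?: Zero or one occurrence of the preceding element.
  • []: A set of characters (e.g., [a-z]).
  • |: Either/Or (e.g., cat|dog).
  • \: Escape character. Makes a metacharacter literal (e.g., \. matches a dot) or starts a special sequence.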

Special Sequences:

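  • \d: Matches any digit (0-9).
  • \D: Matches any non-digit.
  • \w: Matches any word character: letters, digits, and the underscore (a-z, A-Z, 0-9, _). For str patterns, Unicode letters and digits also count.
  • \s: Matches whitespace (space, tab, newline, and similar characters).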
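Each metacharacter and special sequence above can be tried in isolation. This sketch uses small made-up strings; re.search() (which scans the whole string, unlike match()) is combined with ^ and $ so the anchors are doing the work, and re.split() splits a string wherever the pattern matches.

```python
import re

words = ["cat", "cart", "ct", "coat", "act"]

# ^ and $ anchor the pattern; a? = zero or one 'a'; .* = any characters.
optional_a = [w for w in words if re.search(r"^ca?t$", w)]
c_then_t = [w for w in words if re.search(r"^c.*t$", w)]

one_or_more = re.findall(r"o+", "foo boo bar")        # + : one or more
either = re.findall(r"cat|dog", "cat, dog, bird")     # | : either/or
vowels = re.findall(r"[aeiou]", "regex")              # [] : character set
prices = re.findall(r"\$\d+", "Price: $5 or $20")     # \$ : literal dollar sign
word_runs = re.findall(r"\w+", "snake_case, CamelCase!")
digits_only = re.sub(r"\D", "", "Tel: 555-0100")       # remove non-digits
pieces = re.split(r"\s+", "a  b\tc\nd")                # split on whitespace

print(optional_a, c_then_t)  # ['cat', 'ct'] ['cat', 'cart', 'ct', 'coat']
print(one_or_more, either, vowels, prices)
print(word_runs, digits_only, pieces)
```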

3. Using the match() Function

The re.match() function checks for a match only at the beginning of the string. If the pattern is found elsewhere, match() returns None.

  • Returns: A match object if successful, None otherwise.

PYTHON
import re

pattern = r"Python"
text1 = "Python is fun"
text2 = "I love Python"

# Case 1
match1 = re.match(pattern, text1)
if match1:
    print("Match found at start!") # This prints

# Case 2
match2 = re.match(pattern, text2)
if match2:
    print("Match found!")
else:
    print("No match at the start.") # This prints
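To find a pattern anywhere in the string, re.search() is the usual companion to match(), and re.fullmatch() requires the entire string to match. The match object they return also reports what matched and where. Parentheses, which create capture groups, are not in the metacharacter list above and are introduced here as an extra.

```python
import re

text = "I love Python"

m = re.match(r"Python", text)   # None: 'Python' is not at position 0
s = re.search(r"Python", text)  # scans the whole string
print(m)                                # None
print(s.group(), s.start(), s.span())   # Python 7 (7, 13)

f = re.fullmatch(r"I love \w+", text)
print(f is not None)  # True

# Parentheses create groups that can be read back from the match object.
version = re.match(r"Python (\d+)\.(\d+)", "Python 3.12 is out")
print(version.group(1), version.group(2))  # 3 12
```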

4. Web Scraping by using Regular Expressions

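Web scraping involves extracting data from HTML content. While dedicated libraries like BeautifulSoup are common, Regular Expressions are powerful tools for finding specific patterns (like email addresses or hyperlinks) within raw HTML text. Regex works well for such flat patterns, but it cannot reliably parse nested HTML structure; for that, an HTML parser is the better choice.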

Example Scenario: Extracting all email addresses from a snippet of HTML source code.

PYTHON
import re

html_content = """
<html>
<head><title>Contact Us</title></head>
<body>
    <p>Please contact us at support@example.com.</p>
    <p>For sales inquiries, contact sales-team@business.org or admin@site.net.</p>
</body>
</html>
"""

# Regex breakdown:
# [\w\.-]+   : Matches word chars, dots, or dashes (username)
# @          : Matches the @ symbol
# [\w\.-]+   : Matches the domain name
# \.         : Matches the dot before the extension
# [a-zA-Z]+  : Matches the domain extension (com, org, etc.)
email_pattern = r"[\w\.-]+@[\w\.-]+\.[a-zA-Z]+"

# re.findall() returns a list of all non-overlapping matches
emails = re.findall(email_pattern, html_content)

print("Emails found:")
for email in emails:
    print(email)

Output:

TEXT
Emails found:
support@example.com
sales-team@business.org
admin@site.net
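Hyperlinks, the other pattern mentioned above, can be pulled out with a capture group: when a pattern has one group, findall() returns only the captured text, and with several groups it returns tuples. This is a minimal sketch that assumes href values are enclosed in double quotes and that each link's text contains no nested tags; the HTML snippet is made up.

```python
import re

html_content = """
<p>Read the <a href="https://example.com/docs">docs</a> or
visit <a href="/about">About us</a>.</p>
"""

# ([^"]+) captures everything between the quotes of the href attribute.
links = re.findall(r'href="([^"]+)"', html_content)
print(links)  # ['https://example.com/docs', '/about']

# Two groups: findall() returns (url, link text) tuples.
pairs = re.findall(r'<a href="([^"]+)">([^<]+)</a>', html_content)
print(pairs)  # [('https://example.com/docs', 'docs'), ('/about', 'About us')]
```

Relative links such as /about come back exactly as written; resolving them against the page URL is a separate step.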