Unit 6 - Notes
INT108
Unit 6: Files and Exceptions; Regular Expressions
Part 1: File Handling
File handling lets a program read from and write to permanent storage. In Python, it starts with the built-in open() function, which returns a file object.
1. Text Files
A text file stores data as a sequence of characters. When reading, Python decodes the bytes on disk into strings; when writing, it encodes strings back into bytes. The default encoding depends on the platform, so pass encoding='utf-8' to open() when the encoding matters.
- Extension: Typically .txt, .py, .csv, etc.
- Line Endings: Lines are terminated by the newline character \n.
2. Opening a File
To perform any operation on a file, it must first be opened.
Syntax: file_object = open("filename", "mode")
Common Modes:
- 'r': Read (default). Opens for reading. Raises FileNotFoundError if the file does not exist.
- 'w': Write. Opens for writing. Creates a new file or truncates (deletes the content of) an existing file.
- 'a': Append. Opens for writing. The pointer is placed at the end of the file. Creates a new file if it does not exist.
- 'r+': Read and write. The file must already exist; its content is not truncated.
Best Practice (The with statement):
Using the with statement automatically closes the file, even if exceptions occur.
with open('example.txt', 'w') as file:
    file.write("Hello World")
# File is automatically closed here
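The difference between the 'w' and 'a' modes listed above is easiest to see by running them in sequence. The following is a minimal sketch, not a definitive recipe; the temporary directory and the file name modes_demo.txt are assumptions made only to keep the demo self-contained.

```python
import os
import tempfile

# A temporary directory keeps the demo away from real files (assumption for illustration).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:   # 'w' creates the file
        f.write("first\n")

    with open(path, "a") as f:   # 'a' writes at the end, keeping existing content
        f.write("second\n")

    with open(path, "r") as f:
        after_append = f.read()

    with open(path, "w") as f:   # 'w' on an existing file truncates it first
        f.write("replaced\n")

    with open(path, "r") as f:
        after_rewrite = f.read()

    file_closed = f.closed       # the with block has already closed the file

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_rewrite))   # 'replaced\n'
print(file_closed)           # True
```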
3. Writing to a File
To write to a text file, we use the write() or writelines() methods.
- write(string): Writes a single string to the file.
- writelines(list_of_strings): Writes a list of strings to the file. Note: It does not automatically add newlines between strings.
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
with open('data.txt', 'w') as f:
    f.write("Header Line\n")
    f.writelines(lines)
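The note about writelines() not adding newlines can be checked directly. This sketch writes the same list twice, once as-is and once with "\n" appended to each item; the list contents, the temporary directory, and the file name words.txt are illustrative assumptions.

```python
import os
import tempfile

words = ["alpha", "beta", "gamma"]  # sample data without trailing newlines

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "words.txt")  # hypothetical file name

    # writelines() writes the strings back to back, with no separator.
    with open(path, "w") as f:
        f.writelines(words)
    with open(path, "r") as f:
        run_together = f.read()

    # Adding "\n" to each item puts one word on each line.
    with open(path, "w") as f:
        f.writelines(word + "\n" for word in words)
    with open(path, "r") as f:
        one_per_line = f.read()

print(repr(run_together))   # 'alphabetagamma'
print(repr(one_per_line))   # 'alpha\nbeta\ngamma\n'
```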
4. Writing Variables
The write() method only accepts strings. To write integers, floats, or other objects, you must convert them to strings first using str() or f-strings.
score = 95
name = "Alice"
with open('results.txt', 'w') as f:
    # Incorrect: f.write(score) -> TypeError
    # Correct
    f.write(name + " scored " + str(score))
    # Or using f-string
    f.write(f"\n{name}: {score}")
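Because values are stored as text, they also come back as text. The sketch below writes a few integers, reads them back as strings, and converts them with int(); the sample scores and the file name scores.txt are assumptions for illustration only.

```python
import os
import tempfile

scores = [95, 88, 72]  # sample values

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.txt")  # hypothetical file name

    # Each integer is converted to text by the f-string before writing.
    with open(path, "w") as f:
        for score in scores:
            f.write(f"{score}\n")

    # Reading returns strings, not integers.
    with open(path, "r") as f:
        raw_values = [line.strip() for line in f]

# Convert back explicitly to recover the original numbers.
restored = [int(value) for value in raw_values]

print(raw_values)   # ['95', '88', '72']
print(restored)     # [95, 88, 72]
```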
5. Reading from a File
Python provides three main methods to read data:
- read(size): Reads the entire file as a single string. If size is specified, it reads at most that many characters (bytes in binary mode).
- readline(): Reads a single line (up to and including the \n).
- readlines(): Reads all remaining lines and returns them as a list of strings.
with open('data.txt', 'r') as f:
    content = f.read() # Reads whole file
    print(content)
with open('data.txt', 'r') as f:
    for line in f: # Memory efficient iteration
        print(line.strip())
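The three reading methods share one file position, so each call continues where the previous one stopped. The sketch below illustrates this on a small file created for the purpose; the file contents and the temporary path are assumptions, not part of the original notes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("Header Line\nLine 1\nLine 2\n")

    with open(path, "r") as f:
        first_five = f.read(5)        # the first 5 characters
        rest_of_line = f.readline()   # the remainder of the first line, including \n
        remaining = f.readlines()     # every line that is left, as a list
        at_end = f.read()             # nothing left, so an empty string

print(repr(first_five))     # 'Heade'
print(repr(rest_of_line))   # 'r Line\n'
print(remaining)            # ['Line 1\n', 'Line 2\n']
print(repr(at_end))         # ''
```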
6. Directories
To manage files, we often need to interact with directories (folders). This is handled by the os module.
- os.getcwd(): Get Current Working Directory.
- os.mkdir('folder_name'): Create a new directory.
- os.listdir(): List all files and folders in the current directory.
- os.path.join(): Intelligently join path components.
import os
current_dir = os.getcwd()
print(f"Current directory: {current_dir}")
# Create a directory if it doesn't exist
if not os.path.exists("new_folder"):
    os.mkdir("new_folder")
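The functions above are typically combined: build a path with os.path.join(), create the directory, then list what it contains. The following sketch works inside a temporary directory so it leaves nothing behind; the folder name reports and the file names are hypothetical. It also uses os.makedirs() with exist_ok=True, which is one alternative to the explicit os.path.exists() check.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    # os.path.join() inserts the right separator for the current platform.
    reports_dir = os.path.join(base_dir, "reports")  # hypothetical folder name
    os.mkdir(reports_dir)

    # exist_ok=True makes a repeated call harmless instead of raising FileExistsError.
    os.makedirs(reports_dir, exist_ok=True)

    for name in ("b.txt", "a.txt"):  # hypothetical file names
        with open(os.path.join(reports_dir, name), "w") as f:
            f.write("demo\n")

    # os.listdir() returns names in arbitrary order, so sort for stable output.
    entries = sorted(os.listdir(reports_dir))
    is_directory = os.path.isdir(reports_dir)

print(entries)        # ['a.txt', 'b.txt']
print(is_directory)   # True
```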
7. Pickling
Pickling is the process of converting a Python object hierarchy (like a dictionary or list) into a byte stream (serialization). Unpickling is the inverse operation. This is used to save complex data structures to a file.
- Module: pickle
- Mode: Must use binary modes ('wb' for write binary, 'rb' for read binary).
import pickle
data = {'name': 'John', 'age': 30, 'scores': [80, 90, 100]}
# Pickling (Saving)
with open('data.pickle', 'wb') as f:
    pickle.dump(data, f)
# Unpickling (Loading)
with open('data.pickle', 'rb') as f:
    loaded_data = pickle.load(f)
print(loaded_data['scores']) # Output: [80, 90, 100]
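The byte stream itself can be inspected without a file by using pickle.dumps() and pickle.loads(), the in-memory counterparts of dump() and load(). This sketch shows that the result is a bytes object and that unpickling produces an equal but separate copy; the extra 'point' entry is an assumption added to show that a tuple keeps its type.

```python
import pickle

data = {'name': 'John', 'age': 30, 'scores': [80, 90, 100], 'point': (1.5, 2)}

blob = pickle.dumps(data)       # serialize to a bytes object in memory
restored = pickle.loads(blob)   # rebuild the object hierarchy from those bytes

print(type(blob).__name__)               # bytes
print(restored == data)                  # True: same content
print(restored is data)                  # False: a new, independent object
print(type(restored['point']).__name__)  # tuple: the original types are preserved
```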
Part 2: Exception Handling
Exceptions are events that disrupt the normal flow of the program's execution. If an exception is not handled, the program stops and prints a traceback.
1. The try-except Block
The critical operation is placed inside the try block. If an error occurs, the flow transfers to the except block.
Syntax:
try:
    # Code that might raise an exception
except ExceptionType:
    # Code to run if exception occurs
2. Handling ZeroDivisionError
This error occurs when code attempts to divide a number by zero.
try:
    numerator = 10
    denominator = 0
    result = numerator / denominator
    print(result)
except ZeroDivisionError:
    print("Error: You cannot divide by zero.")
3. Handling FileNotFoundError
This error occurs when trying to open a file for reading that does not exist.
filename = "non_existent_file.txt"
try:
    with open(filename, 'r') as f:
        content = f.read()
except FileNotFoundError:
    print(f"Sorry, the file {filename} does not exist.")
4. The else Block
The else block is optional. It runs only if no exceptions were raised in the try block. It is useful for code that should only execute if the try block succeeded.
try:
    num = int(input("Enter a number: "))
except ValueError:
    print("That is not a number!")
else:
    # This runs only if the input was successfully converted to int
    print(f"You entered {num}. Great job!")
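The pieces from this part can be combined in one function. The sketch below is one possible arrangement, not the only one: a hypothetical helper read_number() uses a separate except clause for each error type and an else block for the success path. The helper name, messages, and file names are all assumptions for illustration.

```python
import os
import tempfile

def read_number(path):
    """Return the integer stored in the file at path, or None on failure."""
    try:
        with open(path, "r") as f:
            text = f.read()
        number = int(text)  # int() ignores surrounding whitespace such as "\n"
    except FileNotFoundError:
        print(f"Sorry, the file {path} does not exist.")
        return None
    except ValueError:
        print("The file does not contain a whole number.")
        return None
    else:
        # Runs only when neither exception was raised.
        return number

with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.txt")        # hypothetical file names
    bad_path = os.path.join(tmp_dir, "bad.txt")
    missing_path = os.path.join(tmp_dir, "missing.txt")  # never created

    with open(good_path, "w") as f:
        f.write("42\n")
    with open(bad_path, "w") as f:
        f.write("forty-two\n")

    good_result = read_number(good_path)
    bad_result = read_number(bad_path)
    missing_result = read_number(missing_path)

print(good_result, bad_result, missing_result)   # 42 None None
```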
Part 3: Regular Expressions (Regex)
1. Concept of Regular Expressions
A Regular Expression (RegEx) is a sequence of characters that forms a search pattern. It is used for string searching and manipulation (validation, finding substrings, replacing text).
- Module: re
2. Various Types of Regular Expressions
Regex relies on Metacharacters (characters with special meaning) and Special Sequences.
Common Metacharacters:
- .: Matches any character except newline.
- ^: Starts with.
- $: Ends with.
- *: Zero or more occurrences.
- +: One or more occurrences.
- ?: Zero or one occurrence.
- []: A set of characters (e.g., [a-z]).
- |: Either/Or.
- \: Escape character.
Special Sequences:
- \d: Matches any digit (0-9).
- \D: Matches any non-digit.
- \w: Matches any word character (a-z, A-Z, 0-9, _).
- \s: Matches whitespace (space, tab, newline).
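Each metacharacter and special sequence listed above can be tried on a short string with re.findall(), which returns every non-overlapping match as a list. The sample strings below are made up for illustration.

```python
import re

any_char = re.findall(r"a.c", "abc a-c ac")               # . needs exactly one character
starts = re.findall(r"^Hello", "Hello world")             # ^ anchors at the start
ends = re.findall(r"world$", "Hello world")               # $ anchors at the end
zero_or_more = re.findall(r"ab*", "a ab abbb")            # * allows zero b's
one_or_more = re.findall(r"ab+", "a ab abbb")             # + needs at least one b
optional = re.findall(r"colou?r", "color colour")         # ? makes the u optional
char_set = re.findall(r"[aeiou]", "regex")                # [] matches one character from the set
either = re.findall(r"cat|dog", "cat, dog, bird")         # | tries either alternative
escaped = re.findall(r"\.", "v1.2.3")                     # \. matches a literal dot
digits = re.findall(r"\d+", "Room 101, floor 3")          # runs of digits
non_digits = re.findall(r"\D+", "a1b22c")                 # runs of non-digits
word_chars = re.findall(r"\w+", "snake_case, CamelCase!") # letters, digits, underscore
spaces = re.findall(r"\s", "a b\tc")                      # a space and a tab

print(any_char)      # ['abc', 'a-c']
print(starts, ends)  # ['Hello'] ['world']
print(zero_or_more)  # ['a', 'ab', 'abbb']
print(one_or_more)   # ['ab', 'abbb']
print(optional)      # ['color', 'colour']
print(char_set)      # ['e', 'e']
print(either)        # ['cat', 'dog']
print(escaped)       # ['.', '.']
print(digits)        # ['101', '3']
print(non_digits)    # ['a', 'b', 'c']
print(word_chars)    # ['snake_case', 'CamelCase']
print(spaces)        # [' ', '\t']
```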
3. Using the match() Function
The re.match() function checks for a match only at the beginning of the string. If the pattern occurs only later in the string, match() returns None.
- Returns: A match object if successful, None otherwise.
import re
pattern = r"Python"
text1 = "Python is fun"
text2 = "I love Python"
# Case 1
match1 = re.match(pattern, text1)
if match1:
    print("Match found at start!") # This prints
# Case 2
match2 = re.match(pattern, text2)
if match2:
    print("Match found!")
else:
    print("No match at the start.") # This prints
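When the pattern may appear anywhere in the string, re.search() is the usual companion to re.match(): it scans the whole string and returns the first match. The sketch below contrasts the two on the second example string and reads the matched text and its position from the match object with group() and span().

```python
import re

text = "I love Python"

at_start = re.match(r"Python", text)    # only looks at position 0
anywhere = re.search(r"Python", text)   # scans the whole string

print(at_start)           # None
print(anywhere.group())   # Python
print(anywhere.span())    # (7, 13): start index and end index of the match
```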
4. Web Scraping by using Regular Expressions
Web scraping involves extracting data from HTML content. While dedicated libraries like BeautifulSoup are common, Regular Expressions are powerful tools for finding specific patterns (like email addresses or hyperlinks) within raw HTML text.
Example Scenario: Extracting all email addresses from a snippet of HTML source code.
import re
html_content = """
<html>
<head><title>Contact Us</title></head>
<body>
<p>Please support us at support@example.com.</p>
<p>For sales inquiries, contact sales-team@business.org or admin@site.net.</p>
</body>
</html>
"""
# Regex breakdown:
# [\w\.-]+ : Matches word chars, dots, or dashes (username)
# @ : Matches the @ symbol
# [\w\.-]+ : Matches the domain name
# \. : Matches the dot before the extension
# [a-zA-Z]+ : Matches the domain extension (com, org, etc.)
email_pattern = r"[\w\.-]+@[\w\.-]+\.[a-zA-Z]+"
# re.findall() returns a list of all non-overlapping matches
emails = re.findall(email_pattern, html_content)
print("Emails found:")
for email in emails:
    print(email)
Output:
Emails found:
support@example.com
sales-team@business.org
admin@site.net
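The same approach can be sketched for hyperlinks, the other pattern mentioned above. This example assumes very regular markup in which every link is written exactly as <a href="...">text</a>; real pages vary far more, which is one reason dedicated parsers such as BeautifulSoup are common. The HTML snippet and URLs are made up, and the pattern uses parentheses (capturing groups), which are not covered in the metacharacter list: with two groups, re.findall() returns a list of tuples.

```python
import re

# Made-up HTML snippet for illustration.
html_content = """
<p>Read <a href="https://example.com/docs">the docs</a> or
<a href="https://example.org/contact">contact us</a>.</p>
"""

# Regex breakdown:
# <a\s+href="  : Matches the start of the tag up to the opening quote
# ([^"]+)      : Group 1, the URL: one or more characters that are not a quote
# ">           : Matches the closing quote and the end of the opening tag
# ([^<]*)      : Group 2, the link text: any characters up to the next <
# </a>         : Matches the closing tag
link_pattern = r'<a\s+href="([^"]+)">([^<]*)</a>'

links = re.findall(link_pattern, html_content)

for url, label in links:
    print(f"{label}: {url}")
# the docs: https://example.com/docs
# contact us: https://example.org/contact
```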