Unit 3 - Notes

INT331

Unit 3: Basic Git

1. Introduction to Git

What is Git?

Git is a free, open-source Distributed Version Control System (DVCS) designed to handle everything from small to very large projects with speed and efficiency. It was created by Linus Torvalds in 2005 to develop the Linux kernel.

Key Characteristics

  • Distributed: Unlike Centralized Version Control Systems (CVCS) such as SVN or CVS, every developer has a full copy of the project history on their local machine. If the central server goes down, any client repository can be used to restore it.
  • Snapshot-based: Git thinks of its data like a set of snapshots of a miniature filesystem. Every time you commit, Git takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
  • Data Integrity: Everything in Git is checksummed before it is stored and is then referred to by that checksum (a SHA-1 hash by default). Any change to the contents of a file or directory changes its checksum, so Git detects it.
  • Performance: Most operations in Git need only local files and resources, making it much faster than systems that need a network round trip for most operations.
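
The data-integrity point can be checked from the command line. The following is a minimal sketch, assuming Git is installed; git hash-object --stdin computes the ID Git would assign to a piece of content without storing anything. The sample strings are arbitrary.

```shell
# Identical content always produces the identical object ID.
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)

# Changing a single character produces a different ID.
id_c=$(printf 'hellO\n' | git hash-object --stdin)

echo "first:   $id_a"
echo "same:    $id_b"
echo "changed: $id_c"
```

Because the ID is derived from the content itself, a file that was altered or corrupted can no longer match the ID recorded in the history.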

Git vs. Centralized VCS (SVN)

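Feature      | Git (Distributed)                          | SVN (Centralized)
-------------|--------------------------------------------|------------------------------------------------------------
Repository   | Every user has a full copy of the history. | One central server has the history; users check out current files.
Offline Work | Full capability (commit, branch, log).     | Limited (cannot view history or commit).
Speed        | Fast (local operations).                   | Slower (network dependency).
Branching    | Cheap, fast, and local.                    | Heavier: branches live on the server, so creating, switching, and merging need network access.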

2. Version Controlling using Git

Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.

Why use Version Control?

  1. Collaboration: Allows multiple developers to work on the same codebase simultaneously without overwriting each other's work.
  2. History & Audit: You can see who made changes, what changes were made, and when.
  3. Backup: Every clone is a full backup of the repository.
  4. Branching & Experimentation: Developers can work on new features safely in isolation without breaking the production code.

The Git Data Model

Git tracks content rather than files.

  • Blob: Represents the content of a file (binary large object).
  • Tree: Represents a directory (maps names to blobs or other trees).
  • Commit: A snapshot of the project at a specific time. It points to the root tree, records metadata (author, committer, message), and points to its parent commit or commits (none for the first commit, two or more for a merge).
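
These object types can be inspected with git cat-file. The sketch below is illustrative only: it works in a throwaway directory, and the identity, directory name, and file name are placeholders.

```shell
# Work in a throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir docs
echo "notes" > docs/readme.txt
git add docs/readme.txt
git commit -q -m "Add readme"

# Ask Git for the type of each object reachable from the latest commit.
commit_type=$(git cat-file -t HEAD)                  # the commit itself
tree_type=$(git cat-file -t 'HEAD^{tree}')           # the project's root directory
subtree_type=$(git cat-file -t HEAD:docs)            # a subdirectory
blob_type=$(git cat-file -t HEAD:docs/readme.txt)    # a file's content
echo "$commit_type $tree_type $subtree_type $blob_type"

# Pretty-print the commit: it names a tree, an author, a committer, and a message.
git cat-file -p HEAD
```

Running git cat-file -p on the tree ID printed by the last command lists the names it maps to blobs and other trees.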

3. The Git Lifecycle

The Git lifecycle defines the states a file resides in as it moves from the working directory to the permanent repository history.

The Three Sections

  1. Working Directory: The actual files you see and edit on your computer's file system. This is your "sandbox."
  2. Staging Area (Index): A file (specifically the .git/index file) that stores information about what will go into your next commit. It acts as a loading dock.
  3. Local Repository (.git directory): Where Git stores the metadata and object database for your project. This is the "permanent" history.

The Three States

  1. Modified: You have changed the file in the Working Directory but have not committed it to your database yet.
  2. Staged: You have marked a modified file in its current version to go into your next commit snapshot.
  3. Committed: The data is safely stored in your local database.
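
One way to observe the three states is to compare the working directory, the index, and the last commit with git diff. This is a sketch under assumptions: a throwaway repository, a placeholder identity, and a hypothetical file app.txt.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app.txt"

# Modified: the change exists only in the working directory.
echo "v2" > app.txt
modified=$(git diff --name-only)                 # working directory vs. index
staged_before=$(git diff --cached --name-only)   # index vs. last commit

# Staged: the change is recorded in the index for the next commit.
git add app.txt
staged=$(git diff --cached --name-only)

# Committed: the change is stored in the repository and nothing is pending.
git commit -q -m "Update app.txt"
pending=$(git status --porcelain)

echo "modified: $modified | staged: $staged | pending: ${pending:-none}"
```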

Lifecycle Flow

  1. Untracked: A new file is created. Git sees it but does not track it.
  2. Add (git add): A snapshot of the file's current content is recorded in the Staging Area.
  3. Commit (git commit): The staged snapshot is saved to the Local Repository as a new commit.
  4. Push (git push): The commit is sent to a Remote Repository.
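
The local part of this flow can be followed with git status. The sketch below is illustrative: the repository is a throwaway, the identity is a placeholder, and report.txt is a hypothetical file. It uses --porcelain because that output is stable for scripts; git status --short prints the same two-letter codes for interactive use.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# 1. Untracked: Git sees the new file but does not track it.
echo "draft" > report.txt
s_untracked=$(git status --porcelain)

# 2. Add: the file's current content is staged.
git add report.txt
s_staged=$(git status --porcelain)

# 3. Commit: the staged snapshot enters the local repository; nothing is pending.
git commit -q -m "Add report"
s_clean=$(git status --porcelain)

# A later edit puts the tracked file into the modified state.
echo "final" >> report.txt
s_modified=$(git status --porcelain)

printf '[%s]\n' "$s_untracked" "$s_staged" "$s_clean" "$s_modified"
```

In each code, the first column describes the staging area and the second column describes the working directory; ?? marks an untracked file.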

4. Common Git Commands

Configuration

Before making your first commit, configure your identity. Git records this name and email in every commit you create.

BASH
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
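
To confirm what Git will record, the values can be read back. The sketch below uses --local inside a throwaway repository so that real global settings are left alone; the name and email are placeholders.

```shell
cd "$(mktemp -d)"
git init -q

# Repository-level settings live in .git/config and override --global values.
git config --local user.name "Project Persona"
git config --local user.email "persona@example.com"

# Read back the values Git would use for commits in this repository.
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"

# List every setting together with the file it comes from.
git config --list --show-origin
```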

Initialization

Start a new repository.

BASH
# Initialize a new Git repo in the current directory
git init

Staging Changes

Preparing files for a commit.

BASH
# Stage a specific file
git add filename.txt

# Stage all changes in the current directory and its subdirectories
git add .

Committing

Saving the snapshot.

BASH
# Commit staged changes with a message
git commit -m "Initial commit"

# Stage and commit all modified or deleted tracked files in one step (new untracked files are not included)
git commit -a -m "Fix bug in login"

Inspection

Checking the state and history.

BASH
# Check the status of files (staged, modified, untracked)
git status

# View commit history
git log

# View a compact history graph
git log --oneline --graph --all

Branching

Managing parallel lines of development.

BASH
# List branches
git branch

# Create a new branch
git branch feature-login

# Switch to a branch
git checkout feature-login
# OR (newer command)
git switch feature-login

# Create and switch in one command (instead of the two steps above)
git checkout -b feature-login
# OR (newer command)
git switch -c feature-login

Merging

Combining history.

BASH
# Merge 'feature-login' into the current branch (usually main)
git merge feature-login
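
The branching and merging commands fit together as in the following sketch. It runs in a throwaway repository with a placeholder identity and hypothetical file names, and it assumes a Git version that provides git switch.

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch 'main' on any Git version
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "home page" > index.txt
git add index.txt
git commit -q -m "Add home page"

# Develop the feature on its own branch.
git switch -q -c feature-login
echo "login form" > login.txt
git add login.txt
git commit -q -m "Add login form"

# Back on main, the feature's file is not present yet.
git switch -q main
[ -e login.txt ] && before=present || before=absent

# main has no new commits of its own, so this merge is a fast-forward.
git merge -q feature-login
[ -e login.txt ] && after=present || after=absent

echo "before merge: $before, after merge: $after"
git log --oneline --graph --all
```

If main had gained commits of its own in the meantime, Git would create a merge commit with two parents instead of fast-forwarding.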


5. Working with Remote Repository

A remote repository is a version of your project that is hosted on the internet or network (e.g., GitHub, GitLab, Bitbucket).

Connecting to a Remote

To collaborate, you must link your local repo to a remote server.

BASH
# General syntax
git remote add <shortname> <url>

# Example
git remote add origin https://github.com/user/project.git

  • origin: The conventional short name for the primary remote. git clone assigns it automatically to the server you cloned from.

Cloning

Copying an existing remote repository to your local machine.

BASH
git clone https://github.com/user/project.git

This downloads the entire history and creates a working directory.

Pushing Changes

Sending your committed changes to the remote server.

BASH
# Push the 'main' branch to 'origin'
git push origin main
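
Connecting and pushing can be tried without a hosting account. In the sketch below, a local bare repository stands in for the hosted server; the paths, identity, and file name are assumptions made for illustration.

```shell
root=$(mktemp -d)

# A bare repository (history only, no working directory) stands in for the hosted server.
git init -q --bare "$root/server.git"

git init -q "$root/project"
cd "$root/project"
git symbolic-ref HEAD refs/heads/main      # name the first branch 'main' on any Git version
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# Link the local repository to the stand-in server and publish main.
git remote add origin "$root/server.git"
git remote -v
git push -q -u origin main                 # -u remembers origin/main as the upstream of main

local_id=$(git rev-parse main)
server_id=$(git --git-dir="$root/server.git" rev-parse refs/heads/main)
upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
echo "local: $local_id  server: $server_id  upstream: $upstream"
```

Once the upstream is set with -u, a plain git push on that branch knows where to send commits.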

Fetching vs. Pulling

  • Fetch: Downloads new data from the remote repository but does not integrate it into your working files. It updates your remote-tracking branches (e.g., origin/main).
    BASH
        git fetch origin
        
  • Pull: Fetches the data and immediately merges it into your current local branch.
    BASH
        git pull origin main
        

    Equation: git pull = git fetch + git merge (by default; git pull --rebase replaces the merge with a rebase)
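
The difference between fetching and merging can be observed with two local clones. This is a sketch under assumptions: a bare clone stands in for the server, and the directory names alice and bob, the identity, and notes.txt are placeholders.

```shell
# Placeholder identity for every repository created below.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

root=$(mktemp -d)

# First developer creates the project; a bare clone stands in for the server.
git init -q "$root/alice"
cd "$root/alice"
git symbolic-ref HEAD refs/heads/main
echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git clone -q --bare "$root/alice" "$root/server.git"
git remote add origin "$root/server.git"

# Second developer clones; afterwards the first developer publishes a new commit.
git clone -q "$root/server.git" "$root/bob"
echo "line 2" >> notes.txt
git commit -q -a -m "Extend notes"
git push -q origin main

cd "$root/bob"
before=$(git rev-parse main)

# Fetch: origin/main advances, but the local branch and working files do not.
git fetch -q origin
after_fetch=$(git rev-parse main)
tracking=$(git rev-parse origin/main)
lines_after_fetch=$(wc -l < notes.txt)

# Merge: integrate the fetched commit (fetch, then merge, is what a default pull does).
git merge -q --ff-only origin/main
after_merge=$(git rev-parse main)
lines_after_merge=$(wc -l < notes.txt)

echo "lines after fetch: $lines_after_fetch, after merge: $lines_after_merge"
```

Fetching first makes it possible to review incoming work, for example with git log main..origin/main, before it touches the local branch.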

6. Git Workflow

A Git workflow is a recipe or recommendation for how to use Git to accomplish work in a consistent and productive manner.

1. Centralized Workflow

  • Mimics SVN.
  • Everyone pushes to the main (or master) branch.
  • Pros: Simple for small teams.
  • Cons: High risk of conflicts; no code review process.

2. Feature Branch Workflow (Widely Used)

  • The main branch represents the official project history.
  • Developers create a new branch for every new feature or bug fix.
  • Process:
    1. Create branch feature-x from main.
    2. Work, stage, and commit on feature-x.
    3. Push feature-x to remote.
    4. Open a Pull Request (PR) to merge feature-x into main.
    5. Review code, resolve conflicts, and merge.
  • Pros: Keeps main clean; enables code review; isolates features.
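
The process steps map onto commands roughly as follows. This is a sketch, not a prescribed procedure: a local bare clone stands in for the remote, the identity and file names are placeholders, and the Pull Request itself is a hosting-platform feature, so the final merge is performed locally here.

```shell
# Placeholder identity for every repository created below.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

root=$(mktemp -d)
git init -q "$root/work"
cd "$root/work"
git symbolic-ref HEAD refs/heads/main
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git clone -q --bare "$root/work" "$root/server.git"   # stand-in for the hosted remote
git remote add origin "$root/server.git"

# Steps 1-2: branch from main, then work and commit on the branch.
git switch -q -c feature-x
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature x"

# Step 3: publish the branch so that it can be reviewed.
git push -q -u origin feature-x

# Steps 4-5: after review, the platform performs a merge much like this one.
# --no-ff keeps an explicit merge commit even when a fast-forward is possible.
git switch -q main
git merge -q --no-ff -m "Merge feature-x" feature-x
git push -q origin main

# A merge commit lists its own ID followed by two parent IDs.
id_count=$(git rev-list --parents -n 1 main | wc -w)
echo "IDs on the merge commit line: $id_count"
```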

3. Gitflow Workflow

A strict branching model designed for project releases. It uses specific branch roles:

  • Main: Production-ready code.
  • Develop: Integration branch for features.
  • Feature: Created from Develop, merged back to Develop.
  • Release: Created from Develop to prepare for production.
  • Hotfix: Created from Main to fix urgent production bugs.
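
The branch roles can be traced in a throwaway repository. The sketch below uses plain Git commands rather than any Gitflow helper tool; branch names such as feature/login, release/1.0, and hotfix/1.0.1, the tags, and the file names are illustrative assumptions.

```shell
# Placeholder identity for the commits and merges below.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "0.9" > version.txt
git add version.txt
git commit -q -m "Initial commit"

# Develop: long-lived integration branch created from main.
git switch -q -c develop

# Feature: branches from develop and merges back into develop.
git switch -q -c feature/login develop
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login"
git switch -q develop
git merge -q --no-ff -m "Merge feature/login" feature/login

# Release: branches from develop, is stabilized, then merges into main and develop.
git switch -q -c release/1.0 develop
echo "1.0" > version.txt
git commit -q -a -m "Bump version to 1.0"
git switch -q main
git merge -q --no-ff -m "Release 1.0" release/1.0
git tag v1.0
git switch -q develop
git merge -q --no-ff -m "Merge release/1.0 into develop" release/1.0

# Hotfix: branches from main and merges into both main and develop.
git switch -q -c hotfix/1.0.1 main
echo "patched" > fix.txt
git add fix.txt
git commit -q -m "Fix production bug"
git switch -q main
git merge -q --no-ff -m "Hotfix 1.0.1" hotfix/1.0.1
git tag v1.0.1
git switch -q develop
git merge -q --no-ff -m "Merge hotfix/1.0.1 into develop" hotfix/1.0.1

git log --oneline --graph --all
```

Merging the hotfix into both long-lived branches keeps the fix from being lost when develop is released next.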

4. Forking Workflow

  • Common in open-source projects.
  • Contributors do not need write access to the main repository.
  • They fork (copy) the repo to their own account on the hosting platform (e.g., GitHub).
  • They clone their fork, make changes, and push to their fork.
  • They create a Pull Request from their fork to the original repository.
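
The two-remote setup behind this workflow can be sketched locally. Forking and Pull Requests are hosting-platform features, so bare clones approximate them here; the remote name upstream is a common convention rather than a requirement, and the paths, identity, and branch name are placeholders.

```shell
# Placeholder identity for every repository created below.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

root=$(mktemp -d)

# The original project, published as a bare repository (stand-in for the upstream host).
git init -q "$root/seed"
cd "$root/seed"
git symbolic-ref HEAD refs/heads/main
echo "Project" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git clone -q --bare "$root/seed" "$root/upstream.git"

# Forking happens on the hosting platform; a bare clone approximates it here.
git clone -q --bare "$root/upstream.git" "$root/fork.git"

# The contributor clones the fork (origin) and also registers the original (upstream).
git clone -q "$root/fork.git" "$root/contributor"
cd "$root/contributor"
git remote add upstream "$root/upstream.git"

# Work on a branch and push it to the fork, where the contributor has write access.
git switch -q -c fix-typo
echo "Project notes" > README.txt
git commit -q -a -m "Fix README wording"
git push -q origin fix-typo

# Fetching upstream keeps the contributor's view of the original project current.
git fetch -q upstream
git remote -v
```

The original repository is untouched until its maintainers accept the Pull Request; only the fork contains the fix-typo branch.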