Unit 4 - Notes

INT331

Unit 4: Advanced Git

1. Source Code Management (SCM) with Git

Git is a Distributed Version Control System (DVCS) that allows multiple developers to work on a project simultaneously without overwriting each other's changes. Unlike Centralized VCS (like SVN), every developer has a full copy of the repository history on their local machine.

Core Data Structure: Snapshots, Not Differences

While traditional SCM systems store data as a list of file-based changes (deltas), Git thinks of its data as a series of snapshots of a miniature filesystem.

  • Every time you commit, Git takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
  • To be efficient, if a file has not changed, Git does not store it again; the new snapshot just links to the identical file it has already stored.
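
The snapshot model can be checked directly. The sketch below is a minimal illustration, assuming a throwaway repository in a temporary directory; the file names and the demo identity are placeholders, not part of the course material. It commits twice and compares the object IDs Git recorded for each file.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity for the demo commits
git config user.email "demo@example.com"

echo "stable content" > unchanged.txt
echo "version 1" > changing.txt
git add . && git commit -q -m "first snapshot"

echo "version 2" > changing.txt
git commit -q -am "second snapshot"

# Object IDs of each file as recorded in the two snapshots.
unchanged_before=$(git rev-parse HEAD~1:unchanged.txt)
unchanged_after=$(git rev-parse HEAD:unchanged.txt)
changing_before=$(git rev-parse HEAD~1:changing.txt)
changing_after=$(git rev-parse HEAD:changing.txt)

echo "unchanged.txt: $unchanged_before -> $unchanged_after"
echo "changing.txt:  $changing_before -> $changing_after"
```

The unchanged file keeps the same object ID in both snapshots (it is stored once and linked twice), while the edited file gets a new one.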

The Three States

Git manages files in three distinct states:

  1. Modified: You have changed the file but have not committed it to your database yet.
  2. Staged: You have marked a modified file in its current version to go into your next commit snapshot.
  3. Committed: The data is safely stored in your local database.

The Git Lifecycle

  1. Working Directory: The actual files you are currently editing.
  2. Staging Area (Index): A file that stores information about what will go into your next commit.
  3. Git Directory (.git): Where Git stores the metadata and object database.
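
A file's path through the three states can be watched with git status. This is a small sketch under assumed names (notes.txt, a temporary repository, a placeholder identity); --porcelain prints a two-column code where the first column describes the staging area and the second the working directory.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "hello" > notes.txt
git add notes.txt && git commit -q -m "add notes"

echo "more" >> notes.txt
modified=$(git status --porcelain)     # " M notes.txt": modified, not staged

git add notes.txt
staged=$(git status --porcelain)       # "M  notes.txt": staged for the next commit

git commit -q -m "update notes"
committed=$(git status --porcelain)    # empty: everything is committed

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```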

2. Comparison Commands

Effective SCM requires the ability to audit changes at a granular level before they are permanently recorded in history.

git diff

The git diff command is used to calculate the difference between data sources (commits, branches, files, working directory, etc.).

  • Working Directory vs. Staging Area:
    BASH
        git diff
        # Shows changes made that are NOT yet staged.
        
  • Staging Area vs. Last Commit (HEAD):
    BASH
        git diff --staged
        # (Or git diff --cached). Shows what you are about to commit.
        
  • Comparison between two commits:
    BASH
        git diff <commit-hash-1> <commit-hash-2>
        
  • Comparison between two branches:
    BASH
        git diff branch_A..branch_B
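
To see how the first two forms differ, the sketch below stages one change and leaves another unstaged in the same file. The repository, file name, and identity are assumptions made for the demo.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

printf 'line 1\n' > app.txt
git add app.txt && git commit -q -m "initial"

printf 'line 1\nstaged line\n' > app.txt
git add app.txt                                          # this change is staged
printf 'line 1\nstaged line\nunstaged line\n' > app.txt  # this one is not

# Keep only the added lines ("+..."), skipping the "+++" file header.
unstaged_added=$(git diff | grep '^+[^+]')           # working directory vs. staging area
staged_added=$(git diff --staged | grep '^+[^+]')    # staging area vs. HEAD

echo "git diff:          $unstaged_added"
echo "git diff --staged: $staged_added"
```

Each command reports only its own layer: the staged line never shows up in plain git diff, and the unstaged line never shows up in git diff --staged.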
        

git show

Used to view the expanded details of a specific Git object (blob, tree, tag, or commit).

  • Usage:
    BASH
        git show <commit-hash>
        
    • Displays the log message, author, date, and the diff (patch) of what changed in that specific commit.
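
A short sketch of git show on two object types, a commit and a blob. The file name and commit messages are made up for the example; -s and --format are standard options that suppress the patch and choose what to print.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "v1" > config.txt
git add config.txt && git commit -q -m "add config"
echo "v2" > config.txt
git commit -q -am "bump config to v2"

git show HEAD                                  # commit: message, author, date, and patch
subject=$(git show -s --format=%s HEAD)        # commit: subject line only, no patch
old_content=$(git show HEAD~1:config.txt)      # blob: the file as of the previous commit

echo "subject: $subject"
echo "config.txt one commit ago: $old_content"
```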

git log

While primarily a history viewer, git log is also useful for comparison when combined with options that show what each commit changed.

  • Show patch changes with history:
    BASH
        git log -p
        
  • Show stats (lines added/deleted):
    BASH
        git log --stat
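
The sketch below combines --stat with a path filter, one common way to narrow history to the commits that touched a given file. The file names and messages are placeholders chosen for the demo.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "a" > a.txt && git add a.txt && git commit -q -m "add a"
echo "b" > b.txt && git add b.txt && git commit -q -m "add b"
echo "a2" >> a.txt && git commit -q -am "extend a"

git log --stat --oneline                       # one line per commit plus per-file change counts
touching_a=$(git log --format=%s -- a.txt)     # subjects of commits that changed a.txt only

echo "$touching_a"
```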
        

3. Branching and Merging

Branching is Git’s "killer feature," allowing divergence from the main line of development to work on unrelated tasks without messing up the existing code.

Branching Mechanics

A branch in Git is simply a lightweight, movable pointer to a commit. Each new commit on a branch moves that pointer forward automatically. The default branch name is usually master or main.

  • HEAD: A special pointer that indicates the local branch you are currently working on.

Commands:

  • Create a branch: git branch <name>
  • Switch to a branch: git checkout <name> or git switch <name>
  • Create and switch: git checkout -b <name>
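
That a branch is only a pointer can be observed with git rev-parse. In this sketch (temporary repository, assumed branch name feature, placeholder identity), the new branch starts at the same commit as main and moves independently once work is committed on it.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "base" > file.txt && git add file.txt && git commit -q -m "base"

git branch feature                         # create: a new pointer at the current commit
main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature)       # same commit as main at this point

git checkout -q feature                    # switch: HEAD now refers to feature
current=$(git symbolic-ref --short HEAD)

echo "work" >> file.txt && git commit -q -am "feature work"
feature_moved=$(git rev-parse feature)     # feature advanced by one commit
main_after=$(git rev-parse main)           # main stayed where it was

echo "on branch: $current"
echo "main:    $main_after"
echo "feature: $feature_moved"
```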

Merging

Merging brings the history of forked branches back together.

1. Fast-Forward Merge

Occurs when the tip of the current branch is a direct ancestor of the branch being merged in, so there is no divergent work to combine.

  • Scenario: You branch off main, make commits, and main has not changed since.
  • Action: Git simply moves the pointer forward. No new commit is created.
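
The scenario above, sketched in a temporary repository with assumed names. --ff-only is used here to state the intent explicitly: the merge succeeds only if a fast-forward is possible.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "base" > file.txt && git add file.txt && git commit -q -m "base"
git checkout -q -b feature
echo "feature" >> file.txt && git commit -q -am "feature work"

git checkout -q main
git merge -q --ff-only feature             # main has not moved, so the pointer just advances

main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature)
commit_count=$(git rev-list --count main)  # base + feature work, no merge commit

echo "main and feature both point at $main_tip ($commit_count commits)"
```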

2. Three-Way Merge (Recursive)

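Occurs when the branches have diverged (both main and the feature branch have new, different commits). The default strategy for this case was long named recursive; since Git 2.34 the default is ort, which covers the same case.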

  • Action: Git finds the "common ancestor" of the two branch tips and combines the changes each side made since that point. It records the result as a new Merge Commit that has two parents (the tips of the two branches).
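
A sketch of a diverged history being merged, with the common ancestor and the two parents inspected afterwards. Branch and file names are assumptions; the two sides touch different files so the merge completes without conflicts.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"
git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "main work"

base=$(git merge-base main feature)            # the common ancestor of the two tips
git merge -q --no-edit feature                 # histories diverged: a merge commit is created

parents=$(git rev-list --parents -n 1 HEAD)    # "<merge> <parent 1> <parent 2>"
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))
second_parent=$(git rev-parse 'HEAD^2')        # the tip of the merged-in branch

echo "common ancestor: $base"
echo "merge commit has $parent_count parents"
```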

Conflict Resolution

If the same part of the same file is modified differently in the two branches being merged, Git cannot automatically merge them. This results in a Merge Conflict.

  1. Git pauses the merge.
  2. Files with conflicts are marked with conflict markers:
    TEXT
        <<<<<<< HEAD
        var x = 10;
        =======
        var x = 20;
        >>>>>>> feature-branch
        
  3. Resolution: The developer manually edits the file to keep the correct code, deletes the conflict markers, stages the file (git add), and completes the merge with git commit.
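
The whole cycle can be reproduced in a scratch repository. This sketch assumes a file named app.js and resolves the conflict by overwriting the file with the feature branch's value; in practice the edit is done by hand in an editor.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "var x = 0;" > app.js && git add app.js && git commit -q -m "base"
git checkout -q -b feature-branch
echo "var x = 20;" > app.js && git commit -q -am "feature sets x to 20"
git checkout -q main
echo "var x = 10;" > app.js && git commit -q -am "main sets x to 10"

git merge feature-branch || true                     # Git pauses: same line changed on both sides
conflicted=$(git diff --name-only --diff-filter=U)   # files that are still unmerged
cat app.js                                           # shows the conflict markers

echo "var x = 20;" > app.js                # resolve: keep one version, markers removed
git add app.js                             # mark the conflict as resolved
git commit -q --no-edit                    # conclude the merge with the default message

resolved=$(cat app.js)
parent_fields=$(git rev-list --parents -n 1 HEAD | wc -w)   # merge hash + two parents
```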

4. Rebasing

Rebasing is an alternative to merging for integrating changes. It rewrites the commit history to produce a straight, linear succession of commits.

How Rebasing Works

It takes all the changes that were committed on one branch and replays them on top of another branch.

Command:

BASH
git checkout feature
git rebase main

  1. Git finds the common ancestor of feature and main.
  2. It saves the diffs (changes) introduced in each commit of the feature branch to temporary files.
  3. It resets the current branch to the same commit as main.
  4. It applies each change one by one on top of main.
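
The effect of these steps can be verified after the fact. The sketch below builds a small diverged history (names are placeholders), rebases feature onto main, and checks that the feature commit was recreated on top of main with a new hash.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "base" > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"
git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "main work"

git checkout -q feature
old_tip=$(git rev-parse feature)
git rebase -q main                               # replay feature's commits on top of main
new_tip=$(git rev-parse feature)                 # same change, new commit hash
new_parent=$(git rev-parse feature~1)            # the replayed commit now sits on main's tip
merges=$(git rev-list --merges --count feature)  # no merge commits: history is linear

echo "feature tip before: $old_tip"
echo "feature tip after:  $new_tip"
```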

Interactive Rebasing (-i)

A powerful tool for cleaning up history before sharing it. It allows you to alter commits as they are moved.

BASH
git rebase -i HEAD~3

This opens an editor permitting you to:

  • Pick: Keep the commit.
  • Reword: Change the commit message.
  • Edit: Pause the rebase to modify the content of the commit.
  • Squash: Combine this commit into the previous one (useful for hiding "wip" or typo fix commits).
  • Drop: Delete the commit entirely.
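
Interactive rebase normally opens an editor, but the same todo list can be edited by a script, which makes the squash action easy to demonstrate. In this sketch the sed command is a hypothetical stand-in for the manual edit: it changes every "pick" after the first line to "squash". GIT_SEQUENCE_EDITOR and GIT_EDITOR are the environment variables Git consults for the todo list and the commit message; the file name and messages are made up.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "base" > file.txt && git add file.txt && git commit -q -m "base"
for n in 1 2 3; do
  echo "step $n" >> file.txt
  git commit -q -am "feature: step $n"
done

# Stand-in for the interactive edit: keep the first "pick", squash the rest.
# GIT_EDITOR=true accepts the combined commit message without changes.
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -i HEAD~3

commit_count=$(git rev-list --count HEAD)   # base + one squashed commit
subject=$(git log -1 --format=%s)           # subject comes from the first squashed commit
line_count=$(wc -l < file.txt)              # content of all three steps is still there

git log --oneline
```

The three step commits collapse into one, while the file content they produced is unchanged.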

Rebase vs. Merge

  • Merge: Preserves history exactly as it happened. Good for traceability. Can result in a messy history graph ("railroad tracks").
  • Rebase: Creates a clean, linear history. "Rewrites" history.
  • The Golden Rule of Rebasing: Never rebase commits that exist outside your repository (public/shared commits). Rebasing replaces those commits with new ones that have different hash IDs, so anyone who has based work on the originals ends up with a history that no longer matches yours.

5. Stashing

Stashing allows you to temporarily shelve (store) changes you've made to your working copy so you can work on something else, and then come back and re-apply them later.

Use Case: You are working on feature-A, but a critical bug report comes in. You aren't ready to commit your half-done work on feature-A, but you need a clean working directory to fix the bug.

Stashing Commands

  1. Save changes:

    BASH
        git stash
        # OR with a message (git stash save is deprecated in favor of push)
        git stash push -m "work on login logic"
        

    This reverts the working directory to the last commit (HEAD), but saves the modifications in a stack.

  2. View Stash List:

    BASH
        git stash list
        # Output: stash@{0}: On master: work on login logic
        

  3. Apply Stash:

    • git stash apply: Re-applies changes but keeps the stash in the stack (useful if applying to multiple branches).
    • git stash pop: Re-applies changes and removes them from the stack.
  4. Drop Stash:

    BASH
        git stash drop stash@{0}
        

  5. Stashing Untracked Files:
    By default, git stash stores only changes to tracked files. To include new (untracked) files as well:

    BASH
        git stash -u
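
A complete stash round trip, sketched in a temporary repository with an assumed file name. It uses git stash push -m, the current form for stashing with a message, and checks the working directory before and after.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "stable" > app.txt && git add app.txt && git commit -q -m "base"

echo "half-done work" >> app.txt
git stash push -q -m "work on login logic"   # shelve the change; working directory matches HEAD
clean=$(git status --porcelain)              # empty while the change is stashed
listing=$(git stash list)

git stash pop -q                             # re-apply the change and drop it from the stack
restored=$(git status --porcelain)           # the modification is back
remaining=$(git stash list)                  # the stack is empty again

echo "stash list while shelved: $listing"
echo "status after pop:        [$restored]"
```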
        


6. Tagging

Tagging is used to mark specific points in a repository's history as being important. Typically, this is used to mark release points (e.g., v1.0, v2.0).

Types of Tags

1. Lightweight Tags

A lightweight tag is very much like a branch that doesn’t change—it’s just a pointer to a specific commit.

  • Creation:
    BASH
        git tag v1.0-lw
        

2. Annotated Tags

Annotated tags are stored as full objects in the Git database. They are checksummed; contain the tagger name, email, and date; carry a tagging message; and can be signed and verified with GNU Privacy Guard (GPG).

  • Creation:
    BASH
        git tag -a v1.0 -m "My version 1.0 release"
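
The difference between the two tag types shows up in the object each name resolves to. A minimal sketch, assuming a temporary repository and a placeholder identity; git cat-file -t prints an object's type.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "release" > app.txt && git add app.txt && git commit -q -m "release commit"

git tag v1.0-lw                                  # lightweight: a bare pointer to the commit
git tag -a v1.0 -m "My version 1.0 release"      # annotated: a separate tag object

lw_type=$(git cat-file -t v1.0-lw)               # commit
annotated_type=$(git cat-file -t v1.0)           # tag
target=$(git rev-parse 'v1.0^{commit}')          # the commit the annotated tag points to

echo "v1.0-lw is a $lw_type, v1.0 is a $annotated_type"
```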
        

Tagging Operations

  • Listing Tags:

    BASH
        git tag
        # or with wildcards
        git tag -l "v1.8*"
        

  • Tagging Later:
    You can tag a commit from the past by specifying the commit hash (checksum).

    BASH
        git tag -a v1.2 9fceb02
        

  • Sharing Tags:
    By default, git push does not transfer tags to remote servers. They must be pushed explicitly.

    BASH
        git push origin v1.5
        # Push all tags
        git push origin --tags
        

  • Checking out Tags:
    Checking out a tag directly puts you in a "detached HEAD" state: you can inspect the code and even commit, but the new commits belong to no branch and the tag itself does not move. To work from a tag, create a branch from it:

    BASH
        git checkout -b version2 v2.0.0
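
The sketch below contrasts the two ways of checking out a tag. The repository contents are invented for the demo; git symbolic-ref -q HEAD exits with a non-zero status when HEAD is detached, which is used here to detect that state.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main" on any Git version
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "v2" > app.txt && git add app.txt && git commit -q -m "release 2.0.0"
git tag -a v2.0.0 -m "Version 2.0.0"
echo "later" >> app.txt && git commit -q -am "later work on main"

git checkout -q v2.0.0                       # detached HEAD: no branch is checked out
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi

git checkout -q -b version2 v2.0.0           # a branch that starts at the tagged commit
branch=$(git symbolic-ref --short HEAD)
tip=$(git rev-parse HEAD)

echo "detached after checking out the tag: $detached"
echo "now on branch: $branch"
```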