Unit 4 - Notes
INT331
Unit 4: Advanced Git
1. Source Code Management (SCM) with Git
Git is a Distributed Version Control System (DVCS) that allows multiple developers to work on a project simultaneously without overwriting each other's changes. Unlike Centralized VCS (like SVN), every developer has a full copy of the repository history on their local machine.
Core Data Structure: Snapshots, Not Differences
While traditional SCM systems store data as a list of file-based changes (deltas), Git thinks of its data as a series of snapshots of a miniature filesystem.
- Every time you commit, Git takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
- To be efficient, if files have not changed, Git does not store the file again, just a link to the previous identical file.
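The snapshot model can be observed directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file names and the identity passed to git config are made up for illustration. It compares the blob IDs that two successive commits record for each file.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "stable content" > unchanged.txt
echo "version 1" > changing.txt
git add . && git commit -q -m "first snapshot"

echo "version 2" > changing.txt          # only this file changes
git add . && git commit -q -m "second snapshot"

# Each commit's snapshot records one blob ID per file.
unchanged_before=$(git rev-parse HEAD~1:unchanged.txt)
unchanged_after=$(git rev-parse HEAD:unchanged.txt)
changing_before=$(git rev-parse HEAD~1:changing.txt)
changing_after=$(git rev-parse HEAD:changing.txt)

echo "unchanged.txt: $unchanged_before -> $unchanged_after"
echo "changing.txt:  $changing_before -> $changing_after"
```

The untouched file keeps the same blob ID in both snapshots, so Git stores its content once; the edited file gets a new blob.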
The Three States
Git manages files in three distinct states:
- Modified: You have changed the file but have not committed it to your database yet.
- Staged: You have marked a modified file in its current version to go into your next commit snapshot.
- Committed: The data is safely stored in your local database.
The Git Lifecycle
- Working Directory: The actual files you are currently editing.
- Staging Area (Index): A file that stores information about what will go into your next commit.
- Git Directory (.git): Where Git stores the metadata and object database.
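As a rough illustration of the three states, the sketch below walks one file through modified, staged, and committed, and records what git status --short reports at each step. The repository location, file name, and identity are assumptions made for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "line 1" > notes.txt
git add notes.txt && git commit -q -m "add notes"

echo "line 2" >> notes.txt
modified=$(git status --short)           # " M notes.txt": changed in the working directory only

git add notes.txt
staged=$(git status --short)             # "M  notes.txt": recorded in the staging area

git commit -q -m "extend notes"
committed=$(git status --short)          # empty: the change is stored in the Git directory

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```

In the short format the first column describes the staging area and the second the working directory, which is why the M moves from the second column to the first after git add.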
2. Comparison Commands
Effective SCM requires the ability to audit changes at a granular level before they are permanently recorded in history.
git diff
The git diff command is used to calculate the difference between data sources (commits, branches, files, working directory, etc.).
- Working Directory vs. Staging Area:
git diff    # Shows changes that are NOT yet staged.
- Staging Area vs. Last Commit (HEAD):
git diff --staged    # Or git diff --cached. Shows what you are about to commit.
- Comparison between two commits:
git diff <commit-hash-1> <commit-hash-2>
- Comparison between two branches:
git diff branch_A..branch_B
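To see how the diff variants differ, here is a small sketch under assumed conditions: a temporary repository with two hypothetical files, one with a staged change and one with an unstaged change. The --name-only flag is used only to keep the output short.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

printf 'a\n' > staged.txt
printf 'b\n' > unstaged.txt
git add . && git commit -q -m "base"

printf 'a2\n' >> staged.txt
printf 'b2\n' >> unstaged.txt
git add staged.txt                       # only this change is staged

unstaged_files=$(git diff --name-only)            # working directory vs. staging area
staged_files=$(git diff --staged --name-only)     # staging area vs. HEAD

git commit -q -m "update staged.txt"
between_commits=$(git diff --name-only HEAD~1 HEAD)   # one commit vs. another

echo "unstaged: $unstaged_files"
echo "staged:   $staged_files"
echo "commits:  $between_commits"
```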
git show
Used to view the expanded details of a specific Git object (blob, tree, tag, or commit).
- Usage:
git show <commit-hash>
- Displays the log message, author, date, and the diff (patch) of what changed in that specific commit.
git log
While primarily a history viewer, git log is essential for comparison when used with filters.
- Show patch changes with history:
git log -p
- Show stats (lines added/deleted):
git log --stat
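The inspection commands above (git show, git log -p, git log --stat) can be tried on a two-commit history. This is only a sketch: the repository, file name, and commit messages are invented, and --no-pager is added so the commands print directly instead of opening a pager.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "hello" > app.txt
git add app.txt && git commit -q -m "create app.txt"
echo "world" >> app.txt
git commit -q -am "extend app.txt"

git --no-pager log --stat --oneline      # one line per commit plus per-file change counts
git --no-pager show HEAD                 # message, author, date, and patch of the latest commit

subject=$(git show -s --format=%s HEAD)                   # -s suppresses the patch
commit_count=$(git log --oneline | wc -l | tr -d ' ')
patch_added=$(git log -p -1 | grep -c '^+world$')         # the added line appears in the patch
```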
3. Branching and Merging
Branching is Git’s "killer feature," allowing divergence from the main line of development to work on unrelated tasks without messing up the existing code.
Branching Mechanics
A branch in Git is simply a lightweight, movable pointer to a commit. The default branch name is usually master or main.
- HEAD: A special pointer that indicates the local branch you are currently working on.
Commands:
- Create a branch:
git branch <name>
- Switch to a branch:
git checkout <name> or git switch <name>
- Create and switch:
git checkout -b <name>
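The pointer behaviour can be checked with git rev-parse. The sketch below assumes a temporary repository whose first branch is named main (set explicitly so the example does not depend on local defaults) and a hypothetical branch called feature.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt && git commit -q -m "initial commit"

git branch feature                       # new pointer to the same commit
main_before=$(git rev-parse main)
feature_at_creation=$(git rev-parse feature)

git checkout -q feature                  # HEAD now refers to feature
head_branch=$(git symbolic-ref --short HEAD)

echo "v2" >> file.txt
git commit -q -am "work on feature"      # only the feature pointer moves
main_after=$(git rev-parse main)
feature_after=$(git rev-parse feature)
```

Creating the branch copies no files; it only records a commit ID under a new name, which is why both names resolve to the same commit until a new commit is made.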
Merging
Merging brings the history of forked branches back together.
1. Fast-Forward Merge
Occurs when the tip of the current branch is a direct ancestor of the branch being merged in.
- Scenario: You branch off main, make commits, and main has not changed since.
- Action: Git simply moves the pointer forward. No new commit is created.
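A fast-forward can be reproduced in a few commands. In this sketch the repository, the branch name feature, and the file are assumptions; --ff-only is added so the merge refuses to do anything other than fast-forward.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "base" > f.txt
git add f.txt && git commit -q -m "base"

git checkout -q -b feature
echo "feature" >> f.txt
git commit -q -am "feature work"
feature_tip=$(git rev-parse HEAD)

git checkout -q main                     # main has not moved since the branch point
git merge -q --ff-only feature           # the main pointer simply advances
main_tip=$(git rev-parse main)
parent_count=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
```

After the merge, main and feature name the same commit, and that commit still has a single parent because no merge commit was created.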
2. Three-Way Merge (Recursive)
Occurs when the branches have diverged (both main and the feature branch have new, different commits).
- Action: Git finds the "common ancestor" of the two branch tips and combines the changes each side has made since that point. It creates a new Merge Commit that has two parents (the tips of both branches).
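For comparison, the following sketch lets main and a hypothetical feature branch diverge and then merges them. The repository, file names, and messages are invented; --no-edit accepts the default merge message so that no editor opens.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt && git commit -q -m "base"
base_commit=$(git rev-parse HEAD)

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "feature work"

git checkout -q main
echo "main" > main.txt
git add main.txt && git commit -q -m "main work"   # the two histories have now diverged

git merge -q --no-edit feature           # creates a merge commit
parent_count=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
ancestor=$(git merge-base HEAD^1 HEAD^2) # common ancestor of the two parents
```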
Conflict Resolution
If the same part of the same file is modified differently in the two branches being merged, Git cannot automatically merge them. This results in a Merge Conflict.
- Git pauses the merge.
- Files with conflicts are marked with conflict markers:
<<<<<<< HEAD
var x = 10;
=======
var x = 20;
>>>>>>> feature-branch
- Resolution: The developer must manually edit the file to choose the correct code, delete the markers, stage the file (git add), and complete the merge with git commit.
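The whole conflict cycle can be rehearsed safely in a scratch repository. The sketch below is one possible walk-through: the file app.js and the choice to keep x = 20 are arbitrary assumptions, and the resolution is done by overwriting the file rather than editing it by hand.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "var x = 1;" > app.js
git add app.js && git commit -q -m "base"

git checkout -q -b feature-branch
echo "var x = 20;" > app.js
git commit -q -am "feature sets x to 20"

git checkout -q main
echo "var x = 10;" > app.js
git commit -q -am "main sets x to 10"

git merge feature-branch || true         # the merge stops and reports a conflict
conflicted=$(git diff --name-only --diff-filter=U)   # paths still unmerged
markers=$(grep -c '^<<<<<<< HEAD' app.js)            # conflict markers were written into the file

echo "var x = 20;" > app.js              # resolution chosen for this sketch: keep the feature value
git add app.js                           # staging marks the conflict as resolved
git commit -q -m "Merge feature-branch, keep x = 20"
parent_count=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
```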
4. Rebasing
Rebasing is an alternative to merging for integrating changes. It rewrites the commit history to produce a straight, linear succession of commits.
How Rebasing Works
It takes all the changes that were committed on one branch and replays them on top of another branch.
Command:
git checkout feature
git rebase main
- Git finds the common ancestor of feature and main.
- It saves the diffs (changes) introduced in each commit of the feature branch to temporary files.
- It resets the current branch to the same commit as main.
- It applies each change one by one on top of main.
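The effect of these steps can be checked with a short experiment. This sketch assumes a temporary repository with one commit on feature and one on main after the branch point; all names are illustrative.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt && git commit -q -m "base"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "feature work"
old_tip=$(git rev-parse feature)

git checkout -q main
echo "main" > main.txt
git add main.txt && git commit -q -m "main work"

git checkout -q feature
git rebase -q main                       # replay the feature commit on top of main

new_tip=$(git rev-parse feature)
new_parent=$(git rev-parse feature~1)
main_tip=$(git rev-parse main)
merge_commits=$(git rev-list --merges --count HEAD)
subject=$(git show -s --format=%s feature)
```

The replayed commit keeps its message but gets a new hash, because its parent is now the tip of main; the history contains no merge commits.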
Interactive Rebasing (-i)
A powerful tool for cleaning up history before sharing it. It allows you to alter commits as they are moved.
git rebase -i HEAD~3
This opens an editor permitting you to:
- Pick: Keep the commit.
- Reword: Change the commit message.
- Edit: Pause the rebase to modify the content of the commit.
- Squash: Combine this commit into the previous one (useful for hiding "wip" or typo fix commits).
- Drop: Delete the commit entirely.
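Because git rebase -i normally opens an editor, it is awkward to demonstrate in a script. The sketch below uses a scripted stand-in that is not covered in these notes: git commit --fixup marks a commit as a correction to an earlier one, --autosquash pre-fills the todo list with a fixup action (a squash that discards the extra message), and GIT_SEQUENCE_EDITOR=true accepts that list unchanged. The repository and file names are assumptions.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "one" > a.txt
git add a.txt && git commit -q -m "add a.txt"
echo "two" > b.txt
git add b.txt && git commit -q -m "add b.txt"

echo "two, fixed" > b.txt
git commit -q -a --fixup=HEAD            # creates a commit titled "fixup! add b.txt"
before=$(git rev-list --count HEAD)

# Accept the generated todo list as-is instead of editing it by hand.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

after=$(git rev-list --count HEAD)
subject=$(git show -s --format=%s HEAD)
content=$(cat b.txt)
```

The correction is folded into the commit it fixes, so the history ends up one commit shorter while the file keeps the corrected content. In everyday use you would run git rebase -i HEAD~3 and change the action words in the editor yourself.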
Rebase vs. Merge
- Merge: Preserves history exactly as it happened. Good for traceability. Can result in a messy history graph ("railroad tracks").
- Rebase: Creates a clean, linear history. "Rewrites" history.
- The Golden Rule of Rebasing: Never rebase commits that exist outside your repository (public/shared commits). Because rebasing replaces each commit with a new one that has a different hash, collaborators who based work on the original commits are left with diverging histories.
5. Stashing
Stashing allows you to temporarily shelve (store) changes you've made to your working copy so you can work on something else, and then come back and re-apply them later.
Use Case: You are working on feature-A, but a critical bug report comes in. You aren't ready to commit your half-done work on feature-A, but you need a clean working directory to fix the bug.
Stashing Commands
- Save changes:
git stash
# OR with a message
git stash push -m "work on login logic"
This reverts the working directory to the last commit (HEAD), but saves the modifications in a stack. (The older form git stash save "message" is deprecated in favor of git stash push -m.)
- View Stash List:
git stash list
# Output: stash@{0}: On master: work on login logic
- Apply Stash:
git stash apply: Re-applies changes but keeps the stash in the stack (useful if applying to multiple branches).
git stash pop: Re-applies changes and removes them from the stack.
- Drop Stash:
git stash drop stash@{0}
- Stashing Untracked Files:
By default, stash only stores changes to tracked files. To include new (untracked) files:
git stash -u
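A stash round trip might look like the sketch below. It assumes a temporary repository on a branch named main and a hypothetical file login.txt, and it uses git stash push -m to attach the message.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "stable" > login.txt
git add login.txt && git commit -q -m "add login"

echo "half-done" >> login.txt            # uncommitted work in progress
git stash push -q -m "work on login logic"
clean=$(git status --short)              # empty: the working directory matches HEAD again
stash_entry=$(git stash list)            # stash@{0}: On main: work on login logic

git stash pop -q                         # re-apply the changes and remove the stash entry
restored=$(git status --short)           # " M login.txt"
remaining=$(git stash list | wc -l | tr -d ' ')
```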
6. Tagging
Tagging is used to mark specific points in a repository's history as being important. Typically, this is used to mark release points (e.g., v1.0, v2.0).
Types of Tags
1. Lightweight Tags
A lightweight tag is very much like a branch that doesn’t change—it’s just a pointer to a specific commit.
- Creation:
git tag v1.0-lw
2. Annotated Tags
Annotated tags are stored as full objects in the Git database. They contain the tagger's name, email, and date, include a tagging message, and can be signed and verified with GNU Privacy Guard (GPG).
- Creation:
git tag -a v1.0 -m "My version 1.0 release"
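The difference between the two tag types shows up when asking Git what kind of object each name refers to. This is a small sketch in an assumed scratch repository; the file name and identity are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository (assumption)
git init -q
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"

echo "release" > release.txt
git add release.txt && git commit -q -m "release"

git tag v1.0-lw                                   # lightweight: just a name for the commit
git tag -a v1.0 -m "My version 1.0 release"       # annotated: a separate tag object

lw_type=$(git cat-file -t v1.0-lw)                # commit
annotated_type=$(git cat-file -t v1.0)            # tag
lw_commit=$(git rev-parse v1.0-lw)
annotated_commit=$(git rev-parse 'v1.0^{commit}') # follow the tag object to its commit
```

Both tags lead to the same commit, but only the annotated one has its own object that can carry a message, tagger, date, and signature.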
Tagging Operations
- Listing Tags:
git tag
# or with wildcards
git tag -l "v1.8*"
- Tagging Later:
You can tag a commit from the past by specifying the commit hash (checksum).
git tag -a v1.2 9fceb02
- Sharing Tags:
By default, git push does not transfer tags to remote servers. They must be pushed explicitly.
git push origin v1.5
# Push all tags
git push origin --tags
- Checking out Tags:
Checking out a tag directly puts you in a "detached HEAD" state: the tag itself never moves, and any commits you make there belong to no branch. To work from a tag, create a branch from it:
git checkout -b version2 v2.0.0
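These tag operations can be tried end to end without a real server. In the sketch below a local bare repository stands in for the remote; its path, the tag name v1.2, and the branch name version2 are assumptions for illustration.

```shell
set -e
remote=$(mktemp -d)                      # local bare repository standing in for a server (assumption)
git init -q --bare "$remote"

repo=$(mktemp -d) && cd "$repo"          # throwaway working repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" regardless of local defaults
git config user.name "Demo User"         # hypothetical identity
git config user.email "demo@example.com"
git remote add origin "$remote"

echo "1" > f.txt
git add f.txt && git commit -q -m "first release candidate"
first=$(git rev-parse HEAD)
echo "2" >> f.txt
git commit -q -am "later work"

git tag -a v1.2 -m "version 1.2" "$first"        # tag an earlier commit by its hash
tagged=$(git rev-parse 'v1.2^{commit}')

git push -q origin main                          # a plain push does not send the tag
tags_before=$(git -C "$remote" tag -l | wc -l | tr -d ' ')
git push -q origin v1.2                          # push the tag explicitly
tags_after=$(git -C "$remote" tag -l | wc -l | tr -d ' ')

git checkout -q -b version2 v1.2                 # start a branch at the tagged commit
branch=$(git symbolic-ref --short HEAD)
head=$(git rev-parse HEAD)
```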