Unit 6 - Subjective Questions
CSC202 • Practice Questions with Detailed Answers
Define Infrastructure as Code (IaC) and list its primary benefits in modern system administration.
Infrastructure as Code (IaC) is the practice of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.
Primary Benefits:
- Consistency: Reduces configuration drift by ensuring the same environment definition is applied every time.
- Speed and Efficiency: Automated provisioning is significantly faster than manual processes.
- Version Control: Infrastructure definitions can be stored in version control systems (like Git), allowing for history tracking and rollback.
- Scalability: Allows for easy replication of infrastructure across multiple environments (Dev, Test, Prod).
- Documentation: The code itself serves as documentation for the state of the infrastructure.
Distinguish between Declarative and Imperative approaches to Infrastructure as Code.
The two main approaches to IaC differ in how the desired state is achieved:
1. Declarative Approach (Focus on 'What'):
- Definition: The user defines the desired state of the system (e.g., "I want 3 web servers"), and the tool figures out how to achieve it.
- Mechanism: The tool compares the current state with the desired state and applies only the necessary changes.
- Examples: Terraform, Kubernetes manifests, Ansible (individual modules are largely declarative, although playbook tasks run in the order written).
- Advantage: Easier to manage and understand the end goal; idempotent by nature.
2. Imperative Approach (Focus on 'How'):
- Definition: The user defines the specific steps or commands to execute to achieve the result (e.g., "Run script A, then install package B").
- Mechanism: The tool executes the list of instructions exactly as written.
- Examples: Bash scripts, Chef (procedural style).
- Advantage: Greater control over the exact execution flow, but requires more maintenance to handle error states.
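The difference between the two approaches shows up when the same definition is applied twice. The sketch below is a toy model under stated assumptions: servers are plain strings such as "web-1", nothing touches real infrastructure, and the function names `imperative_launch` and `reconcile` are hypothetical illustrations rather than any tool's API.

```python
"""Toy contrast: imperative steps vs. declarative reconciliation.

Servers are plain strings such as "web-1"; nothing here touches real
infrastructure. Both function names are hypothetical illustrations.
"""


def imperative_launch(servers: list[str], count: int) -> list[str]:
    """Imperative: 'launch `count` servers' performs that many launches on every call."""
    result = list(servers)
    for _ in range(count):
        result.append(f"web-{len(result) + 1}")
    return result


def reconcile(servers: list[str], desired_count: int) -> list[str]:
    """Declarative: 'I want `desired_count` servers'; only the difference is applied."""
    if desired_count < 0:
        raise ValueError("desired_count must be non-negative")
    result = list(servers)
    # Create whatever is missing...
    while len(result) < desired_count:
        result.append(f"web-{len(result) + 1}")
    # ...and remove whatever is extra. If the state is already correct, nothing changes.
    del result[desired_count:]
    return result


if __name__ == "__main__":
    state = imperative_launch([], 3)
    state = imperative_launch(state, 3)  # re-running the script
    print(len(state))                    # 6
    desired = reconcile([], 3)
    desired = reconcile(desired, 3)      # re-applying the definition
    print(desired)                       # ['web-1', 'web-2', 'web-3']
```

Re-running the imperative script doubles the fleet, while re-applying the declarative definition leaves it unchanged, which is why declarative tools are described as idempotent by nature.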
Explain the concept of Idempotency in the context of configuration management.
Idempotency is a mathematical property used in computing where an operation can be applied multiple times without changing the result beyond the initial application.
In System Administration/IaC:
- It ensures that running a configuration script multiple times produces the same result as running it once.
- Example: If an Ansible task says "Ensure Apache is installed," running it on a server where Apache is already installed will do nothing. If Apache is missing, it installs it.
- Importance: It prevents side effects (like appending the same line to a config file 10 times) and ensures stability during automated deployments.
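The "appending the same line 10 times" problem can be made concrete with a minimal sketch, assuming a plain-text config file. `append_line` and `ensure_line` are hypothetical helper names; Ansible's `lineinfile` module offers similar "ensure present" behavior, but this is not its implementation.

```python
from pathlib import Path
import tempfile


def append_line(path: Path, line: str) -> None:
    """NOT idempotent: every run appends another copy of the line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def ensure_line(path: Path, line: str) -> bool:
    """Idempotent: add `line` only if it is missing. Returns True if the file changed."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in text.splitlines():
        return False  # already in the desired state, so do nothing
    # Keep the new line separate if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as f:
        f.write(prefix + line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "app.conf"
        for _ in range(3):
            ensure_line(conf, "ServerTokens Prod")
        print(conf.read_text().count("ServerTokens Prod"))  # 1
```

Returning whether anything changed mirrors the "changed / ok" reporting that configuration management tools use to show that a second run did nothing.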
Describe the three main stages of a file in Git version control.
Git manages files across three specific areas or "trees":
- Working Directory: This is the local directory where files are created, edited, and deleted. Changes here are considered "untracked" or "modified" until staged.
- Staging Area (Index): This is an intermediate area where changes are prepared before being committed. Changes are added here using the `git add` command. It allows the user to curate exactly what goes into the next snapshot.
- Repository (HEAD): This is the `.git` directory, where Git permanently stores the project's metadata and object database. Staged changes are recorded here using the `git commit` command. This represents the saved history of the project.
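How a change moves between the three areas can be sketched with a toy model in which each area is a dictionary mapping file paths to contents. This is an illustrative assumption only: Git actually stores content-addressed objects, and the `ToyRepo` class and its methods are hypothetical names that merely mimic `git add`, `git commit`, and `git status`. File deletion is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model of Git's three areas (not Git's real storage format)."""
    working: dict[str, str] = field(default_factory=dict)  # files on disk
    index: dict[str, str] = field(default_factory=dict)    # staging area
    head: dict[str, str] = field(default_factory=dict)     # last committed snapshot
    history: list[dict[str, str]] = field(default_factory=list)

    def add(self, path: str) -> None:
        """Like `git add`: copy the working-directory version into the index."""
        self.index[path] = self.working[path]

    def commit(self) -> None:
        """Like `git commit`: record the whole index as the new HEAD snapshot."""
        self.head = dict(self.index)
        self.history.append(self.head)

    def status(self) -> dict[str, list[str]]:
        """Roughly what `git status` reports, by comparing the three areas."""
        staged = [p for p in self.index if self.head.get(p) != self.index[p]]
        modified = [p for p in self.working
                    if p in self.index and self.index[p] != self.working[p]]
        untracked = [p for p in self.working if p not in self.index]
        return {"staged": sorted(staged), "modified": sorted(modified),
                "untracked": sorted(untracked)}


if __name__ == "__main__":
    repo = ToyRepo()
    repo.working["main.tf"] = "v1"
    print(repo.status())  # untracked
    repo.add("main.tf")
    print(repo.status())  # staged
    repo.commit()
    repo.working["main.tf"] = "v2"
    print(repo.status())  # modified, while HEAD still holds v1
```

Keeping the index separate from HEAD is what lets a user stage only some edits: anything not passed to `add` stays "modified" in the working directory and is left out of the next commit.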
Explain the difference between git merge and git rebase.
Both commands are used to integrate changes from one branch into another, but they modify the commit history differently:
Git Merge:
- Function: Combines the histories of two branches.
- Result: Creates a new "merge commit" that ties the two histories together. It preserves the exact chronological history of events.
- Pros: Non-destructive; maintains the context of the branch history.
- Cons: History can become cluttered with merge commits.
Git Rebase:
- Function: Replays the commits of the current branch on top of the tip of another branch (e.g., `main`), essentially rewriting history.
- Result: Creates a linear history without extra merge commits. It looks as if the work was created sequentially.
- Pros: Clean, linear project history.
- Cons: Destructive operation (rewrites commit hashes); dangerous if used on shared public branches.
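Why rebase "rewrites commit hashes" follows from the fact that a commit's ID depends on its parent. The toy model below assumes each ID is derived only from the commit message and parent IDs (real Git also hashes the tree, author, dates, and more); `Commit`, `merge`, and `rebase` are hypothetical names, not Git internals.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple[str, ...]

    @property
    def id(self) -> str:
        # Real Git hashes the tree, author, dates, and more; this toy hashes
        # only the message and parent IDs, which is enough to show that a
        # commit's ID changes whenever its parent changes.
        data = self.message + "|" + ",".join(self.parents)
        return hashlib.sha1(data.encode()).hexdigest()[:7]


def merge(target_tip: Commit, feature_tip: Commit) -> Commit:
    """Merge: one new commit with two parents; existing commits keep their IDs."""
    return Commit("Merge feature branch", (target_tip.id, feature_tip.id))


def rebase(feature_commits: list[Commit], new_base: Commit) -> list[Commit]:
    """Rebase: replay each commit on top of new_base, producing brand-new commits."""
    replayed: list[Commit] = []
    parent_id = new_base.id
    for old in feature_commits:
        new = Commit(old.message, (parent_id,))
        replayed.append(new)
        parent_id = new.id
    return replayed


if __name__ == "__main__":
    base = Commit("init", ())
    main_tip = Commit("main: tighten firewall", (base.id,))
    f1 = Commit("feature: add load balancer", (base.id,))
    f2 = Commit("feature: add health check", (f1.id,))
    print("merge commit parents:", merge(main_tip, f2).parents)
    for old, new in zip([f1, f2], rebase([f1, f2], main_tip)):
        print(f"{old.message}: {old.id} -> {new.id}")
```

The merge leaves `f1` and `f2` untouched, while the rebase produces commits with the same messages but new IDs. Anyone who already pulled the old IDs now has diverging history, which is why rebasing shared branches is risky.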
What is a Hypervisor? Compare Type 1 and Type 2 Hypervisors.
A Hypervisor (or Virtual Machine Monitor - VMM) is software, firmware, or hardware that creates and runs virtual machines (VMs).
Type 1 (Bare Metal):
- Installation: Installed directly on the physical hardware.
- Architecture: No underlying host Operating System. The hypervisor controls the hardware directly.
- Performance: High performance and stability.
- Use Case: Enterprise data centers (e.g., VMware ESXi, Microsoft Hyper-V, Xen).
Type 2 (Hosted):
- Installation: Installed as a software application on top of an existing Host OS (Windows, Linux, macOS).
- Architecture: Relies on the Host OS for hardware management.
- Performance: Higher overhead due to the extra OS layer.
- Use Case: Desktop virtualization, testing, development (e.g., Oracle VirtualBox, VMware Workstation).
Compare and contrast Virtual Machines (VMs) and Containers.
Both are virtualization technologies but operate at different abstraction layers.
| Feature | Virtual Machines (VM) | Containers |
|---|---|---|
| Abstraction Level | Hardware Virtualization | OS-Level Virtualization |
| Architecture | Runs a full Guest OS on a Hypervisor | Shares the Host OS Kernel |
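| Size | Heavyweight (typically gigabytes) | Lightweight (typically megabytes) |
| Boot Time | Slower (seconds to minutes, since a full OS must boot) | Fast (milliseconds to seconds) |
| Isolation | Strong isolation (separate guest kernel per VM) | Process-level isolation (namespaces and cgroups on a shared kernel) |
| Portability | Less portable (VM image formats are often hypervisor-specific) | Highly portable (runs wherever a compatible container engine and host kernel are available) |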
Explain the role of Namespaces and Cgroups (Control Groups) in Linux containerization.
Containers rely on two key Linux kernel features to function:
1. Namespaces (Isolation):
- Namespaces partition kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set.
- Types:
- PID Namespace: Provides independent process IDs.
- NET Namespace: Provides independent network stack (IPs, ports).
- MNT Namespace: Provides independent file system mount points.
2. Control Groups (Cgroups) (Resource Management):
- Cgroups limit, account for, and isolate the resource usage (CPU, memory, disk I/O, network) of a collection of processes.
- They ensure that a single container cannot exhaust the resources of the host machine, starving other containers.
Describe the standard lifecycle of a Docker container.
The lifecycle of a container involves several states controlled by the container engine:
- Created: The container has been created from an image but is not yet running.
- Running: The process defined in the container is actively executing. (Command: `docker start` or `docker run`)
- Paused: The processes within the container are suspended (frozen) but remain in memory.
- Stopped/Exited: The main process inside the container has completed or has been killed. The container file system persists.
- Deleted: The container and its writable layer are removed from the system (Command: `docker rm`).
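The lifecycle above behaves like a state machine in which each `docker` subcommand is only valid from certain states. The sketch below is a simplified model under stated assumptions: real Docker also reports states such as "restarting" and "dead", "none" and "deleted" are pseudo-states meaning no container exists, and a process exiting on its own (Running to Exited without a command) is not modeled. `TRANSITIONS`, `can_apply`, and `run_commands` are hypothetical names.

```python
# Simplified model of `docker` subcommands and the states they lead to.
# Real Docker also reports "restarting", "removing", and "dead"; "none" and
# "deleted" are pseudo-states meaning "no container exists".
TRANSITIONS: dict[tuple[str, str], str] = {
    ("none", "create"): "created",
    ("none", "run"): "running",       # docker run = create + start
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("running", "stop"): "exited",    # SIGTERM, then SIGKILL after a grace period
    ("running", "kill"): "exited",
    ("exited", "start"): "running",   # the container's file system was kept
    ("created", "rm"): "deleted",
    ("exited", "rm"): "deleted",      # removing a running container needs `rm -f`
}


def can_apply(state: str, command: str) -> bool:
    """True if `command` is valid for a container currently in `state`."""
    return (state, command) in TRANSITIONS


def run_commands(commands: list[str]) -> list[str]:
    """Return the state after each command, raising on an invalid transition."""
    state, history = "none", []
    for command in commands:
        if not can_apply(state, command):
            raise ValueError(f"cannot '{command}' a container that is '{state}'")
        state = TRANSITIONS[(state, command)]
        history.append(state)
    return history


if __name__ == "__main__":
    print(run_commands(["create", "start", "pause", "unpause", "stop", "rm"]))
```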
What is a Dockerfile? Explain the purpose of FROM, RUN, COPY, and CMD instructions.
A Dockerfile is a text document containing all the commands a user could call on the command line to assemble an image.
Key Instructions:
- FROM: Initializes a new build stage and sets the Base Image (e.g., `FROM ubuntu:20.04`). It is usually the first line.
- RUN: Executes any commands in a new layer on top of the current image and commits the results (e.g., installing packages). Used during the build phase.
- COPY: Copies new files or directories from the build context (source paths on the host) into the image's filesystem.
- CMD: Provides the default command and/or parameters to be executed when a container is started from the image. Unlike RUN, this happens at runtime.
Explain the concept of Container Images and the Union File System.
Container Images:
An image is a read-only template with instructions for creating a Docker container. It contains the application code, libraries, dependencies, tools, and other files needed to run an application.
Union File System (UnionFS):
- Images are built using layers. Each Dockerfile instruction that changes the filesystem (such as RUN, COPY, and ADD) creates a new layer; other instructions only record metadata.
- UnionFS allows files from separate file systems (layers) to be overlaid transparently to form a single coherent file system.
- Efficiency: Layers are cached and shared. If multiple images use the same base layer (e.g., Ubuntu), that layer is stored only once on the disk, saving space.
- Copy-on-Write: When a container starts, a thin read/write layer is added on top. If a file from a lower read-only layer needs modification, it is first copied up to the writable layer and then modified there.
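Python's `collections.ChainMap` offers a handy analogy for layering and copy-on-write: lookups search a list of mappings in order, but writes only go to the first one. In the sketch below, each layer is assumed to be a dict from file path to contents, and `start_container` is a hypothetical helper. Real union file systems such as OverlayFS also use "whiteout" entries to hide deleted lower-layer files, which this model does not cover.

```python
from collections import ChainMap


def start_container(*image_layers: dict[str, str]) -> ChainMap[str, str]:
    """Stack read-only image layers (topmost first) under a fresh writable layer."""
    writable: dict[str, str] = {}
    return ChainMap(writable, *image_layers)


if __name__ == "__main__":
    ubuntu = {"/etc/os-release": "Ubuntu 20.04", "/bin/sh": "<shell>"}
    app = {"/app/server.py": "print('hello')"}

    # Two containers from the same image share the same layer objects.
    c1 = start_container(app, ubuntu)
    c2 = start_container(app, ubuntu)

    # Modifying a file from a lower layer stores the new version only in
    # c1's writable layer; the shared image layers are never changed.
    c1["/etc/os-release"] = "patched"
    print(c1["/etc/os-release"], "|", c2["/etc/os-release"])
    print(c1.maps[0])  # only the changed file lives in the writable layer
```

Because both containers reference the same `ubuntu` dict, the base layer exists once in memory, which mirrors how a shared base layer is stored only once on disk.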
Discuss the challenges of data persistence in containers and how Volumes solve this.
The Challenge:
By default, containers are ephemeral. Any data written inside a container is stored in its writable layer, which exists only as long as the container does. When the container is deleted, that data is lost. This is problematic for databases or applications that need persistent logs.
The Solution: Volumes:
- Volumes are directories (or files) that are stored outside the container's Union File System, usually directly on the host machine.
- Benefits:
- Data persists even if the container is deleted.
- Volumes can be shared among multiple containers.
- Volumes are easy to back up or migrate.
- I/O performance is generally higher than writing to the container's writable layer.
What is Container Orchestration? Why is it necessary in a production environment?
Container Orchestration is the automated management of the lifecycle of containers, including deployment, scaling, and networking.
Necessity in Production:
While managing 5 containers manually is possible, enterprise applications often run hundreds or thousands of containers. Orchestration tools (like Kubernetes or Docker Swarm) are needed to:
- Scaling: Automatically scale the number of container instances up or down based on load (e.g., CPU or memory usage).
- High Availability: Restart failed containers or reschedule them to healthy nodes if hardware fails.
- Load Balancing: Distribute network traffic across multiple container instances.
- Rolling Updates: Update applications with zero downtime.
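As one concrete example of automatic scaling, the Kubernetes Horizontal Pod Autoscaler documents its core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below implements just that formula plus min/max clamping; the Python function and parameter names are hypothetical, and the real controller adds further behavior (such as a default 10% tolerance and stabilization windows) that is omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """ceil(current_replicas * current_metric / target_metric), clamped to [min, max].

    Example metric: average CPU utilization in percent across replicas.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target -> scale out.
    print(desired_replicas(4, 90, 60))  # 6
    # 3 replicas averaging 20% CPU against a 60% target -> scale in.
    print(desired_replicas(3, 20, 60))  # 1
```

The proportional rule grows or shrinks the fleet so that the average load per replica moves back toward the target, and the clamp keeps a traffic spike from requesting unbounded capacity.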
Explain the significance of .gitignore in a Git repository.
The .gitignore file is a text file where each line contains a pattern for files/directories that Git should ignore.
Significance:
- Cleanliness: It prevents the repository from being cluttered with unnecessary files (build artifacts, temporary files).
- Security: It helps prevent sensitive data (API keys, passwords, environment config files like `.env`) from accidentally being committed to the repository.
- Size Management: It excludes large binary files or dependencies (like `node_modules/` or `vendor/`) that can be re-downloaded via package managers, keeping the repository small.
Describe the different network modes available in Docker.
Docker provides several network drivers to control how containers communicate:
- Bridge (default): Creates a private internal network on the host. Containers on this bridge can communicate with each other via IP. Port mapping is required to access them from outside.
- Host: The container does not get an isolated network stack; it uses the host's network namespace directly. If the container listens on port 80, it is listening on the host's port 80.
- None: The container has no network interface (except loopback). Used for batch jobs requiring no network.
- Overlay: Facilitates communication between containers running on different Docker hosts (used in Docker Swarm clusters; Kubernetes typically relies on CNI network plugins instead).
Explain the Push vs. Pull models in Infrastructure as Code configuration management.
These models define how configuration updates reach the servers:
Push Model (e.g., Ansible, Terraform):
- Mechanism: A central control server (or the admin's laptop) initiates the connection to the destination servers and "pushes" the configuration or commands to them.
- Pros: Immediate control; no agent software required on the nodes (agentless).
- Cons: The control node must have access/credentials for all servers.
Pull Model (e.g., Puppet, Chef):
- Mechanism: Agents installed on the destination servers periodically check in (poll) with a central master server to see if there are configuration updates. If yes, they "pull" and apply them.
- Pros: Scales better for massive fleets; better for machines with dynamic IPs; automatic drift correction.
- Cons: Requires installing and managing agents on every node.
What is Paravirtualization? How does it differ from Full Virtualization?
Paravirtualization:
A virtualization technique where the Guest Operating System is modified (aware that it is being virtualized) to communicate directly with the Hypervisor via hypercalls.
Differences:
- Full Virtualization: The Guest OS is unmodified and "thinks" it is running on real hardware. The Hypervisor must intercept and emulate its privileged instructions (through trap-and-emulate or binary translation), which adds significant overhead.
- Paravirtualization: Because the Guest OS cooperates with the Hypervisor, it skips the translation overhead for critical operations (like memory and I/O), resulting in better performance (near-native speeds).
- Constraint: Requires a modified kernel for the Guest OS (easier with Linux, historically harder with Windows).
Define a Container Registry and explain the difference between public and private registries.
A Container Registry is a centralized repository service used to store and distribute container images (like a library for images).
Function: Users push built images to the registry and pull them down to servers to deploy containers.
Types:
- Public Registry: Open to everyone. The most famous is Docker Hub. It contains official images for OSs (Alpine, Ubuntu) and software (Nginx, Node.js). Anyone can pull public images.
- Private Registry: Restricted access. Used by enterprises to store proprietary application images containing trade secrets or intellectual property. Access requires authentication (e.g., AWS ECR, Azure ACR, or a self-hosted local registry).
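Which registry a pull goes to is encoded in the image reference itself. The sketch below follows, as best I understand them, the normalization rules the Docker CLI applies: if the first path component contains a "." or ":" or is "localhost", it names the registry host; otherwise the image comes from Docker Hub (`docker.io`), and single-name official images get a `library/` prefix. `ImageRef` and `parse_image_ref` are hypothetical names, `registry.example.com` is a placeholder, and digests (`@sha256:...`) and validation are deliberately not handled.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, and tag."""
    name, tag = ref, "latest"
    # A ":" after the last "/" is a tag; one before it may be a registry port.
    if ":" in ref.rsplit("/", 1)[-1]:
        name, tag = ref.rsplit(":", 1)
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest          # private/explicit registry
    else:
        registry, repository = "docker.io", name    # public default: Docker Hub
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository       # official images namespace
    return ImageRef(registry, repository, tag)


if __name__ == "__main__":
    for ref in ["nginx", "myorg/api:1.2", "localhost:5000/team/app",
                "registry.example.com/team/app:1.0"]:
        print(f"{ref:35} -> {parse_image_ref(ref)}")
```

This is why `docker pull nginx` reaches the public Docker Hub, while a private image must be referenced (and pushed) with the registry host as its first component, after authenticating with `docker login`.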
What are Git Hooks? Give examples of how they can be used in System Administration.
Git Hooks are scripts that Git executes before or after events such as commit, push, and receive. They allow administrators to customize Git's internal behavior and trigger actions.
Use Cases in SysAdmin:
- Pre-commit Hook: Automatically run a syntax checker (linting) on IaC scripts (like Terraform or Ansible YAML) before allowing the commit. If the syntax is wrong, the commit is rejected.
- Post-receive Hook: Used in deployment. When code is pushed to a repository on the production server, this hook can automatically check out the code into the web directory and restart the web server service (acting as a simple CI/CD pipeline trigger).
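Hooks can be written in any language Git can execute. The pre-commit sketch below assumes JSON files stand in for the IaC files being checked; a real setup might call a dedicated Terraform or YAML linter instead, and the function names are hypothetical. It relies on Git behavior that is documented: `git diff --cached --name-only` lists staged paths, `git show :<path>` prints the staged version of a file, and a non-zero exit status from `pre-commit` makes Git abort the commit.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook that rejects commits containing invalid JSON."""
import json
import subprocess
import sys


def staged_paths() -> list[str]:
    """Files added, copied, or modified in the index (what will be committed)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.split("\0") if p]  # -z: NUL-separated, unquoted paths


def staged_content(path: str) -> str:
    """Read the staged (index) version of a file, not the working-tree copy."""
    return subprocess.run(
        ["git", "show", f":{path}"], capture_output=True, text=True, check=True
    ).stdout


def invalid_json(files: dict[str, str]) -> list[str]:
    """Return the .json paths whose contents fail to parse."""
    bad = []
    for path, text in files.items():
        if not path.endswith(".json"):
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError:
            bad.append(path)
    return bad


def main() -> int:
    files = {p: staged_content(p) for p in staged_paths() if p.endswith(".json")}
    bad = invalid_json(files)
    for path in bad:
        print(f"pre-commit: invalid JSON in {path}", file=sys.stderr)
    return 1 if bad else 0  # non-zero exit status aborts the commit
```

To use it, save the file as `.git/hooks/pre-commit`, make it executable, and add `sys.exit(main())` as the last line. That call is left out here so the checking logic can be imported and tested outside a Git repository. Checking the staged content rather than the working-tree file matters because the index is what will actually be committed.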
Explain the concept of Branching Strategies in Git and why they are important for team collaboration.
A Branching Strategy is a set of rules that a development/ops team follows when interacting with Git branches to ensure code stability and manage concurrent work.
Importance:
- Prevents conflicts when multiple admins work on the same infrastructure code.
- Ensures that the `master` or `main` branch is always in a deployable state.
Common Strategies:
- Git Flow: Uses strict branches (`develop`, `feature/*`, `release`, `master`) for release management.
- Feature Branch Workflow: Every new change (feature or fix) is created in a dedicated branch (e.g., `feature/add-load-balancer`). It is tested and then merged into `main` via a Pull Request.
- Trunk-Based Development: Developers merge small, frequent updates to a core "trunk" (`main`) branch.