Unit 1 - Notes

INT332

Unit 1: Basics of DevOps Infrastructure

Introduction to Containers

Origin of Containers

The concept of containerization did not begin with modern tools like Docker; its roots trace back decades in Unix-like operating systems:

  • 1979 - Unix V7 chroot: The earliest form of isolation, chroot changed the apparent root directory for a running process and its children. It provided basic filesystem isolation but lacked process and network isolation.
  • 2000 - FreeBSD Jails: Introduced a more robust isolation mechanism. Jails partitioned a FreeBSD system into several independent mini-systems sharing the same kernel, with individual IP addresses and isolated software installations.
  • 2004 - Solaris Zones: Brought full application containment to Solaris systems, allowing administrators to partition resources and isolate applications completely.
  • 2008 - Linux Containers (LXC): The direct predecessor to modern containers. LXC combined Linux namespaces (for process isolation) and cgroups (for resource management) to run multiple isolated Linux environments (containers) on a single control host.

Emergence of Modern Containerization

While LXC provided the underlying technology, it was complex to configure and manage. Docker, introduced in 2013, made that technology far more approachable. Docker didn't invent containers; rather, it democratized them by:

  • Introducing a standard packaging format (the Container Image).
  • Providing an easy-to-use Command Line Interface (CLI).
  • Creating a robust ecosystem for sharing images (Docker Hub).

Modern containerization abstracts the application from the underlying host OS and infrastructure, making software highly portable across different computing environments.

Integration into DevOps

Containers are now a foundational pillar of DevOps practices. They solve the classic "it works on my machine" problem by ensuring consistency across Development, Testing, and Production environments.

  • Continuous Integration/Continuous Deployment (CI/CD): Containers spin up quickly, making them ideal for automated testing and deployment pipelines.
  • Microservices Architecture: Containers provide the perfect lightweight encapsulation for microservices, allowing individual components of an application to be deployed, scaled, and updated independently.
  • Infrastructure as Code (IaC): Container definitions (like Dockerfiles) are stored in version control alongside application code.

The Core Mechanisms of Containerization

Containers rely on three building blocks: a container runtime (user-space software) and two Linux kernel features, namespaces and control groups.

Container Runtime

A container runtime is the software component responsible for executing containers and managing container images on a node. Runtimes are generally split into two categories:

  1. Low-Level Runtimes: Responsible solely for interacting with the OS kernel to create and run the container (e.g., runc). They set up namespaces and cgroups.
  2. High-Level Runtimes: Manage image pulling, unpacking, and API management before handing off the execution to the low-level runtime (e.g., containerd, CRI-O).
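
The hand-off between the two runtime levels can be pictured as the high-level runtime preparing a bundle (a root filesystem plus a config.json) that the low-level runtime then executes. The following is a minimal sketch in Python, assuming a heavily pared-down config modeled on the OCI runtime specification; the helper name minimal_oci_config is hypothetical, and a real config (for example one generated by runc spec) contains many more fields.

```python
import json


def minimal_oci_config(args, rootfs="rootfs"):
    """Return a pared-down, illustrative OCI-style runtime config as a dict."""
    return {
        "ociVersion": "1.0.2",
        # The command the low-level runtime should start inside the container.
        "process": {"args": list(args), "cwd": "/"},
        # Directory (relative to the bundle) holding the unpacked image.
        "root": {"path": rootfs, "readonly": True},
        # Namespaces the low-level runtime is asked to create.
        "linux": {
            "namespaces": [
                {"type": t} for t in ("pid", "network", "mount", "ipc", "uts")
            ]
        },
    }


config = minimal_oci_config(["/bin/sh"])
print(json.dumps(config, indent=2))
```

In this picture, containerd or CRI-O would produce something like this file after unpacking the image, and runc would read it to set up namespaces and cgroups.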

Process Isolation & Namespaces

If a container is a "sandbox," namespaces form the walls of that sandbox. Namespaces are a Linux kernel feature that partitions kernel resources such that one set of processes sees one set of resources, while another set of processes sees a different set.

Key Linux namespaces include:

  • PID (Process ID): Isolates the process ID number space. A process inside a container can be PID 1, completely unaware of processes on the host.
  • NET (Network): Isolates network interfaces, IP addresses, routing tables, and port numbers.
  • MNT (Mount): Isolates filesystem mount points. The container sees only its own filesystem.
  • IPC (Inter-Process Communication): Isolates POSIX message queues and shared memory.
  • UTS (UNIX Timesharing System): Isolates the hostname and domain name.
  • USER: Maps users and group IDs inside the container to different users and group IDs on the host (e.g., root inside the container is a non-root user on the host).
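
On a Linux host, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns. The sketch below (the function name list_namespaces is made up for illustration) reads those links for the current process; two processes that share a namespace show the same identifier, while a containerized process typically shows different ones. On systems without /proc it simply returns an empty result.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Map namespace name -> identifier for a process, or {} if unavailable."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, or /proc is not readable
    for name in names:
        try:
            # Each entry is a symlink whose target identifies the namespace.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


for name, ident in list_namespaces().items():
    print(f"{name:10} {ident}")
```

Running this once on the host and once inside a container, then comparing the identifiers, is a quick way to see which namespaces differ.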

Control Groups (cgroups) for Resource Limits

If namespaces limit what a process can see, cgroups limit what a process can use. Control Groups govern resource allocation and metering.

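  • CPU: Limits how much CPU time a container can consume, either as a relative weight against other groups or as a hard quota per scheduling period.
  • Memory: Caps the amount of RAM and swap space a container can use. A container that exceeds its limit is subject to the kernel's Out-Of-Memory (OOM) killer, so a single runaway container cannot starve the host OS.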
  • I/O (Block I/O): Throttles read/write speeds to block storage devices.
  • PIDs: Limits the maximum number of processes a container can spawn, preventing fork-bomb attacks.
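
To make these limits concrete, the sketch below formats values the way the cgroup v2 interface files cpu.max and memory.max expect them. It is only an illustration of the arithmetic, assuming the common 100000 microsecond CPU period and binary units (k, m, g); the helper names cpu_max and memory_max are hypothetical and nothing here writes to the real cgroup filesystem.

```python
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def cpu_max(cpus, period_us=100_000):
    """Format a cgroup v2 cpu.max value: '<quota> <period>' in microseconds."""
    if cpus is None:
        return f"max {period_us}"  # no hard CPU limit
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return f"{int(round(cpus * period_us))} {period_us}"


def memory_max(limit):
    """Convert a limit such as '512m' into the byte count used by memory.max."""
    text = str(limit).strip().lower()
    if text and text[-1] in _UNITS:
        return str(int(text[:-1]) * _UNITS[text[-1]])
    return str(int(text))


print(cpu_max(1.5))        # 150000 100000
print(memory_max("512m"))  # 536870912
```

A limit of 1.5 CPUs therefore means the group may run for 150000 microseconds in every 100000 microsecond period, summed across all cores.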

Images and Distribution

Container Images & Layers

A container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings.

  • Immutability: Images are read-only templates. Once built, they cannot be changed. To modify an application, a new image must be built.
  • Layers: Images are composed of multiple stacked layers. Instructions in a build file (like a Dockerfile) that change the filesystem, such as RUN, COPY, and ADD, each create a new layer; other instructions only add metadata. If multiple images share the same base layer (e.g., Ubuntu 20.04), the host system only downloads and stores that base layer once, which saves both storage and network transfer.
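
The storage saving from shared layers follows from layers being identified by a hash of their content. The sketch below is a simplified model, not how any particular engine is implemented: the class LayerStore and the byte strings standing in for layer contents are invented for illustration.

```python
import hashlib


class LayerStore:
    """Toy content-addressed store: each distinct layer is kept only once."""

    def __init__(self):
        self.blobs = {}      # digest -> layer content
        self.downloads = 0   # how many layers actually had to be fetched

    def pull(self, image_layers):
        digests = []
        for content in image_layers:
            digest = "sha256:" + hashlib.sha256(content).hexdigest()
            if digest not in self.blobs:
                # Only layers we do not already hold are transferred.
                self.blobs[digest] = content
                self.downloads += 1
            digests.append(digest)
        return digests


base = b"ubuntu base filesystem"
runtime = b"python runtime"
store = LayerStore()
store.pull([base, runtime, b"service A"])
store.pull([base, runtime, b"service B"])
print(store.downloads, len(store.blobs))  # 4 4
```

Two three-layer images were pulled, yet only four layers were fetched and stored because the base and runtime layers are shared.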

Image Registries & Distribution

An Image Registry is a storage and distribution system for container images.

  • Distribution Mechanism: Images are distributed using a push/pull mechanism. Developers push built images to a registry, and host servers pull them down to execute as containers.
  • Public vs. Private: Registries can be public (like Docker Hub, Amazon ECR Public) or private (hosted internally to secure proprietary code).
  • OCI Standard: The Open Container Initiative (OCI) establishes standards for container formats and runtimes, ensuring that an image pushed to a registry can be pulled and run by any OCI-compliant runtime (Docker, Podman, containerd).

Introduction to Docker

Docker is an open-source platform designed to automate the deployment, scaling, and management of applications using containerization. Released in 2013 by dotCloud (the company later renamed Docker, Inc.), it abstracted the complex low-level Linux kernel features (namespaces, cgroups) into an intuitive, user-friendly API and CLI tool. Docker shifted the industry paradigm from virtualizing hardware (VMs) to virtualizing the operating system.


Docker Architecture

Docker uses a Client-Server architecture. The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing containers.

1. Docker Daemon (dockerd)

The background service running on the host machine. It listens for Docker API requests and manages Docker objects such as images, containers, networks, and volumes. A daemon can also communicate with other daemons to manage swarm services.

2. Docker CLI (Client)

The primary interface used by Docker users (docker). When you type commands such as docker run, the client sends these commands to dockerd via a REST API (over UNIX sockets or a network interface), which then carries them out.
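
The client-to-daemon conversation is ordinary HTTP carried over a UNIX socket. The following is a minimal sketch using only the Python standard library, assuming the daemon listens on the usual /var/run/docker.sock path and that the current user may access it; the class UnixHTTPConnection and the helper docker_get are illustrative names, not part of any Docker library. If no daemon is reachable, the helper returns None instead of failing.

```python
import http.client
import json
import os
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_get(path, socket_path="/var/run/docker.sock"):
    """Send GET <path> to the daemon; return parsed JSON, or None on failure."""
    if not os.path.exists(socket_path):
        return None
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        return json.loads(conn.getresponse().read())
    except (OSError, ValueError, http.client.HTTPException):
        return None  # e.g. permission denied or an unexpected response
    finally:
        conn.close()


info = docker_get("/version")
if isinstance(info, dict):
    print("Engine version:", info.get("Version"))
else:
    print("Docker daemon not reachable")
```

This is roughly what the docker CLI does for docker version, with the CLI adding argument parsing, API version negotiation, and output formatting on top.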

3. Docker Registry & Docker Hub

As mentioned, a registry stores Docker images.

  • Docker Hub is the default public registry managed by Docker.
  • When docker pull is executed, the requested image is pulled from the configured registry. docker run does the same, but only if the image is not already present on the host.
  • When docker push is used, the image is stored in the registry.
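
Which registry a command talks to is derived from the image name itself. The sketch below approximates the naming convention: a short name such as nginx is expanded to the docker.io registry, the library/ namespace, and the latest tag. It is a simplified illustration (the function parse_image_ref is a made-up helper, the host registry.example.com is hypothetical, and digest references with @sha256:... are not handled).

```python
def parse_image_ref(ref):
    """Split an image reference into (registry, repository, tag)."""
    name, tag = ref, "latest"
    # A ':' after the last '/' separates the name from the tag; a ':' before
    # it belongs to a registry port such as localhost:5000.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        name, tag = ref[:colon], ref[colon + 1:]

    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        # The first component looks like a host name, so it is the registry.
        return first, rest, tag

    # Otherwise the image lives on Docker Hub.
    repository = name if "/" in name else "library/" + name
    return "docker.io", repository, tag


for ref in ("nginx", "myuser/app:v2", "localhost:5000/team/app",
            "registry.example.com/app:1.0"):
    print(ref, "->", parse_image_ref(ref))
```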

Docker Object Types

When working with Docker, you interact with several core objects:

1. Container

A runnable instance of an image. It is an isolated, secure application platform. You can create, start, stop, move, or delete a container using the Docker API or CLI. By default, a container is isolated from other containers and its host machine.

2. Image

A read-only template with instructions for creating a Docker container. Often, an image is based on another image, with some additional customization (e.g., you may build an image based on the nginx image, but add your own custom HTML files).

3. Network

Docker networks define how containers communicate with each other and the outside world, and they can isolate groups of containers from one another.

  • Bridge: The default network driver. Isolates containers from the host but allows containers on the same bridge to communicate.
  • Host: Removes network isolation between the container and the Docker host (container uses the host's networking).
  • None: Disables all networking for the container.
  • Overlay: Connects multiple Docker daemons together and enables swarm services to communicate.

4. Volume

Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. Since containers are ephemeral (data written inside them is lost when they are deleted), volumes exist entirely outside the container's lifecycle.

  • They are stored in a part of the host filesystem managed specifically by Docker (/var/lib/docker/volumes/ on Linux).
  • They can be safely shared among multiple containers simultaneously.

Docker Layering & Filesystem

Docker utilizes a specialized filesystem architecture to ensure efficiency and speed.

Union Filesystem (UnionFS)

Docker leverages Union Filesystems (most commonly the overlay2 storage driver in modern Linux distributions). UnionFS allows files and directories of separate filesystems (known as branches) to be transparently overlaid, forming a single coherent filesystem.
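
The overlay idea can be imitated with Python's collections.ChainMap, which presents several dictionaries as one and searches them from first to last. This is only an analogy, assuming each layer is a dictionary from path to content; the paths and contents below are invented.

```python
from collections import ChainMap

# Two branches ("layers"), each mapping a path to file content.
base = {"/etc/os-release": "ubuntu", "/bin/sh": "shell"}
app = {"/app/main.py": "print('v1')", "/etc/os-release": "ubuntu-patched"}

# Topmost layer first: lookups fall through to lower layers when a path
# is missing from the upper one.
merged = ChainMap(app, base)

print(sorted(merged))             # one coherent view of every path
print(merged["/etc/os-release"])  # upper layer wins: ubuntu-patched
print(merged["/bin/sh"])          # found in the lower layer: shell
```

The merged view contains each path once, and where both layers provide a file, the upper layer hides the lower one without modifying it.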

Read-Only vs. Read-Write Layers

  • Image Layers (Read-Only): Every layer that makes up a Docker image is strictly read-only. This allows the layers to be safely shared among multiple containers on the same host without conflicts.
  • Container Layer (Read-Write): When Docker launches a container from an image, it adds a thin, empty Read-Write layer (the "container layer") on top of the underlying read-only image layers. All writes, updates, and file deletions made by the running container are stored in this thin layer.

Copy-on-Write (CoW) Strategy

To maximize efficiency, Docker uses a Copy-on-Write strategy.

  • If a container needs to read a file, it reads it directly from the read-only image layers.
  • If a container needs to modify a file that exists in lower, read-only layers, Docker copies the file up into the top read-write container layer before making the modification.
  • The original read-only file remains unchanged for any other containers using that same image.
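
The three rules above can be sketched as a small in-memory model. It is a teaching aid under simplifying assumptions (files are byte strings, layers are dictionaries, deletions are not modeled), and the class name Container and the sample paths are hypothetical rather than anything from Docker's code.

```python
from types import MappingProxyType


class Container:
    """Toy container: shared read-only image layers plus a private upper layer."""

    def __init__(self, image_layers):
        self.lower = image_layers  # shared with other containers, never written
        self.upper = {}            # this container's thin read-write layer

    def read(self, path):
        if path in self.upper:
            return self.upper[path]
        for layer in reversed(self.lower):  # topmost image layer is last
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        if path not in self.upper:
            try:
                # Copy-up: bring the file into the upper layer before changing it.
                self.upper[path] = self.read(path)
            except FileNotFoundError:
                self.upper[path] = b""  # brand-new file
        self.upper[path] += data


# MappingProxyType gives a read-only view, mimicking immutable image layers.
image = [MappingProxyType({"/etc/motd": b"hello"}),
         MappingProxyType({"/app/run.sh": b"#!/bin/sh\n"})]

c1, c2 = Container(image), Container(image)
c1.append("/etc/motd", b" edited")

print(c1.read("/etc/motd"))  # b'hello edited'
print(c2.read("/etc/motd"))  # b'hello'
print(c2.upper)              # {}
```

Only the container that wrote the file pays for a copy; the second container still reads the original from the shared image layer and its own upper layer stays empty.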

This architecture allows Docker to start containers very quickly, often in well under a second, because it does not need to copy an entire filesystem to launch a new container instance.