Unit 3 - Notes

INT363

Unit 3: Deploying Microservices

1. Containerization with Docker

1.1 Introduction to Containerization

Containerization is a form of operating system virtualization. Instead of simulating hardware the way Virtual Machines do, containers run directly on a physical server and its host OS. Each container shares the host OS kernel and, usually, the binaries and libraries as well.

  • Isolation: Containers are isolated from one another and from the host system.
  • Portability: A container includes everything needed to run an application (code, runtime, system tools, system libraries, and settings), ensuring it works uniformly across development, testing, and production environments.
  • Efficiency: Containers are lightweight and start almost instantly compared to VMs, which require a full OS boot.

1.2 Virtual Machines (VMs) vs. Containers

Feature       | Virtual Machines                           | Containers
--------------+--------------------------------------------+--------------------------
Architecture  | Hypervisor-based (hardware virtualization) | OS-level virtualization
OS            | Each VM has its own Guest OS               | Share the Host OS kernel
Size          | Heavyweight (gigabytes)                    | Lightweight (megabytes)
Boot Time     | Minutes                                    | Seconds/milliseconds
Performance   | Lower (due to overhead)                    | Native performance

1.3 Docker Core Concepts

Docker is the standard platform for developing, shipping, and running applications in containers.

  • Docker Engine: The core software that hosts the containers.
  • Docker Daemon (dockerd): A background service that manages Docker objects (images, containers, networks, volumes).
  • Docker Client (docker): The Command Line Interface (CLI) used by users to interact with the Docker Daemon.
  • Docker Registry: A storage location for Docker images (e.g., Docker Hub, AWS ECR).
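
These components can be seen working together with a few everyday commands; a minimal sketch (nginx:alpine is just a convenient public image):

BASH
# The docker CLI (client) sends each of these requests to the Docker Daemon
docker version             # shows both the Client and the Server (daemon) versions
docker pull nginx:alpine   # the daemon fetches the image from a registry (Docker Hub)
docker images              # the daemon lists the images it stores locally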

2. Dockerfile and Container Image Creation

2.1 The Docker Image

A Docker image is a read-only template that contains a set of instructions for creating a container that can run on the Docker platform. It is built from a series of layers.

2.2 The Dockerfile

A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

Key Dockerfile Instructions

  • FROM: Initializes a new build stage and sets the Base Image (e.g., FROM python:3.9).
  • WORKDIR: Sets the working directory for any subsequent RUN, CMD, ENTRYPOINT, COPY and ADD instructions.
  • COPY: Copies new files or directories from the source to the filesystem of the container.
  • RUN: Executes any commands in a new layer on top of the current image and commits the results (used for installing packages).
  • EXPOSE: Informs Docker that the container listens on the specified network ports at runtime.
  • CMD: Provides defaults for an executing container. Only one CMD instruction takes effect per Dockerfile; if several are listed, only the last one is used.

2.3 Example: Dockerizing a Node.js Microservice

1. Create the Dockerfile:

DOCKERFILE
# Step 1: Specify the base image
FROM node:14-alpine

# Step 2: Set the working directory inside the container
WORKDIR /usr/src/app

# Step 3: Copy package files to install dependencies
COPY package*.json ./

# Step 4: Install dependencies
RUN npm install

# Step 5: Copy the rest of the application code
COPY . .

# Step 6: Expose the port the app runs on
EXPOSE 8080

# Step 7: Define the command to run the app
CMD ["node", "app.js"]

2. Build the Image:
Command to build the image from the Dockerfile in the current directory (.), tagging it as my-microservice:

BASH
docker build -t my-microservice .

3. Run the Container:
Command to run the container, mapping host port 3000 to container port 8080:

BASH
docker run -d -p 3000:8080 --name service-instance my-microservice
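
A few commands to verify the deployment (assuming the app answers HTTP requests on its root path):

BASH
docker ps --filter "name=service-instance"   # confirm the container is up
docker logs service-instance                 # inspect application output
curl http://localhost:3000                   # host port 3000 forwards to container port 8080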

2.4 Image Layering and Caching

Docker uses a union file system: each instruction in a Dockerfile creates a layer, and when rebuilding an image Docker reuses existing layers from the cache as long as the instruction (and any files it copies) is unchanged. This makes builds significantly faster. It is also why the example above copies package*.json and runs npm install before copying the rest of the code: the expensive dependency-install layer is rebuilt only when the package files change, not on every source edit.
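
The layer stack of the image built above can be inspected directly:

BASH
# One row per layer; instructions unchanged since the last build are served from cache
docker history my-microservice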


3. Container Orchestration

3.1 Definition

Container orchestration is the automated management of container deployments, scaling, networking, and lifecycle. While Docker manages a single container, orchestration manages clusters of containers across multiple hosts.

3.2 Why Orchestration is Needed

Running microservices in production can involve hundreds or thousands of containers. Managing them manually is impractical because of requirements such as:

  • High Availability: If a container crashes, it must be restarted immediately.
  • Scaling: If traffic spikes, new instances must be spun up; if traffic drops, they must be removed.
  • Networking: Containers need to discover and communicate with each other across different hosts.
  • Resource Allocation: Placing containers on servers with adequate CPU/RAM.

3.3 Common Orchestration Tools

  • Kubernetes (K8s): The industry standard (Google-born, CNCF maintained).
  • Docker Swarm: Native clustering for Docker (simpler, but less feature-rich than K8s).
  • Apache Mesos: A cluster manager that abstracts CPU, memory, storage, and other compute resources away from individual machines.

4. Overview of Kubernetes and Architecture

4.1 What is Kubernetes (K8s)?

Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts.

4.2 Kubernetes Architecture

A Kubernetes cluster consists of two main types of resources: The Control Plane (Master) and Worker Nodes.

A. The Control Plane (Master Node)

The brain of the cluster. It makes global decisions (e.g., scheduling) and detects/responds to cluster events.

  1. API Server (kube-apiserver): The front end of the Kubernetes control plane. It exposes the Kubernetes API. All tools (kubectl, dashboard) communicate via this component.
  2. etcd: A consistent and highly-available key-value store used as Kubernetes' backing store for all cluster data. It is the "source of truth."
  3. Scheduler (kube-scheduler): Watches for newly created Pods with no assigned node and selects a node for them to run on based on resource requirements and constraints.
  4. Controller Manager (kube-controller-manager): Runs controller processes (e.g., Node Controller, ReplicaSet Controller) that regulate the state of the system, trying to move the current state toward the desired state.

B. The Worker Nodes

The machines (VMs or physical servers) that run the applications.

  1. Kubelet: An agent that runs on each node. It ensures that containers are running in a Pod. It communicates with the API server.
  2. Kube-proxy: Maintains network rules on nodes. It enables network communication to your Pods from network sessions inside or outside of your cluster.
  3. Container Runtime: The software responsible for running containers (e.g., Docker, containerd, CRI-O).
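
Assuming kubectl access to a running cluster, the worker nodes and their runtimes can be listed:

BASH
kubectl get nodes -o wide   # shows each node's OS, kernel, and container runtime version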

4.3 Key Kubernetes Objects

  • Pod: The smallest deployable unit. A Pod represents a single instance of a running process in your cluster. It usually contains one container (sometimes with helper/sidecar containers), and all containers in a Pod share storage and a network identity.
  • Service: An abstraction that defines a logical set of Pods and a policy by which to access them (Load Balancing). It provides a stable IP address.
  • ReplicaSet: Ensures a specified number of pod replicas are running at any given time.
  • Deployment: A higher-level abstraction that manages ReplicaSets and provides declarative updates (e.g., rolling updates) to Pods.
  • Namespace: Provides a mechanism for isolating groups of resources within a single cluster.
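
A minimal sketch of how these objects fit together, reusing the image from section 2.3 (the names and tag are hypothetical, and a running cluster with kubectl is assumed):

BASH
# Deployment -> ReplicaSet -> 3 Pods running the my-microservice image
kubectl create deployment my-microservice --image=my-microservice:1.0 --replicas=3

# Service: a stable virtual IP that load-balances across the Pods
kubectl expose deployment my-microservice --port=80 --target-port=8080

# Inspect everything the two commands created
kubectl get deployments,replicasets,pods,services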

5. Deploying Microservices

5.1 Deployment Strategies

When updating microservices, downtime must be minimized.

  1. Rolling Update (Default in K8s):

    • Instances of the new version replace instances of the old version one by one (see the kubectl sketch after this list).
    • Pros: No downtime, slow rollout allows monitoring.
    • Cons: Both versions run simultaneously for a period.
  2. Blue/Green Deployment:

    • Two identical environments (Blue = old/live, Green = new).
    • Traffic is switched strictly from Blue to Green once Green is verified.
    • Pros: Instant rollback, easy testing.
    • Cons: Requires double the resources (costly).
  3. Canary Deployment:

    • A small percentage of traffic (e.g., 10%) is routed to the new version (the canary).
    • If metrics are healthy, the percentage increases until 100% traffic is moved.
    • Pros: Lowest risk, impacts few users if bugs exist.
    • Cons: The slowest rollout; requires fine-grained traffic routing and careful monitoring.
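
For the default rolling update, a hedged sketch using the hypothetical Deployment from section 4.3:

BASH
# Point the Deployment at the new image; Pods are replaced a few at a time
kubectl set image deployment/my-microservice my-microservice=my-microservice:2.0

kubectl rollout status deployment/my-microservice   # watch the rollout progress
kubectl rollout undo deployment/my-microservice     # roll back if metrics degrade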

5.2 Service Discovery

In microservices, containers are ephemeral: IP addresses change whenever an instance restarts or is rescheduled, so callers cannot rely on hardcoded addresses.

  • Client-Side Discovery: The client queries a Service Registry (like Netflix Eureka) to get the address of a service instance.
  • Server-Side Discovery: The client calls a Load Balancer (like Kubernetes Service or AWS ELB), which queries the registry and routes the traffic. Kubernetes handles this natively via DNS and Services.
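
Kubernetes' native server-side discovery can be exercised from inside any Pod via cluster DNS (the Service name comes from the earlier sketch; the /health path is hypothetical):

BASH
# Cluster DNS resolves the Service name to its stable IP;
# kube-proxy then routes the request to one of the backing Pods
curl http://my-microservice.default.svc.cluster.local/health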

5.3 Configuration Management

  • Externalized Configuration: Never hardcode configs (DB URLs, API keys) in the container image.
  • Environment Variables: Pass configs at runtime.
  • Kubernetes ConfigMaps & Secrets: Store non-sensitive data in ConfigMaps and sensitive data (passwords) in Secrets, mounted as files or environment variables in the Pod.
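
A minimal sketch of creating both objects (all keys and values here are hypothetical):

BASH
# Non-sensitive configuration
kubectl create configmap app-config --from-literal=DB_URL=postgres://db:5432/orders

# Sensitive configuration (stored base64-encoded, with tighter access control)
kubectl create secret generic app-secret --from-literal=DB_PASSWORD='s3cret'

# Either object can then be referenced from a Pod spec as env vars or mounted files
kubectl get configmap app-config -o yaml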

6. Continuous Integration (CI) Principles

6.1 Definition

Continuous Integration (CI) is a development practice where developers integrate code into a shared repository frequently (preferably several times a day). Each integration can then be verified by an automated build and automated tests.

6.2 Key CI Principles

  1. Maintain a Single Source Repository: Use version control (Git) to track all code, configuration, and scripts.
  2. Automate the Build: The build process (compiling, linking, packaging) should happen via a single command or trigger.
  3. Make the Build Self-Testing: Once built, the code must be tested automatically (Unit tests, Integration tests). If tests fail, the build fails.
  4. Commit Early, Commit Often: Reduces merge conflicts and makes it easier to isolate errors.
  5. Every Commit Should Build the Mainline: The Continuous Integration server (e.g., Jenkins, GitLab CI, GitHub Actions) monitors the repo and builds every commit.
  6. Fix Broken Builds Immediately: If the main branch breaks, fixing it is the highest priority.

6.3 Benefits of CI in Microservices

  • Early Bug Detection: Failures are caught minutes after code is committed, not weeks later.
  • Reduced Integration Hell: Frequent merges prevent the "merge hell" that occurs right before a release.
  • Feedback Loop: Developers get immediate feedback on the quality and compatibility of their code.
  • Deployment Readiness: Since the code is constantly built and tested, the software is always in a deployable state.

6.4 The CI/CD Pipeline Workflow

  1. Code: Developer commits code to Git.
  2. Trigger: CI Server detects changes.
  3. Build: CI Server compiles code and builds Docker Image.
  4. Test: CI Server runs unit and integration tests.
  5. Push: If successful, the Docker Image is pushed to the Container Registry.
  6. Deploy (CD): The orchestration tool (Kubernetes) pulls the new image and updates the microservice.
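
A hedged sketch of the shell steps such a pipeline might run for the Node.js service from section 2.3 (the registry host and the GIT_SHA variable are hypothetical placeholders supplied by the CI server):

BASH
#!/bin/sh
set -e     # any failing step aborts and fails the build

npm ci     # Build: install the exact locked dependency versions
npm test   # Test: a non-zero exit code marks the build as broken

# Build and push the image, tagged with the commit SHA for traceability
docker build -t registry.example.com/my-microservice:"$GIT_SHA" .
docker push registry.example.com/my-microservice:"$GIT_SHA"

# Deploy (CD): point the Deployment at the new image; Kubernetes rolls it out
kubectl set image deployment/my-microservice \
  my-microservice=registry.example.com/my-microservice:"$GIT_SHA"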