Unit 5 - Notes

INT332

Unit 5: Continuous Integration (CI) with GitHub Actions

1. Understanding Workflow Automation

Workflow automation in the context of DevOps refers to the automatic execution of a sequence of tasks, processes, or scripts triggered by specific events. In software development, this primarily involves Continuous Integration (CI) and Continuous Deployment (CD).

GitHub Actions is a CI/CD and automation platform built directly into GitHub. It lets developers automate their software development lifecycle from within their repositories, which for many projects removes the need for a separate CI/CD tool such as Jenkins or CircleCI.

2. Core Components and Directory Structure

Workflow Directory Structure

GitHub Actions relies on YAML files to define workflows. These files must be stored in a specific directory at the root of your repository:
.github/workflows/

Any .yml or .yaml file placed inside this directory is automatically recognized by GitHub as a workflow definition.

Key Components

A GitHub Actions architecture is built upon six foundational components:

  1. Workflows: An automated procedure added to your repository. Workflows are defined by a YAML file and contain one or more jobs. They are triggered by events.
  2. Events: Specific activities in a repository that trigger a workflow run (e.g., code push, issue creation, pull request).
  3. Jobs: A set of steps in a workflow that execute on the same runner. By default, a workflow with multiple jobs will run those jobs in parallel, but they can be configured to run sequentially based on dependencies.
  4. Steps: Individual tasks that run commands in a job. A step can either run a shell script (run) or execute an action (uses). All steps in a job run on the same runner and share the same filesystem.
  5. Actions: Standalone, reusable commands that perform a complex but frequently repeated task (e.g., checking out code, setting up a Node.js environment).
  6. Runners: The server (virtual machine or container) that runs a workflow's jobs when the workflow is triggered. Each job runs on its own runner.
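
As a rough illustration of how these components map onto a file, the sketch below labels each one in a minimal workflow. The file name ci.yml, the job name build, and the @v4 version tag are assumptions chosen for the example, not requirements.

YAML
# .github/workflows/ci.yml (hypothetical file name)
name: CI                                    # workflow

on: push                                    # event

jobs:
  build:                                    # job
    runs-on: ubuntu-latest                  # runner
    steps:
      - uses: actions/checkout@v4           # step that executes an action
      - run: echo "Ref is $GITHUB_REF"      # step that runs a shell command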

3. Events and Workflow Triggers

Workflows are executed based on triggers defined in the on block of the YAML file.

Common Triggers

  • Push: Triggers when code is pushed to a specified branch.
  • Pull Request: Triggers when a PR is opened, synchronized, or reopened.
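  • Schedule: Triggers at scheduled times using POSIX cron syntax. Times are interpreted in UTC, and scheduled runs use the workflow file on the repository's default branch.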
  • Manual Workflow: Uses the workflow_dispatch event to allow users to trigger the workflow manually from the GitHub UI or GitHub CLI. It can also accept manual input parameters.

Example of Triggers:

YAML
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  schedule:
    - cron: '0 0 * * *' # Runs daily at midnight UTC
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy to'
        required: true
        default: 'staging'
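
A manual input such as environment can be read inside a job through the inputs context. The following is a minimal sketch, assuming a hypothetical job named deploy; the value is passed through an environment variable (TARGET_ENV, also a made-up name) rather than interpolated directly into the script, which is a common way to reduce injection risk.

YAML
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Only run for manual triggers; other events carry no inputs
    if: github.event_name == 'workflow_dispatch'
    steps:
      - name: Show the chosen environment
        env:
          TARGET_ENV: ${{ inputs.environment }}
        run: echo "Deploying to $TARGET_ENV"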

4. Jobs, Matrix Strategies, and Steps

Jobs & Multi-job Workflows

A workflow can have multiple jobs. By default, they run in parallel. You can use the needs keyword to create dependencies, forming a pipeline where a job only starts if its prerequisite jobs succeed.

YAML
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building..."
  
  test:
    needs: build # Starts only after 'build' completes successfully
    runs-on: ubuntu-latest
    steps:
      - run: echo "Testing..."
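
The needs keyword also accepts a list, which allows a fan-in stage. In this sketch (job names are hypothetical), lint and unit-tests run in parallel, and package starts only after both have succeeded.

YAML
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Linting..."

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Running unit tests..."

  package:
    needs: [lint, unit-tests] # Waits for both jobs above
    runs-on: ubuntu-latest
    steps:
      - run: echo "Packaging..."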

Matrix Strategies

A matrix strategy allows you to use variables in a single job definition to automatically create multiple job runs that are based on the combinations of the variables. This is highly useful for testing code across multiple language versions or operating systems.

YAML
jobs:
  test:
    runs-on: ${{ matrix.os }} # Use the OS chosen by the matrix (3 versions x 2 OSes = 6 jobs)
    strategy:
      matrix:
        node-version: [14.x, 16.x, 18.x]
        os: [ubuntu-latest, windows-latest]
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
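
Not every combination is always wanted. As a sketch with assumed version numbers, exclude removes specific combinations from the matrix, and fail-fast: false lets the remaining jobs finish even if one of them fails.

YAML
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false # Do not cancel the other combinations when one fails
      matrix:
        node-version: [18.x, 20.x]
        os: [ubuntu-latest, windows-latest]
        exclude:
          # Drop one combination, so 2 x 2 = 4 jobs becomes 3
          - os: windows-latest
            node-version: 18.x
    steps:
      - run: echo "Node ${{ matrix.node-version }} on ${{ matrix.os }}"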

Steps & Shell Commands

Steps execute on the runner. The run keyword executes shell commands. The optional shell keyword selects the interpreter (for example bash, pwsh, or python), and a YAML block scalar (|) lets a single step run a multi-line script.

YAML
steps:
  - name: Install dependencies and build
    run: |
      npm install
      npm run build
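
To illustrate choosing a shell per step, the sketch below runs one step in PowerShell and one as inline Python. The step names are made up for the example; it assumes a runner where both interpreters are available, as is the case on GitHub-hosted runners.

YAML
steps:
  - name: Run a PowerShell command
    shell: pwsh
    run: Write-Output "Hello from PowerShell"

  - name: Run inline Python
    shell: python
    run: |
      import platform
      print(platform.system())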

5. Actions, Language Setup, and Optimization

Using Marketplace Actions

The GitHub Marketplace hosts thousands of pre-built actions created by the community and verified publishers. You invoke them using the uses keyword.

  • actions/checkout@v3: Fetches your repository code into the runner.
  • actions/upload-artifact@v4: Saves files generated during the build as workflow artifacts. (v3 of this action has been deprecated and no longer runs on GitHub.com.)
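
A minimal sketch of the two actions used together follows. The dist/ directory, the placeholder build command, and the artifact name build-output are assumptions for illustration only.

YAML
steps:
  - uses: actions/checkout@v4
  - name: Build
    run: |
      mkdir -p dist
      echo "build output" > dist/app.txt
  - name: Upload build output
    uses: actions/upload-artifact@v4
    with:
      name: build-output # Hypothetical artifact name
      path: dist/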

Language-Specific Actions

To compile or test code, the runner needs the correct language environment. Standard actions exist for this:

  • Node.js: actions/setup-node
  • Python: actions/setup-python
  • Java: actions/setup-java
  • Go: actions/setup-go
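
As one example of a setup action, this sketch installs a specific Python version before running a command. The version number and the @v5 tag are assumptions; the other setup actions follow the same pattern with their own version input (for instance node-version for actions/setup-node).

YAML
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: '3.12'
  - run: python --version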

Using Caching for Faster Builds

Downloading dependencies (for example npm packages or Maven's .m2 repository) can take a long time. The actions/cache action saves the chosen directories at the end of a run and restores them in later runs when the cache key matches, which can shorten CI times considerably.

YAML
steps:
  - uses: actions/checkout@v3
  - name: Cache node modules
    uses: actions/cache@v3
    with:
      path: ~/.npm
      key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-node-
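
The cache action also reports whether an exact key match was found through its cache-hit output. The sketch below, which assumes a hypothetical step id of deps-cache and caches node_modules directly, skips the install step when the cache was restored.

YAML
steps:
  - uses: actions/checkout@v4
  - name: Cache node_modules
    id: deps-cache # Hypothetical id, used to read the step's outputs
    uses: actions/cache@v4
    with:
      path: node_modules
      key: ${{ runner.os }}-modules-${{ hashFiles('**/package-lock.json') }}
  - name: Install dependencies
    # Runs only when no cache entry matched the key exactly
    if: steps.deps-cache.outputs.cache-hit != 'true'
    run: npm ci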

6. Runners: Execution Environments

GitHub-Hosted Runners

GitHub provides virtual machines managed and maintained by GitHub.

  • Pros: Zero maintenance, fresh isolated environment for every job, automatically updated.
  • Cons: Limited hardware resources, IP addresses change (harder to whitelist), potential queuing delays.
  • Usage: Set runs-on to a label such as ubuntu-latest, windows-latest, or macos-latest.

Self-Hosted Runners

You can install the GitHub Actions runner application on your own machines (on-premises, AWS EC2, Raspberry Pi, etc.).

  • Pros: Highly customizable hardware, can be placed inside private networks (VPCs), persistent caches, no per-minute billing from GitHub.
  • Cons: You manage the OS, updates, and scaling.
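
A job targets a self-hosted runner through labels in runs-on. This sketch assumes a Linux x64 machine that was registered with the default labels; any custom labels you assign can be added to the list in the same way.

YAML
jobs:
  build:
    # The job is picked up only by a runner that has all of these labels
    runs-on: [self-hosted, linux, x64]
    steps:
      - run: uname -a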

Runner Security & Management

  • Ephemeral vs. Persistent: GitHub-hosted runners are ephemeral (destroyed after use). Self-hosted runners are usually persistent, meaning state can leak between runs.
  • Security Risk: Never use self-hosted runners on public repositories for pull requests without strict approval workflows. Malicious actors can submit PRs containing code that executes on your private infrastructure.
  • Network Security: Self-hosted runners communicate outbound to GitHub via HTTPS (port 443), eliminating the need to open inbound firewall ports.

7. Docker & GitHub Actions

GitHub Actions excels at building and publishing Docker containers.

Building Docker Images in CI

You can run standard Docker commands using the run step, or use official Docker actions like docker/build-push-action.
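
The plain run approach might look like the sketch below. It assumes a Dockerfile at the repository root, and the image name my-app is hypothetical; the commit SHA is used as the tag so that each build is traceable.

YAML
steps:
  - uses: actions/checkout@v4
  - name: Build image with the Docker CLI
    run: docker build -t my-app:${{ github.sha }} .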

Pushing to Docker Hub

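Pushing to Docker Hub requires credentials stored as secrets (here DOCKER_USERNAME and DOCKER_PASSWORD) in the GitHub repository settings. A Docker Hub access token can be stored in place of the account password.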

YAML
steps:
  - name: Log in to Docker Hub
    uses: docker/login-action@v2
    with:
      username: ${{ secrets.DOCKER_USERNAME }}
      password: ${{ secrets.DOCKER_PASSWORD }}
      
  - name: Build and push Docker image
    uses: docker/build-push-action@v3
    with:
      context: .
      push: true
      tags: user/app:latest

Pushing to GitHub Container Registry (GHCR)

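GHCR is GitHub's native container registry. The workflow authenticates with the built-in GITHUB_TOKEN, so no additional secret is needed, but the job must grant that token the packages: write permission before it can push. Image names must be lowercase, so a repository name that contains uppercase letters has to be lowercased before it is used in a tag.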

YAML
steps:
  - name: Log in to GHCR
    uses: docker/login-action@v2
    with:
      registry: ghcr.io
      username: ${{ github.actor }}
      password: ${{ secrets.GITHUB_TOKEN }}
      
  - name: Build and push to GHCR
    uses: docker/build-push-action@v3
    with:
      context: .
      push: true
      tags: ghcr.io/${{ github.repository }}/app:latest
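
For the GITHUB_TOKEN to be allowed to push packages, the job generally has to request that permission. A minimal sketch of the surrounding job, assuming a hypothetical job name publish and the @v3 release of the login action:

YAML
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read  # Needed to check out the repository
      packages: write # Needed to push images to GHCR
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}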

8. Continuous Deployment (CD): Deploying to Servers/Cloud

GitHub Actions extends beyond CI to handle CD, pushing artifacts or containers to live environments.

Deployments to Servers/Clouds

Deployments can be achieved in multiple ways depending on the target infrastructure:

  1. SSH Deployments: Using actions like appleboy/ssh-action to connect to a VPS and execute pull/restart commands.
  2. Cloud Providers (AWS, Azure, GCP): Providers offer official actions to authenticate and deploy.
    • AWS: aws-actions/configure-aws-credentials, followed by commands like aws ecs update-service or aws s3 sync.
    • Azure: azure/login, followed by Azure Web App deployment actions.
  3. Kubernetes: Using azure/k8s-set-context or aws-actions/amazon-eks-update-kubeconfig to connect to a cluster, followed by kubectl apply commands.
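
As an illustration of the cloud-provider route, the sketch below authenticates to AWS with keys stored as secrets and then syncs a build directory to S3. The secret names, region, ./dist directory, and bucket name example-bucket are all assumptions for the example.

YAML
steps:
  - uses: actions/checkout@v4
  - name: Configure AWS credentials
    uses: aws-actions/configure-aws-credentials@v4
    with:
      aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
      aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      aws-region: us-east-1
  - name: Upload site to S3
    run: aws s3 sync ./dist s3://example-bucket --delete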

Best Practices for Deployment Workflows

  • Environments: Use GitHub Environments to require manual approval before a deployment job runs.
  • Secrets Management: Never hardcode credentials. Use GitHub Secrets to store SSH keys, cloud access keys, and passwords.
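  • OIDC (OpenID Connect): Instead of storing long-lived cloud credentials in GitHub, use OIDC to let GitHub Actions request short-lived access tokens from cloud providers (AWS IAM, Microsoft Entra ID (formerly Azure AD), GCP IAM). The job must be granted the id-token: write permission to request the token. Because no long-lived key is stored, a leaked secret cannot be reused later.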
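
These practices can be combined in one job. The sketch below assumes an environment named production has been configured with required reviewers, and that an IAM role trusting GitHub's OIDC provider already exists; the role ARN, account ID, and region shown are placeholders.

YAML
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production # Job waits here for any approval configured on the environment
    permissions:
      id-token: write # Allows the job to request an OIDC token
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Assume AWS role via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-deploy-role # Placeholder ARN
          aws-region: us-east-1
      - name: Confirm the assumed identity
        run: aws sts get-caller-identity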