Unit 5 - Notes

INT364

Unit 5: Resiliency, Monitoring, and Automation

1. Monitoring AWS Resources with CloudWatch

Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. It provides data and actionable insights to monitor applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health.

Key Components of CloudWatch

A. CloudWatch Metrics

Metrics are the fundamental concept in CloudWatch. A metric represents a time-ordered set of data points that are published to CloudWatch.

  • Namespaces: Containers for metrics (e.g., AWS/EC2).
  • Dimensions: Name/value pairs that uniquely identify a metric (e.g., InstanceId).
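  • Resolution (EC2 monitoring intervals):
    • Basic Monitoring: Data is published every 5 minutes (free).
    • Detailed Monitoring: Data is published every 1 minute (paid).
    • High-Resolution Custom Metrics: Can be published at intervals as short as 1 second.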
  • Statistics: Aggregations over a specific period (Average, Sum, Minimum, Maximum, Sample Count).
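
The statistics listed above can be illustrated with a small sketch. It is plain Python rather than a CloudWatch API call, and the sample CPU readings and the function name summarize are assumptions made up for illustration.

```python
from statistics import mean


def summarize(values):
    """Aggregate one period's data points into the five standard statistics."""
    return {
        "Average": mean(values),
        "Sum": sum(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "SampleCount": len(values),
    }


# Hypothetical CPU readings (percent) reported within one 5-minute period.
cpu_samples = [40.0, 55.0, 70.0, 35.0, 50.0]
stats = summarize(cpu_samples)
print(stats)
```

Each statistic answers a different question: Average smooths out spikes, while Maximum exposes them, so the right choice depends on what the alarm or graph is meant to catch.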

B. CloudWatch Alarms

Alarms allow you to watch a single metric over a specified time period and perform one or more actions based on the value of the metric relative to a threshold.

  • States:
    • OK: Within the threshold.
    • ALARM: Outside the threshold.
    • INSUFFICIENT_DATA: Not enough data to determine state.
  • Actions: Send a notification to Amazon SNS, trigger an Auto Scaling action, or perform an EC2 action (stop, terminate, reboot, or recover).
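
The three alarm states can be sketched as a small evaluation function. This is a simplified model, not the CloudWatch implementation: it assumes a "greater than threshold" comparison and requires every one of the last evaluation_periods values to breach, whereas real alarms also support "M out of N" rules and configurable handling of missing data. The function and parameter names are hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Classify the most recent periods of a metric against a threshold.

    datapoints: one aggregated value per period, oldest first.
    Returns "ALARM" only if all of the last `evaluation_periods` values
    are above the threshold (a simplification of real alarm rules).
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


print(alarm_state([60, 85, 90], threshold=80, evaluation_periods=2))  # ALARM
print(alarm_state([85, 60], threshold=80, evaluation_periods=2))      # OK
print(alarm_state([90], threshold=80, evaluation_periods=2))          # INSUFFICIENT_DATA
```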

C. CloudWatch Logs

Enables you to monitor, store, and access log files from EC2 instances, AWS CloudTrail, Route 53, and other sources.

  • Log Groups: Collections of log streams sharing the same retention, monitoring, and access control settings.
  • Log Streams: Sequences of log events from a specific source (e.g., a specific application instance).
  • Metric Filters: Can extract metric observations from ingested events and transform them into data points in a CloudWatch metric.
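
The idea behind a metric filter (turning matching log events into a number) can be sketched as follows. This is only an illustration: the event layout and the function count_matches are assumptions, and the substring test stands in for the real CloudWatch Logs filter pattern syntax, which is richer.

```python
def count_matches(log_events, term):
    """Count log events whose message contains `term` (case-sensitive)."""
    return sum(1 for event in log_events if term in event["message"])


# Hypothetical events from a single log stream.
log_stream = [
    {"timestamp": 1, "message": "INFO request served in 12 ms"},
    {"timestamp": 2, "message": "ERROR database timeout"},
    {"timestamp": 3, "message": "ERROR upstream unavailable"},
]

# The resulting count is what would be published as a metric data point.
error_count = count_matches(log_stream, "ERROR")
print(error_count)
```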

D. CloudWatch Dashboards

Customizable home pages in the CloudWatch console that display metrics and alarms for your resources in a single view, including resources spread across different Regions.


2. Scaling Compute Resources Using Auto Scaling

AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. While it can scale various resources (DynamoDB, Aurora), EC2 Auto Scaling is the primary focus for compute resilience.

Core Components of EC2 Auto Scaling

A. Groups

Your EC2 instances are organized into Auto Scaling Groups (ASGs) so they can be treated as a logical unit for scaling and management.

  • Min Size: The minimum number of instances the group must maintain.
  • Max Size: The maximum cap to prevent runaway costs.
  • Desired Capacity: The target number of instances the group attempts to maintain.
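
How the three size settings relate can be shown with a short sketch. It models the rule that scaling activity never moves the group outside the Min Size and Max Size bounds; the function name effective_capacity is hypothetical.

```python
def effective_capacity(requested, min_size, max_size):
    """Clamp a requested instance count into the [min_size, max_size] range."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(requested, max_size))


print(effective_capacity(12, min_size=2, max_size=10))  # capped at the maximum
print(effective_capacity(1, min_size=2, max_size=10))   # raised to the minimum
print(effective_capacity(5, min_size=2, max_size=10))   # already within bounds
```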

B. Configuration Templates

Defines what to launch.

  • Launch Template (Recommended): Supports versioning, spot vs. on-demand mixing, and T2/T3 unlimited bursting. It specifies the AMI ID, Instance Type, Key Pair, Security Groups, and User Data.
  • Launch Configuration (Legacy): Similar to templates but does not support versioning; once created, it cannot be modified (must be recreated).

C. Scaling Options

  1. Manual Scaling: Manually changing the Desired Capacity.
  2. Dynamic Scaling:
    • Target Tracking Scaling: "Keep CPU utilization at 50%." AWS automates the alarm and adjustment logic.
    • Step Scaling: "If CPU > 50%, add 1 instance; if CPU > 80%, add 3 instances."
    • Simple Scaling: "If CPU > 50%, add 1 instance." (Relies on cooldown periods).
  3. Scheduled Scaling: Scaling based on predictable load patterns (e.g., scale out every Monday at 8 AM).
  4. Predictive Scaling: Uses machine learning to analyze historical traffic patterns and forecast future capacity needs.
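
The step scaling example above ("if CPU > 50%, add 1 instance; if CPU > 80%, add 3 instances") can be sketched as below. The thresholds come from that example; the function names and the max_size cap are assumptions, and a real policy is driven by a CloudWatch alarm rather than a direct function call.

```python
def step_adjustment(cpu_percent):
    """Return how many instances to add for a given CPU utilization."""
    if cpu_percent > 80:
        return 3
    if cpu_percent > 50:
        return 1
    return 0


def scale_out(current, cpu_percent, max_size):
    """Apply the step adjustment without exceeding the group's maximum size."""
    return min(current + step_adjustment(cpu_percent), max_size)


print(scale_out(current=4, cpu_percent=65, max_size=10))  # one instance added
print(scale_out(current=4, cpu_percent=90, max_size=6))   # capped at max_size
```

Simple scaling corresponds to a policy with a single step, which is why step scaling is usually the more flexible choice when the response should grow with the size of the breach.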

D. Self-Healing

If an instance in an ASG fails health checks (EC2 status checks or ELB health checks), Auto Scaling automatically terminates the unhealthy instance and launches a replacement to maintain the Desired Capacity.
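
The self-healing loop can be sketched as a reconciliation step: drop unhealthy instances, then launch replacements until the Desired Capacity is met again. The instance IDs, the reconcile function, and the ID generator are hypothetical stand-ins for what the Auto Scaling service does internally.

```python
import itertools


def reconcile(instances, desired_capacity, new_ids):
    """Remove unhealthy instances and add replacements up to desired capacity.

    instances: dict mapping instance ID to a health flag (True = healthy).
    new_ids: iterator yielding IDs for newly launched instances.
    """
    healthy = {iid: True for iid, ok in instances.items() if ok}
    while len(healthy) < desired_capacity:
        healthy[next(new_ids)] = True
    return healthy


new_ids = (f"i-new{n}" for n in itertools.count(1))
fleet = {"i-a": True, "i-b": False, "i-c": True}  # i-b failed its health check
fleet = reconcile(fleet, desired_capacity=3, new_ids=new_ids)
print(sorted(fleet))
```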


3. Using Load Balancers for High Availability

Elastic Load Balancing (ELB) automatically distributes incoming application traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in multiple Availability Zones (AZs).

Types of Load Balancers

A. Application Load Balancer (ALB) - Layer 7

  • Protocol: HTTP, HTTPS, gRPC.
  • Use Case: Microservices, container-based applications.
  • Routing Features: Path-based routing (/images), Host-based routing (sub.domain.com), Query string routing, Header routing.
  • Technology: Operates at the Request level.
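
Request-level routing can be illustrated with a small rule matcher. This is a sketch under simplifying assumptions: rules are checked in list order, paths use a plain prefix match, and hosts must match exactly, whereas real ALB listener rules have priorities and wildcard patterns. The rule layout and target group names are hypothetical.

```python
def route(host, path, rules, default):
    """Return the target group of the first rule that matches the request."""
    for rule in rules:
        host_ok = rule.get("host") in (None, host)       # no host means any host
        path_ok = path.startswith(rule.get("path_prefix", ""))
        if host_ok and path_ok:
            return rule["target_group"]
    return default


rules = [
    {"path_prefix": "/images", "target_group": "image-service"},
    {"host": "sub.domain.com", "target_group": "subdomain-service"},
]

print(route("www.domain.com", "/images/cat.png", rules, "web-default"))
print(route("sub.domain.com", "/", rules, "web-default"))
print(route("www.domain.com", "/about", rules, "web-default"))
```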

B. Network Load Balancer (NLB) - Layer 4

  • Protocol: TCP, UDP, TLS.
  • Use Case: Extreme performance, low latency, static IP requirements.
  • Performance: Capable of handling millions of requests per second.
  • Technology: Operates at the Connection level.

C. Gateway Load Balancer (GWLB) - Layer 3 (Gateway + LB)

  • Use Case: Deploying third-party virtual appliances (firewalls, intrusion detection systems).
  • Function: Transparently passes traffic to appliances and acts as a gateway.

D. Classic Load Balancer (CLB) - Legacy

  • Previous-generation load balancer; supports both Layer 4 and Layer 7 but lacks advanced routing features.

Key Concepts for High Availability (HA)

  1. Health Checks: The ELB periodically sends health check requests (e.g., HTTP or TCP) to its targets to confirm they are healthy. Traffic is only routed to healthy targets.
  2. Cross-Zone Load Balancing: When enabled, the load balancer distributes traffic evenly across all registered instances in all enabled Availability Zones, preventing one AZ from being overloaded while another is idle.
  3. Integration with Auto Scaling: ELB acts as the front door; the ASG manages the backend fleet. When the ASG scales out (adds instances), it automatically registers them with the attached load balancer. If ELB health checks are enabled on the ASG, an instance that the load balancer reports as unhealthy is terminated and replaced by the ASG.
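
The effect of cross-zone load balancing can be worked through numerically. The sketch assumes that client traffic is split evenly between the load balancer nodes in each enabled AZ, and the AZ names and target counts are made up for illustration.

```python
def traffic_share_per_target(targets_per_az, cross_zone):
    """Return, per AZ, the fraction of total traffic each target there receives."""
    total_targets = sum(targets_per_az.values())
    az_count = len(targets_per_az)
    shares = {}
    for az, target_count in targets_per_az.items():
        if cross_zone:
            # Every target is reachable from every node: an even split overall.
            shares[az] = 1 / total_targets
        else:
            # Each AZ's node only serves its own targets.
            shares[az] = (1 / az_count) / target_count
    return shares


targets = {"az-a": 2, "az-b": 8}
print(traffic_share_per_target(targets, cross_zone=False))
print(traffic_share_per_target(targets, cross_zone=True))
```

With cross-zone disabled, each of the two targets in az-a handles 25% of all traffic while each target in az-b handles 6.25%; with it enabled, all ten targets receive 10% each.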

4. DNS-Based Routing with Amazon Route 53

Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service. It translates human-readable domain names (www.example.com) into IP addresses (192.0.2.1).

Key Concepts

  • Hosted Zone: A container for records, which include information about how you want to route traffic for a specific domain.
  • Records: Define where to route traffic.
    • A Record: Maps a hostname to an IPv4 address.
    • CNAME: Maps a hostname to another hostname (cannot be used for the root domain/Zone Apex).
    • Alias Record: AWS specific extension to DNS. Maps a hostname to an AWS resource (ELB, CloudFront, S3 Bucket). Can be used at the Zone Apex.

Routing Policies

  1. Simple Routing: Use for a single resource that performs a given function for your domain. No intelligence/health checks.
  2. Weighted Routing: Split traffic across multiple resources based on assigned weights (e.g., 80% to V1, 20% to V2). Useful for A/B testing or blue/green deployments.
  3. Latency Routing: Route traffic to the AWS Region that provides the lowest latency (fastest response time) for the user.
  4. Failover Routing: Used for Active-Passive failover configurations. If the primary resource fails the health check, Route 53 directs traffic to the secondary resource (DR site).
  5. Geolocation Routing: Route traffic based on the geographic location of the users (e.g., route all queries from Europe to a server in Frankfurt).
  6. Geoproximity Routing: Route traffic based on the physical distance between users and resources, optionally shifted by a bias value (originally available only through Route 53 Traffic Flow).
  7. Multivalue Answer Routing: Returns multiple IP addresses (up to 8) and checks health. Similar to Simple routing but with health checks (client-side load balancing).
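
Two of the policies above reduce to simple arithmetic and a health test, sketched here. Weighted routing sends each record a share equal to its weight divided by the sum of all weights, and failover routing returns the secondary only when the primary is unhealthy. The record names and function names are hypothetical; this models the decision logic, not the Route 53 API.

```python
def weighted_shares(weights):
    """Return the expected fraction of queries answered with each record."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def failover_target(primary, secondary, primary_healthy):
    """Return the primary record while it is healthy, else the secondary."""
    return primary if primary_healthy else secondary


print(weighted_shares({"v1": 80, "v2": 20}))
print(failover_target("primary-site", "dr-site", primary_healthy=False))
```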

5. Infrastructure Automation with AWS CloudFormation

AWS CloudFormation is an Infrastructure as Code (IaC) service that lets you model, provision, and manage AWS and third-party resources by describing them in declarative templates.

Benefits

  • Consistency: Removes manual error.
  • Version Control: Templates can be stored in Git.
  • Cost Estimation: Estimate costs before deployment.
  • Rollback: Automatic rollback if stack creation fails.

CloudFormation Template Anatomy (JSON or YAML)

  1. AWSTemplateFormatVersion: The version of the template format.
  2. Description: A text string describing the template.
  3. Parameters: Values to pass to your template at runtime (e.g., InstanceType, KeyPairName).
  4. Mappings: Key-value pairs used to specify conditional values (e.g., Mapping AMI IDs to specific Regions).
  5. Resources (Required): The AWS resources to provision (e.g., AWS::EC2::Instance). This is the only mandatory section.
  6. Outputs: Values returned after the stack is created (e.g., the public IP of the created instance or the JDBC string of an RDS database).
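
The sections listed above can be seen together in a minimal template. The sketch below builds it as a Python dictionary and serializes it to JSON; the logical ID WebServer, the default instance type, and the parameter names are assumptions chosen for illustration, and Mappings is omitted to keep the example short.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal single-instance stack (illustrative sketch)",
    "Parameters": {
        "InstanceType": {"Type": "String", "Default": "t3.micro"},
        "ImageId": {"Type": "AWS::EC2::Image::Id"},
    },
    "Resources": {
        # Hypothetical logical ID; Resources is the only mandatory section.
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": {"Ref": "InstanceType"},
                "ImageId": {"Ref": "ImageId"},
            },
        }
    },
    "Outputs": {
        "PublicIp": {"Value": {"Fn::GetAtt": ["WebServer", "PublicIp"]}}
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

Generating the template from code like this is optional; the same structure can be written directly as a JSON or YAML file.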

Key Operations

  • Stack: A collection of resources as a single unit. You create, update, and delete a collection of resources by creating, updating, and deleting stacks.
  • Change Sets: A preview of how proposed changes to a stack might impact your running resources (similar to running "terraform plan").
  • Drift Detection: Checks if the actual configuration of a stack's resources differs from the configuration specified in the CloudFormation template (e.g., someone manually changed a Security Group rule).
  • Intrinsic Functions: Built-in functions to assign values dynamically.
    • Ref: References a parameter or resource ID.
    • Fn::GetAtt: Gets a specific attribute of a resource (e.g., PublicDnsName of an EC2 instance).
    • Fn::ImportValue: Imports values exported by another stack.
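
Conceptually, drift detection is a comparison between the properties a template declares and the properties the live resource actually has. The sketch below shows that comparison on plain dictionaries; the property names and values are hypothetical, and the real service works on resource-specific property models rather than a flat dictionary.

```python
def detect_drift(expected, actual):
    """Return {property: (expected, actual)} for every property that differs."""
    keys = expected.keys() | actual.keys()
    return {
        key: (expected.get(key), actual.get(key))
        for key in sorted(keys)
        if expected.get(key) != actual.get(key)
    }


# Hypothetical security group rule: someone widened the CIDR range by hand.
template_props = {"IngressPort": 443, "CidrIp": "10.0.0.0/16"}
live_props = {"IngressPort": 443, "CidrIp": "0.0.0.0/0"}

drift = detect_drift(template_props, live_props)
print(drift)
```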

6. Automating Deployments with AWS Quick Starts and Amazon Q Developer

AWS Quick Starts (Partner Solutions)

AWS Quick Starts (now largely integrated into AWS Partner Solutions and AWS Solutions Library) are automated reference deployments built by AWS solutions architects and AWS Partners.

  • Purpose: To help you deploy popular technologies on AWS according to best practices.
  • Mechanism: They utilize CloudFormation templates to automate the build.
  • Components:
    • Architecture Guide: A PDF describing the architecture, costs, and design decisions.
    • CloudFormation Templates: Automate the provisioning.
    • Scripts: Helper scripts for configuration (often used in User Data).
  • Use Cases: Quickly deploying a PCI-DSS compliant environment, a Data Lake, SAP HANA, or a WordPress cluster with HA defaults enabled.

Amazon Q Developer (incorporates the former Amazon CodeWhisperer)

Amazon Q Developer is a generative AI-powered assistant designed to support software development and infrastructure automation.

  • Infrastructure as Code (IaC) Generation: Amazon Q allows developers to describe the infrastructure they need in natural language, and it generates the corresponding AWS CloudFormation or AWS Cloud Development Kit (CDK) code.
    • Example: "Create a CloudFormation template for an Auto Scaling Group behind an Application Load Balancer."
  • Code Explanation and Upgrades: It can analyze existing infrastructure code to explain what it does or suggest upgrades (e.g., upgrading Java versions or moving from older EC2 generations).
  • Troubleshooting: Integrated into the AWS Console, it can diagnose errors (e.g., "Why did my Lambda function fail?" or "Why can't my EC2 instance connect to S3?") and suggest network reachability or IAM permission fixes.
  • IDE Integration: Works within VS Code or IntelliJ to autocomplete infrastructure code and run security scans in real time.