Unit 5 - Notes

INT363

Unit 5: Cloud-Native Development

1. Cloud-Native Architecture

Definition

Cloud-native architecture is an approach to designing, constructing, and operating workloads that are built in the cloud and take full advantage of the cloud computing model. According to the Cloud Native Computing Foundation (CNCF), cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds.

Core Pillars

  1. Microservices: Decomposing applications into small, independent services.
  2. Containers: Packaging code and dependencies together (e.g., Docker) for consistent execution.
  3. Service Meshes: Managing service-to-service communication, security, and observability (e.g., Istio, Linkerd).
  4. Immutable Infrastructure: Servers are never modified after deployment; they are replaced with new versions.
  5. Declarative APIs: Describing the desired state of the system (e.g., Kubernetes YAML files) rather than the steps to achieve it.
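
A minimal sketch of the declarative idea behind pillar 5, written in Python rather than Kubernetes YAML: the user declares the desired state, and a control loop (not the user) performs the steps needed to converge the actual state toward it. All names here are illustrative, not a real Kubernetes API.

    # Minimal illustration of a declarative control loop (hypothetical names,
    # not Kubernetes code): the user declares desired state, the loop converges.
    desired = {"replicas": 3}          # declared: "what I want"
    actual = {"replicas": 1}           # observed: "what exists now"

    def reconcile(desired, actual):
        """One reconciliation step: add or remove instances to match the spec."""
        diff = desired["replicas"] - actual["replicas"]
        if diff > 0:
            print(f"Starting {diff} new instance(s)")
        elif diff < 0:
            print(f"Stopping {-diff} instance(s)")
        actual["replicas"] = desired["replicas"]

    reconcile(desired, actual)         # the system, not the user, performs the steps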

2. Loosely Coupled Services

Concept

Loose coupling is the cornerstone of cloud-native systems. It implies that individual components (microservices) have minimal knowledge of and dependency on other components.

Characteristics

  • Independent Deployability: A team can update, deploy, and scale Service A without coordinating with the team managing Service B.
  • Technology Agnostic: Service A can be written in Python while Service B is written in Java, provided they communicate via standard protocols.
  • Failure Isolation: If one service fails, it should not bring down the entire system (preventing cascading failures).

Mechanisms for Loose Coupling

  • API Contracts: Services communicate via well-defined interfaces (REST, GraphQL, gRPC). Internal logic can change freely without affecting consumers, as long as the API contract remains stable.
  • Asynchronous Messaging: Using message brokers (RabbitMQ, Kafka) allows services to communicate via events without waiting for a response, decoupling them in time.
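
As a concrete sketch of asynchronous decoupling, the snippet below publishes an event to RabbitMQ using the pika client library. The broker address and the order_created queue name are assumptions for illustration; the key point is that the consumer can be offline when the event is published and still process it later.

    # Sketch: publishing an event to RabbitMQ with the pika client.
    # Assumes a broker on localhost and a hypothetical 'order_created' queue.
    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="order_created", durable=True)  # survives broker restart

    event = {"order_id": 42, "status": "created"}
    channel.basic_publish(
        exchange="",                     # default exchange routes by queue name
        routing_key="order_created",
        body=json.dumps(event),
    )
    connection.close()                   # the consumer reads the event on its own schedule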

3. Core Infrastructure Components

A. Service Discovery

In a microservices environment, instances come and go dynamically, so hardcoding IP addresses is impractical. Service Discovery is the mechanism used to locate services at runtime.

  1. The Service Registry: A database containing the network locations of available service instances (e.g., Netflix Eureka, Consul, etcd).
  2. Client-Side Discovery:
    • The client queries the Service Registry to get the address of the service.
    • The client handles the load balancing logic.
    • Example: Netflix Ribbon. (A sketch of this pattern follows the list.)
  3. Server-Side Discovery:
    • The client makes a request to a Load Balancer/Router.
    • The Load Balancer queries the Service Registry and forwards the traffic.
    • Example: AWS Elastic Load Balancer (ELB), Kubernetes Service.
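
A minimal client-side discovery sketch, assuming a local Consul agent on its default port (8500) and a service registered under the hypothetical name payments. The client queries the registry and picks an instance itself, which is exactly the Ribbon-style responsibility described in item 2.

    # Sketch: client-side discovery against Consul's HTTP health API.
    # Assumes a Consul agent on localhost:8500 and a service named 'payments'.
    import random
    import requests

    def resolve(service_name):
        """Ask the registry for healthy instances, then pick one (client-side LB)."""
        resp = requests.get(
            f"http://localhost:8500/v1/health/service/{service_name}",
            params={"passing": "1"},     # only instances passing health checks
        )
        resp.raise_for_status()
        instances = [
            (e["Service"]["Address"], e["Service"]["Port"]) for e in resp.json()
        ]
        return random.choice(instances)  # naive random load balancing

    host, port = resolve("payments")
    print(f"Calling http://{host}:{port}/charge")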

B. Load Balancing

Load balancing distributes incoming network traffic across multiple backend servers to ensure no single server bears too much load.

  • Layer 4 (Transport Layer): Balances based on IP and Port (TCP/UDP). Faster but less context-aware.
  • Layer 7 (Application Layer): Balances based on content (HTTP headers, cookies, URL path). Allows for intelligent routing (e.g., routing mobile users to specific services).
  • Algorithms: Round Robin, Least Connections, IP Hash.
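
Two of the listed algorithms are small enough to sketch directly; the backend addresses and connection counts below are illustrative.

    # Sketch: two classic load-balancing algorithms over an illustrative backend list.
    import itertools

    backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    # Round Robin: hand out backends in a repeating cycle.
    rr = itertools.cycle(backends)
    for _ in range(4):
        print("round robin ->", next(rr))

    # Least Connections: choose the backend with the fewest active connections.
    active = {"10.0.0.1": 12, "10.0.0.2": 3, "10.0.0.3": 7}
    print("least connections ->", min(active, key=active.get))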

C. Autoscaling

The ability of the system to automatically adjust resources based on demand.

  • Horizontal Scaling (Scale Out): Adding more instances of a service (containers/VMs). Ideally suited for stateless microservices.
  • Vertical Scaling (Scale Up): Adding more CPU/RAM to an existing instance. Limited by hardware maximums.
  • Predictive vs. Reactive:
    • Reactive: Scaling based on current metrics (e.g., CPU > 70%).
    • Predictive: Scaling based on historical traffic patterns using ML.
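
Reactive scaling often reduces to a single proportional formula. The sketch below mirrors the calculation used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current * currentMetric / targetMetric)); the numbers are illustrative.

    # Sketch: reactive scale-out decision, mirroring the proportional formula
    # used by the Kubernetes Horizontal Pod Autoscaler. Values are illustrative.
    import math

    def desired_replicas(current_replicas, current_cpu, target_cpu):
        return math.ceil(current_replicas * current_cpu / target_cpu)

    # 4 instances running at 90% CPU against a 70% target -> scale to 6.
    print(desired_replicas(current_replicas=4, current_cpu=90, target_cpu=70))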

4. Data Management in Cloud-Native

Data management is often the most complex aspect of distributed systems, largely because of the CAP Theorem: when a network partition occurs, a system can preserve Consistency or Availability, but not both.

Key Patterns

  1. Database per Service:

    • Each microservice owns its own data and database.
    • Other services cannot access this database directly; they must use the service's API.
    • Benefit: Ensures loose coupling.
    • Drawback: Complexity in joining data across services.
  2. Polyglot Persistence:

    • Using different data storage technologies for different needs within the same application.
    • Example: MongoDB for product catalog (document), PostgreSQL for billing (relational), Redis for session caching (key-value).
  3. Saga Pattern (Distributed Transactions):

    • Since ACID transactions usually cannot span multiple microservices, Sagas are used.
    • A Saga is a sequence of local transactions. If one step fails, the system executes compensating transactions to undo the changes made by the preceding steps (see the sketch after this list).
  4. CQRS (Command Query Responsibility Segregation):

    • Splitting the model for updating information (Command) from the model for reading information (Query).
    • Allows scaling read and write workloads independently.
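
A minimal orchestration-style Saga sketch for item 3: each step pairs a local transaction with a compensating action, and on failure the completed steps are undone in reverse order. The step functions are hypothetical stand-ins for calls to other services, not a real framework.

    # Sketch: orchestrated Saga with compensating transactions.
    # The step functions are hypothetical stand-ins for calls to other services.
    def reserve_inventory(order):  print("inventory reserved")
    def release_inventory(order):  print("inventory released (compensation)")
    def charge_payment(order):     raise RuntimeError("card declined")
    def refund_payment(order):     print("payment refunded (compensation)")

    SAGA = [
        (reserve_inventory, release_inventory),
        (charge_payment, refund_payment),
    ]

    def run_saga(order):
        completed = []
        try:
            for action, compensation in SAGA:
                action(order)
                completed.append(compensation)
        except Exception as err:
            print(f"step failed ({err}); compensating in reverse order")
            for compensation in reversed(completed):
                compensation(order)

    run_saga({"order_id": 42})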

5. The Twelve-Factor App Methodology

Developed by Heroku, this is a set of best practices for building software-as-a-service (SaaS) apps that are portable and resilient.

  1. Codebase: One codebase tracked in revision control (Git), many deploys (dev, staging, prod).
  2. Dependencies: Explicitly declare and isolate dependencies (e.g., package.json, requirements.txt). Never rely on system-wide packages.
  3. Config: Store configuration in the environment, not in the code. (Credentials and DB handles vary per deploy; see the sketch after this list.)
  4. Backing Services: Treat backing services (databases, queues, caches) as attached resources via URLs.
  5. Build, Release, Run: Strictly separate the build (compiling code), release (combining build + config), and run (executing) stages.
  6. Processes: Execute the app as one or more stateless processes. Any persistent data must be stored in a stateful backing service (DB).
  7. Port Binding: Export services via port binding. The app is self-contained and does not rely on runtime injection of a web server (e.g., app listens on port 5000).
  8. Concurrency: Scale out via the process model. Run multiple processes/workers rather than threads inside one large process.
  9. Disposability: Maximize robustness with fast startup and graceful shutdown.
  10. Dev/Prod Parity: Keep development, staging, and production as similar as possible to reduce "it works on my machine" issues.
  11. Logs: Treat logs as event streams. The app shouldn't worry about storage/rotation; it simply writes to stdout.
  12. Admin Processes: Run admin/management tasks (e.g., DB migrations) as one-off processes in the same environment as the app.
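
Factors 3 (Config) and 7 (Port Binding) can be demonstrated together in a few lines: the sketch below reads its database URL and listening port from the environment and binds its own HTTP listener using only the Python standard library. The DATABASE_URL and PORT variable names are conventional but illustrative.

    # Sketch of factors 3 and 7: config from the environment, self-contained
    # port binding via the standard library. DATABASE_URL/PORT are illustrative.
    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    DATABASE_URL = os.environ.get("DATABASE_URL", "postgres://localhost/dev")
    PORT = int(os.environ.get("PORT", "5000"))   # varies per deploy, not per codebase

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok\n")

    # The app exports HTTP itself instead of relying on an injected web server.
    HTTPServer(("", PORT), Handler).serve_forever()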

6. Serverless Architectures

Definition

Serverless computing allows developers to build and run applications without managing servers. The cloud provider manages the underlying infrastructure and allocates resources dynamically on demand.

Key Models

  1. FaaS (Function-as-a-Service):
    • Event-driven execution of individual functions.
    • Ephemeral (stateless) and short-lived.
    • Examples: AWS Lambda, Azure Functions, Google Cloud Functions. (A handler sketch follows the list.)
  2. BaaS (Backend-as-a-Service):
    • Third-party services that handle backend logic.
    • Examples: Auth0 (Auth), Firebase (DB/Push Notifications).
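
The FaaS model reduces the deployable unit to a single handler function. The sketch below shows the standard shape of a Python AWS Lambda handler; the event field and the response shape assume an API Gateway-style trigger and are illustrative, since the event structure depends on the triggering source.

    # Sketch: the standard shape of a Python AWS Lambda handler.
    # Event structure depends on the trigger; the 'name' field is illustrative.
    import json

    def handler(event, context):
        name = event.get("name", "world")
        return {
            "statusCode": 200,           # shape expected by API Gateway triggers
            "body": json.dumps({"message": f"hello, {name}"}),
        }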

Advantages

  • Cost Efficiency: Pay only for compute time used (milliseconds), not for idle servers.
  • Zero Administration: No OS patching, scaling configuration, or server maintenance.
  • Auto-scaling: Scales automatically from zero to thousands of concurrent requests.

Disadvantages

  • Cold Starts: Latency when a function is invoked after being idle (provider must spin up the container).
  • Vendor Lock-in: Highly coupled to specific provider APIs.
  • Debugging Difficulty: Harder to reproduce the environment locally.

7. Case Studies

A. Netflix (The Microservices Pioneer)

  • Transition: Moved from a monolithic Java application to a massive microservices architecture (700+ services) on AWS.
  • Innovation - Chaos Engineering: Netflix created Chaos Monkey, a tool that randomly terminates instances in production to ensure the system can survive failures.
  • Netflix OSS Stack:
    • Eureka: Service Discovery.
    • Hystrix: Circuit Breaker pattern (prevents cascading failures; a sketch of the pattern follows this list).
    • Zuul: API Gateway.
    • Ribbon: Client-side load balancing.
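
A sketch of the circuit-breaker idea that Hystrix popularized, reduced to a small state machine: after a run of consecutive failures the breaker "opens" and fails fast, then permits a trial call after a cooldown. Thresholds and timings here are illustrative, not Hystrix defaults.

    # Sketch: the circuit-breaker idea behind Hystrix, reduced to a tiny state
    # machine. Thresholds and timings are illustrative, not Hystrix defaults.
    import time

    class CircuitBreaker:
        def __init__(self, max_failures=3, reset_after=30.0):
            self.max_failures = max_failures
            self.reset_after = reset_after       # seconds before a trial call
            self.failures = 0
            self.opened_at = None

        def call(self, fn, *args):
            if self.opened_at is not None:
                if time.time() - self.opened_at < self.reset_after:
                    raise RuntimeError("circuit open: failing fast")
                self.opened_at = None            # half-open: allow one trial call
            try:
                result = fn(*args)
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.time() # trip the breaker
                raise
            self.failures = 0                    # success resets the count
            return result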

B. Amazon (The "API Mandate")

  • The Problem: In the early 2000s, Amazon was a monolith struggling with deployment delays.
  • The Solution (The Bezos Mandate - 2002):
    1. All teams must expose their data/functionality through service interfaces (APIs).
    2. No other form of inter-process communication allowed (no direct database links).
    3. All interfaces must be designed to be externalizable (able to be sold to the public).
  • Result: This architecture enabled the creation of AWS (Amazon Web Services). By decoupling internal services, they could sell infrastructure components (S3, EC2) to the world.

C. Uber (Scaling Complexity)

  • Evolution:
    1. Monolith: Single repo.
    2. Microservices: Split into thousands of services (4,000+).
    3. Problem: While it solved scaling, it created a "microservice sprawl" where finding the root cause of errors became difficult, and dependency graphs became too complex.
  • Current State - DOMA (Domain-Oriented Microservice Architecture):
    • Uber aggregated microservices into "Domains."
    • A Domain acts as a collection of related microservices with a single entry point (Gateway).
    • This reduces the complexity of the connection graph while maintaining the benefits of microservices.