Unit 4 - Notes

INT327

Unit 4: Cloud Storage

1. Azure Storage Accounts

An Azure Storage Account is a unique namespace in Azure for your data. It provides a container for a set of Azure Storage services, including Azure Blobs, Files, Queues, and Tables. Every object you store in Azure Storage has an address that includes your unique account name.

1.1 Core Concepts

  • Namespace: The base URL for objects in a storage account is https://<account_name>.<service>.core.windows.net.
  • Durability and High Availability: Data is replicated and highly available, with options for redundancy across zones or regions.
  • Security: All data is encrypted by default (at rest and in transit). Access is controlled through robust mechanisms like Azure AD, RBAC, and Shared Access Signatures (SAS).
  • Scalability: Designed to be massively scalable to meet the data storage and performance needs of modern applications.
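
As a minimal sketch of the namespace rule above, the helper below builds the base endpoint for one service. The function name and the account name `mystorageacct` are hypothetical and used only for illustration.

```python
# Services named in these notes; each one gets its own endpoint.
VALID_SERVICES = {"blob", "file", "queue", "table"}


def endpoint_url(account_name: str, service: str) -> str:
    """Build the base endpoint for one service in a storage account."""
    if service not in VALID_SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    return f"https://{account_name}.{service}.core.windows.net"


# "mystorageacct" is a made-up account name.
print(endpoint_url("mystorageacct", "blob"))
```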

1.2 Types of Storage Accounts

| Type | Description | Supported Services | Performance Tier | Replication Options |
|---|---|---|---|---|
| Standard General-purpose v2 | Recommended for most scenarios. A unified account for all storage services. Provides access to the latest features, including access tiers (Hot, Cool, Cold, Archive). | Blobs, Files, Queues, Tables, Disks | Standard | LRS, GRS, RA-GRS, ZRS, GZRS, RA-GZRS |
| Premium Block Blobs | Optimized for high transaction rates and low, consistent storage latency. Data is stored on solid-state drives (SSDs). | Block Blobs, Append Blobs | Premium | LRS, ZRS |
| Premium File Shares | Designed for enterprise-grade, high-performance file shares. | Files | Premium | LRS, ZRS |
| Premium Page Blobs | For high-performance page blob scenarios only (e.g., VHD files for Azure VMs). | Page Blobs | Premium | LRS |

1.3 Performance Tiers

  • Standard: Uses magnetic drives (HDDs). Lower cost per GB, suitable for bulk storage, archives, and applications with less sensitivity to latency.
  • Premium: Uses solid-state drives (SSDs). Offers high performance, low latency, and consistent I/O operations. Ideal for I/O-intensive workloads like databases, big data analytics, and high-performance computing.

1.4 Access Tiers (for Blob Storage)

Access tiers let you store blob data in the most cost-effective manner based on usage patterns. A default tier is set at the account level, and individual blobs can be assigned a different tier explicitly.

  • Hot Tier: Optimized for frequently accessed data. Highest storage cost, lowest access cost. Default tier.
  • Cool Tier: Optimized for infrequently accessed data that is stored for at least 30 days. Lower storage cost, higher access cost compared to Hot.
  • Cold Tier: Optimized for rarely accessed data stored for at least 90 days. Even lower storage costs and higher access costs than Cool.
  • Archive Tier: Optimized for rarely accessed data that is stored for at least 180 days and has flexible latency requirements (retrieval can take hours). Lowest storage cost, highest data retrieval cost. Data is stored offline and must be "rehydrated" to an online tier before access.
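
The storage-versus-access cost trade-off can be sketched with a small calculation. The prices below are hypothetical placeholders, not real Azure pricing, and the model ignores minimum retention periods, early-deletion charges, and rehydration latency.

```python
# Hypothetical prices for illustration only (NOT real Azure pricing).
# Storage gets cheaper and reads get more expensive from hot to archive.
TIERS = {
    "hot":     {"storage_per_gb": 0.020, "read_per_gb": 0.000},
    "cool":    {"storage_per_gb": 0.010, "read_per_gb": 0.010},
    "cold":    {"storage_per_gb": 0.005, "read_per_gb": 0.030},
    "archive": {"storage_per_gb": 0.002, "read_per_gb": 0.050},
}


def monthly_cost(tier: str, stored_gb: float, read_gb: float) -> float:
    """Monthly cost = storage charge + read charge for the given tier."""
    price = TIERS[tier]
    return stored_gb * price["storage_per_gb"] + read_gb * price["read_per_gb"]


def cheapest_tier(stored_gb: float, read_gb: float) -> str:
    """Return the tier with the lowest monthly cost under the toy prices."""
    return min(TIERS, key=lambda tier: monthly_cost(tier, stored_gb, read_gb))


print(cheapest_tier(stored_gb=1000, read_gb=5000))  # heavy reads
print(cheapest_tier(stored_gb=1000, read_gb=0))     # never read
```

Under these made-up numbers, data that is read heavily is cheapest in the hot tier, while data that is never read is cheapest in the archive tier.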

2. Azure Blob Storage

Azure Blob (Binary Large Object) Storage is Microsoft's object storage solution for the cloud. It is optimized for storing massive amounts of unstructured data, such as text, images, videos, backups, and logs.

2.1 Blob Storage Hierarchy

  1. Storage Account: The top-level container.
  2. Container: A logical grouping of blobs, similar to a folder in a file system. All blobs must reside in a container.
  3. Blob: The actual file or object being stored.
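
The three levels of the hierarchy map directly onto a blob's URL. The sketch below splits a URL back into account, container, and blob name; the function name and the sample URL are hypothetical.

```python
from urllib.parse import urlparse


def parse_blob_url(url: str) -> tuple[str, str, str]:
    """Split a blob URL into (account, container, blob name)."""
    parsed = urlparse(url)
    # The account name is the first label of the host name.
    account = parsed.hostname.split(".")[0]
    # The first path segment is the container; the rest is the blob name,
    # which may itself contain "/" to simulate folders.
    container, _, blob = parsed.path.lstrip("/").partition("/")
    return account, container, blob


print(parse_blob_url(
    "https://mystorageacct.blob.core.windows.net/images/2024/cat.png"
))
```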

2.2 Types of Blobs

  • Block Blobs:

    • Composed of individual blocks of data that can be managed separately.
    • Optimized for uploading large amounts of data efficiently (parallel uploads).
    • Ideal for storing files like images, documents, videos, and backups.
    • Maximum size is approximately 190.7 TiB (50,000 blocks of up to 4,000 MiB each).
  • Append Blobs:

    • Also made up of blocks, but optimized for append operations.
    • When you modify an append blob, you can only add new blocks to the end. Updating or deleting existing blocks is not supported.
    • Ideal for logging scenarios, such as application logs or sensor data.
  • Page Blobs:

    • A collection of 512-byte pages optimized for random read and write operations.
    • Serve as the underlying storage for Azure IaaS VM disks (VHD files).
    • Maximum size is 8 TiB.
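
The size figures above can be checked with a little arithmetic. This sketch assumes the block blob limits of 50,000 blocks and 4,000 MiB per block, and treats page blob ranges as valid only when they align to 512-byte pages; the helper name is hypothetical.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

MAX_BLOCKS = 50_000            # assumed block count limit per block blob
MAX_BLOCK_SIZE = 4000 * MIB    # assumed maximum size of a single block
PAGE_SIZE = 512                # page blobs are addressed in 512-byte pages

# Largest possible block blob, expressed in TiB.
max_block_blob_tib = MAX_BLOCKS * MAX_BLOCK_SIZE / TIB
print(f"{max_block_blob_tib:.1f} TiB")


def is_page_aligned(offset: int, length: int) -> bool:
    """True when a page blob range starts and ends on a page boundary."""
    return offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0
```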

3. Azure Storage Security

Azure provides a multi-layered security model to protect data in storage accounts.

3.1 Identity and Access Management (IAM)

  • Azure Active Directory (Azure AD, now Microsoft Entra ID) Integration: Authenticate and authorize requests using Azure AD identities (users, groups, service principals, managed identities). This is the recommended approach.
  • Role-Based Access Control (RBAC): Assign granular permissions to identities at different scopes (for example, subscription, resource group, storage account, or container).
    • Data Plane Roles: Storage Blob Data Owner, Storage Blob Data Contributor, Storage Blob Data Reader.
    • Management Plane Roles: Storage Account Contributor, Reader.
  • Storage Account Access Keys: A pair of 512-bit keys (key1, key2) that grant full administrative access to the entire storage account.
    • Extremely powerful: Treat them like a root password.
    • Best Practice: Avoid embedding access keys in applications. Reserve them for initial setup and administrative tasks, rotate them regularly, and prefer Azure AD or SAS for application access.
  • Shared Access Signatures (SAS): Provides delegated, time-limited, and permission-scoped access to storage resources. (Covered in detail next).
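
To give a feel for what a 512-bit key looks like, the sketch below generates a random 64-byte secret and Base64-encodes it. Azure generates the real account keys itself; this helper is hypothetical and only illustrates the size and encoding.

```python
import base64
import secrets


def generate_key_like_secret(bits: int = 512) -> str:
    """Return a random secret of the given size, Base64-encoded."""
    raw = secrets.token_bytes(bits // 8)   # 512 bits = 64 bytes
    return base64.b64encode(raw).decode("ascii")


key = generate_key_like_secret()
print(len(key))  # 64 bytes encode to 88 Base64 characters
```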

3.2 Network Security

  • Firewalls and Virtual Networks: Restrict access to the storage account's public endpoint to specific IP addresses, IP ranges, or virtual network subnets.
  • VNet Service Endpoints: Provides a secure and direct connection from a VNet to the Azure storage service over the Azure backbone network. Traffic remains within the Azure network.
  • Private Endpoints: Provisions a private IP address from your VNet for your storage account, effectively bringing the service into your VNet. All traffic to the storage account can be routed through the private endpoint, allowing you to disable the public endpoint entirely.
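
The firewall behaviour can be pictured as a default-deny allow list. The sketch below uses Python's standard `ipaddress` module to test a client address against a set of allowed ranges; the rule list and function name are hypothetical and do not reflect how Azure evaluates rules internally.

```python
import ipaddress


def is_allowed(client_ip: str, allowed_ranges: list[str]) -> bool:
    """Default deny: allow only if the client IP falls in an allowed range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(rule) for rule in allowed_ranges)


# Documentation-only example addresses: one CIDR range and one single IP.
rules = ["203.0.113.0/24", "198.51.100.7"]

print(is_allowed("203.0.113.45", rules))  # inside the /24 range
print(is_allowed("192.0.2.1", rules))     # matches no rule
```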

3.3 Data Protection

  • Encryption in Transit:
    • Data is protected between your application and Azure using Transport Layer Security (TLS).
    • The "Secure transfer required" setting, which is enabled by default on new storage accounts, rejects insecure HTTP requests so that only HTTPS is accepted.
  • Encryption at Rest:
    • All data written to Azure Storage is automatically encrypted before being stored and decrypted before being retrieved. This is known as Storage Service Encryption (SSE).
    • Microsoft-Managed Keys (MMK): By default, Microsoft manages the encryption keys. This is transparent to the user.
    • Customer-Managed Keys (CMK): You can use your own encryption keys stored in Azure Key Vault for an additional layer of control. You are responsible for managing the key's lifecycle (creation, rotation, disabling).
    • Customer-Provided Keys (CPK): You can provide an encryption key on individual requests. The key is not stored in Azure and must be managed by you.

4. Control Access with Shared Access Signatures (SAS)

A Shared Access Signature (SAS) is a URI that grants delegated access to Azure Storage resources. The token contains the specific permissions, start and expiry times, and a cryptographic signature.

4.1 How SAS Works

A client that does not have account credentials can present a SAS token to Azure Storage. The service checks the signature, permissions, and validity period. If valid, the request is authorized. This allows you to grant access without ever sharing your account access keys.
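
The three checks described above (signature, validity period, permissions) can be sketched as follows. This is a simplified model: the string-to-sign format here is invented for illustration and is not the format Azure Storage actually uses, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone


def parse_utc(value: str) -> datetime:
    """Parse a timestamp such as 2023-10-27T10:05:01Z as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def sign(key: bytes, fields: dict) -> str:
    """HMAC-SHA256 over a simplified string-to-sign (NOT the real Azure format)."""
    string_to_sign = "\n".join(f"{name}={fields[name]}" for name in sorted(fields))
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def authorize(key: bytes, fields: dict, sig: str, needed: str, now: datetime) -> bool:
    """Accept the request only if signature, time window, and permission all pass."""
    if not hmac.compare_digest(sign(key, fields), sig):
        return False  # fields were tampered with, or the wrong key was used
    if not (parse_utc(fields["st"]) <= now <= parse_utc(fields["se"])):
        return False  # not yet valid, or expired
    return needed in fields["sp"]
```

Because the signature covers the permissions and times, a client cannot widen its own access by editing the token; any change invalidates the signature.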

4.2 Types of SAS

  1. User Delegation SAS:

    • Secured with Azure AD credentials instead of an account key.
    • Considered the most secure option.
    • Applies only to Blob storage.
    • The permissions granted are the intersection of the permissions of the Azure AD principal and the permissions specified in the SAS token.
  2. Service SAS:

    • Secured with the storage account access key.
    • Delegates access to a resource in just one of the storage services: Blob, Queue, Table, or File service.
    • Example: Granting read access to a single blob for 1 hour.
  3. Account SAS:

    • Secured with the storage account access key.
    • Delegates access to resources in one or more of the storage services.
    • You can delegate access to service-level operations (e.g., Get/Set Service Properties) that are not available with a Service SAS.
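
The intersection rule for a User Delegation SAS can be sketched with plain sets. Representing the principal's RBAC permissions as single-letter codes is a simplification made for this example, and the function name is hypothetical.

```python
def effective_permissions(principal_permissions: str, sas_permissions: str) -> set[str]:
    """Effective access is what both the principal and the SAS allow."""
    return set(principal_permissions) & set(sas_permissions)


# The principal may read and list; the SAS asks for read, write, and delete.
# Only "read" is granted by both.
print(effective_permissions("rl", "rwd"))
```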

4.3 Structure of a SAS Token (URI)

A full SAS URI consists of the resource URI followed by the SAS token (query parameters).

https://<account>.blob.core.windows.net/<container>/<blob>?<sas_token>

The sas_token itself is a series of key-value pairs.

?sv=2021-06-08&ss=b&srt=sco&sp=rwdlac&se=2023-10-27T18:05:01Z&st=2023-10-27T10:05:01Z&spr=https&sig=aBCdE...XyZ%3D

  • sv: Signed Version (storage service version)
  • ss: Signed Services (b=blob, f=file, q=queue, t=table); used in an Account SAS
  • srt: Signed Resource Types (s=service, c=container, o=object); used in an Account SAS
  • sp: Signed Permissions (r=read, w=write, d=delete, l=list, a=add, c=create, u=update)
  • st: Signed Start time (UTC)
  • se: Signed Expiry time (UTC)
  • spr: Signed Protocol (https only, or https,http to allow both)
  • sig: Signature (HMAC-SHA256 computed over a string-to-sign built from the other parameters, keyed with the account key or, for a User Delegation SAS, with a user delegation key)
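
Since a SAS token is an ordinary query string, it can be decoded with the standard library. The sketch below expands the single-letter codes listed above; the function name is hypothetical and the `sig` value in the sample is a placeholder, not a real signature.

```python
from urllib.parse import parse_qs

SERVICES = {"b": "blob", "f": "file", "q": "queue", "t": "table"}
PERMISSIONS = {
    "r": "read", "w": "write", "d": "delete", "l": "list",
    "a": "add", "c": "create", "u": "update",
}


def describe_sas(token: str) -> dict:
    """Decode the human-readable parts of a SAS token."""
    params = {k: v[0] for k, v in parse_qs(token.lstrip("?")).items()}
    return {
        "version": params.get("sv"),
        "services": [SERVICES.get(c, c) for c in params.get("ss", "")],
        "permissions": [PERMISSIONS.get(c, c) for c in params.get("sp", "")],
        "start": params.get("st"),
        "expiry": params.get("se"),
        "protocols": params["spr"].split(",") if "spr" in params else [],
    }


# Placeholder signature; every other field mirrors the example above.
sample = ("?sv=2021-06-08&ss=b&srt=sco&sp=rwdlac"
          "&se=2023-10-27T18:05:01Z&st=2023-10-27T10:05:01Z"
          "&spr=https&sig=PLACEHOLDER%3D")
info = describe_sas(sample)
print(info["services"], info["permissions"])
```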

4.4 SAS Best Practices

  • Always use HTTPS.
  • Use a User Delegation SAS whenever possible.
  • Grant least privilege: Only provide the necessary permissions (r for read, not rwdl).
  • Set a short expiration time: Grant access for the minimum time required.
  • Use a Stored Access Policy: Define SAS constraints (start time, expiry, permissions) on the server side, for example on a container, and reference the policy from a Service SAS. This lets you modify or revoke the SAS without regenerating the account keys.
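
Some of these practices can be checked mechanically before a token is handed out. The sketch below flags tokens that allow HTTP, live too long, or grant more than read and list. The one-hour limit, the read/list baseline, and the function name are assumptions chosen for this example, not Azure rules.

```python
from datetime import datetime, timedelta
from urllib.parse import parse_qs

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def audit_sas(token: str, max_lifetime: timedelta = timedelta(hours=1)) -> list[str]:
    """Return a list of warnings for a SAS token; an empty list means no findings."""
    params = {k: v[0] for k, v in parse_qs(token.lstrip("?")).items()}
    warnings = []

    # Anything other than spr=https permits plain HTTP.
    if params.get("spr") != "https":
        warnings.append("allows HTTP")

    # Compare the validity window against the allowed lifetime.
    if "st" in params and "se" in params:
        start = datetime.strptime(params["st"], TIME_FORMAT)
        expiry = datetime.strptime(params["se"], TIME_FORMAT)
        if expiry - start > max_lifetime:
            warnings.append("lifetime exceeds limit")

    # Flag permissions beyond read and list.
    extra = set(params.get("sp", "")) - {"r", "l"}
    if extra:
        warnings.append("grants more than read/list: " + "".join(sorted(extra)))

    return warnings
```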

5. Azure Storage Explorer

Azure Storage Explorer is a free, standalone application from Microsoft that allows you to easily manage your Azure Storage resources from Windows, macOS, and Linux. It provides a graphical user interface (GUI) to perform common storage management tasks.

5.1 Key Features

  • Connect to Multiple Subscriptions and Accounts: Manage resources across different Azure accounts and subscriptions in one window.
  • Blob Management:
    • Create, delete, and view containers.
    • Upload, download, copy, and delete blobs.
    • View and edit blob properties and metadata.
    • Generate SAS tokens for containers and blobs.
  • File Share Management: Manage Azure File shares, directories, and files.
  • Queue and Table Management: Add and process queue messages; view, query, and edit table entities.
  • Ease of Use: Provides a familiar file explorer-like interface for browsing storage.
  • Offline Support: Can connect to and manage local Azurite storage emulators for development.

6. Data Redundancy Options

Azure Storage always stores multiple copies of your data to protect it from planned and unplanned events. The redundancy option is chosen when you create the storage account and can usually be changed later.

6.1 Redundancy in a Single Region

  • Locally-Redundant Storage (LRS):

    • Copies: 3 synchronous copies of your data.
    • Scope: Within a single physical data center in the primary region.
    • Protection: Protects against server rack and drive failures.
    • Use Case: Least expensive option. Suitable for non-critical data that can be easily reconstructed. Does not protect against a data center-level disaster.
  • Zone-Redundant Storage (ZRS):

    • Copies: 3 synchronous copies of your data.
    • Scope: Across three distinct Availability Zones within the primary region. Each Availability Zone is a physically separate location with independent power, cooling, and networking.
    • Protection: Protects against data center-level failures (zone outages).
    • Use Case: Recommended for high-availability applications and critical data where downtime is not acceptable.

6.2 Redundancy in Multiple Regions (Geo-Redundancy)

These options replicate data to a secondary, paired region hundreds of miles away from the primary region.

  • Geo-Redundant Storage (GRS):

    • Copies: 6 total copies. 3 synchronous copies in the primary region (LRS) and 3 asynchronous copies in the secondary region.
    • Scope: Two separate regions.
    • Protection: Protects against regional-level disasters.
    • Failover: Data in the secondary region becomes readable only after a failover to the secondary, which can be initiated by the customer or, in rare cases, by Microsoft. Until then, you cannot read from the secondary region.
  • Read-Access Geo-Redundant Storage (RA-GRS):

    • Copies & Scope: Identical to GRS (6 copies across two regions).
    • Key Difference: Provides read-only access to the data in the secondary region, even if the primary region is fully operational.
    • Use Case: Critical for applications that require high availability for reads. If the primary region becomes unavailable, you can direct read traffic to the secondary endpoint.
  • Geo-Zone-Redundant Storage (GZRS):

    • Copies & Scope: Combines ZRS in the primary region with LRS in the secondary region. Data is copied synchronously across 3 Availability Zones in the primary region, and then copied asynchronously to a single data center in the secondary region.
    • Protection: Provides tolerance against both zonal failures and regional disasters. The best of both worlds.
  • Read-Access Geo-Zone-Redundant Storage (RA-GZRS):

    • Copies & Scope: Identical to GZRS.
    • Key Difference: Provides read-only access to the data in the secondary region.
    • Use Case: For applications requiring maximum consistency, durability, availability, and read performance.
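
The six options differ along three questions: does the data survive a zone outage, does it survive a regional outage, and can the secondary be read at any time? The sketch below encodes that as a lookup table and picks the first option that meets the requirements. Treating the listed order as "least to most protection" is an assumption of this example, and the function name is hypothetical.

```python
# Features of each redundancy option, as described in the notes above.
REDUNDANCY = {
    "LRS":     {"zone": False, "geo": False, "read_secondary": False},
    "ZRS":     {"zone": True,  "geo": False, "read_secondary": False},
    "GRS":     {"zone": False, "geo": True,  "read_secondary": False},
    "RA-GRS":  {"zone": False, "geo": True,  "read_secondary": True},
    "GZRS":    {"zone": True,  "geo": True,  "read_secondary": False},
    "RA-GZRS": {"zone": True,  "geo": True,  "read_secondary": True},
}


def pick_redundancy(zone: bool, geo: bool, read_secondary: bool) -> str:
    """Return the first option (in the order above) that covers every requirement."""
    for name, features in REDUNDANCY.items():
        if (features["zone"] >= zone
                and features["geo"] >= geo
                and features["read_secondary"] >= read_secondary):
            return name
    raise ValueError("no option satisfies the requirements")


print(pick_redundancy(zone=True, geo=False, read_secondary=False))
print(pick_redundancy(zone=False, geo=True, read_secondary=True))
```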

7. Backup Vaults (Recovery Services Vault)

While you can manually copy data to a storage account for backup, Azure provides a managed backup service called Azure Backup. This service uses a Recovery Services vault as its core management entity.

7.1 What is a Recovery Services Vault?

  • A Recovery Services vault is an Azure Resource Manager object that stores backup data and recovery points created over time.
  • It provides a centralized interface to manage and orchestrate backup and restore operations for various Azure services, including:
    • Azure Virtual Machines
    • SQL Server in Azure VMs
    • SAP HANA in Azure VMs
    • Azure File shares
    • On-premises workloads via MARS agent or MABS/DPM.

7.2 How Vaults Relate to Storage

  • The vault itself is a management layer. The actual backup data is stored in the Azure Storage infrastructure, managed by Microsoft.
  • When you create a vault, you choose its storage redundancy setting (LRS, ZRS, or GRS). This determines how the backup data within that vault is replicated. By default, vaults use GRS.
  • You do not directly interact with the underlying storage account used by the vault; the Azure Backup service handles all data movement and storage management.

7.3 Key Benefits of Using a Vault

  • Centralized Management: A single place to monitor jobs, configure policies, and manage recovery points.
  • Policy-Based Automation: Define backup schedules (daily, weekly, etc.) and retention policies (how long to keep backups).
  • Application-Consistent Backups: Ensures that applications like SQL Server are backed up in a consistent state.
  • Enhanced Security: Vaults support features like soft delete (retaining deleted backup data for an additional 14 days by default) and Multi-User Authorization (MUA) to protect against accidental or malicious deletion.
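
Retention policies and soft delete both come down to date arithmetic. The sketch below shows which recovery points a simple day-based retention rule would keep, and when soft-deleted data would be purged. It is a simplified model with hypothetical function names, not the logic Azure Backup actually runs.

```python
from datetime import date, timedelta


def recovery_points_to_keep(points: list[date], today: date, retention_days: int) -> list[date]:
    """Keep recovery points taken within the last `retention_days` days."""
    cutoff = today - timedelta(days=retention_days)
    return [point for point in points if point > cutoff]


def soft_delete_purge_date(deleted_on: date, soft_delete_days: int = 14) -> date:
    """Date on which soft-deleted backup data stops being recoverable."""
    return deleted_on + timedelta(days=soft_delete_days)


# Ten daily recovery points, 1 to 10 January 2024.
daily_points = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
kept = recovery_points_to_keep(daily_points, today=date(2024, 1, 10), retention_days=3)
print(kept)
print(soft_delete_purge_date(date(2024, 1, 10)))
```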
  • Simplified Restore: Provides granular restore options, such as restoring a full VM, individual files, or application databases.