Unit 1 - Notes

CSC203

Unit 1: Introduction to Blockchain

1. Need for Distributed Record Keeping

Distributed record keeping (Distributed Ledger Technology or DLT) emerged as a solution to the limitations inherent in centralized databases. In a traditional centralized model, a single entity (like a bank or government) controls the ledger, creating distinct vulnerabilities.

Limitations of Centralized Systems

  • Single Point of Failure: If the central server crashes or is destroyed, the entire system halts.
  • Trust Dependence: Users must trust the central authority not to manipulate records, censor transactions, or act maliciously.
  • Security Vulnerabilities: A hacker only needs to breach one central database to compromise the entire network's data.
  • Lack of Transparency: Processes occur within a "black box," invisible to end-users.

Advantages of Distributed Record Keeping

  • Immutability: Once recorded, data is extremely difficult to alter retrospectively.
  • Transparency: All participants (nodes) hold a copy of the ledger, allowing for public verification.
  • Censorship Resistance: No single authority can block valid transactions.
  • Fault Tolerance: The system operates even if multiple nodes fail.

A comparison diagram showing three network topologies side-by-side. The left panel shows "Centralize...
AI-generated image — may contain inaccuracies


2. Modeling Faults and Adversaries

To build a robust distributed system, one must understand how components can fail and how attackers behave.

Types of Faults

  1. Fail-Stop (Crash) Faults:

    • A node simply stops working (e.g., power failure, hardware crash).
    • The node does not send incorrect data; it just becomes silent.
    • Detection: Relatively easy; the node stops responding to "heartbeat" signals.
  2. Byzantine Faults:

    • A node functions but behaves maliciously or arbitrarily.
    • The node may send conflicting information to different parts of the network (e.g., telling Node A "True" and Node B "False").
    • This is the worst-case scenario for distributed systems.

Adversary Models

  • Computational Power:
    • Bounded Adversary: Has limited computing power (cannot break standard cryptography like SHA-256 or ECDSA).
    • Unbounded Adversary: Has infinite power (theoretical; breaks encryption). Blockchain assumes adversaries are computationally bounded.
  • Network Control:
    • Adversaries may control message delivery (delaying or dropping packets) to create partition attacks (split-brain scenarios).

3. The Byzantine Generals Problem

This is a classic logical dilemma used to describe the challenge of achieving consensus in a distributed environment where trust is lacking.

The Allegory

  • Scenario: Several generals of the Byzantine army are camped around an enemy city. They must agree on a common battle plan: Attack or Retreat.
  • Constraint: The attack will only succeed if they act in unison. If some attack and others retreat, they will be slaughtered.
  • Communication: Generals rely on messengers to communicate.
  • The Problem: Some generals (or the Commander) may be traitors. Traitors attempt to confuse the loyal generals by sending conflicting orders (telling General A to attack and General B to retreat) to prevent consensus.

The Mathematical Threshold

To guarantee consensus in a system with nodes, where nodes are Byzantine (traitors), the system requires:

This means strictly less than 1/3 of the network can be malicious for the network to reach a reliable consensus using traditional BFT (Byzantine Fault Tolerance) algorithms.

A schematic diagram illustrating the Byzantine Generals Problem. In the center, place a node labeled...
AI-generated image — may contain inaccuracies


4. Consensus Algorithms and Scalability Problems

Consensus is the mechanism by which nodes in a distributed network agree on the state of the ledger.

Common Algorithms

  1. Proof of Work (PoW):

    • Used by Bitcoin.
    • Nodes (miners) solve complex mathematical puzzles. The first to solve it gets to propose the next block.
    • Security: High. Requires 51% of total computing power to attack.
    • Downside: High energy consumption.
  2. Proof of Stake (PoS):

    • Used by Ethereum (post-Merge).
    • Validators lock up (stake) coins. The probability of being chosen to validate is proportional to the stake size.
    • Downside: "Rich get richer" centralization risks.
  3. PBFT (Practical Byzantine Fault Tolerance):

    • Used in permissioned blockchains (e.g., Hyperledger Fabric).
    • Relies on voting rounds/message passing rather than mining.

The Scalability Trilemma

Blockchains struggle to achieve three properties simultaneously. Usually, only two are possible:

  1. Decentralization: No central control.
  2. Security: Immutability and defense against attacks.
  3. Scalability: High transaction throughput (TPS) and low latency.

Scalability Problems

  • Throughput Bottlenecks: Bitcoin processes ~7 TPS; Visa processes ~24,000 TPS.
  • Block Size limits: Increasing block size improves throughput but increases the storage burden on nodes, leading to centralization.
  • Network Latency: Propagation of blocks across a global network takes time, increasing the risk of "stale blocks" (orphans).

5. Technologies Borrowed in Blockchain

Blockchain is not a single invention but a novel combination of existing technologies from the 1980s and 1990s.

A. Hash Pointers and Merkle Trees

  • Hash Pointer: A data structure that points to the location of data and contains the cryptographic hash of that data. If the data changes, the hash changes, invalidating the pointer.
  • Merkle Tree: A binary tree where leaf nodes are hashes of data blocks (transactions), and non-leaf nodes are hashes of their children. The Merkle Root allows efficient verification of whether a specific transaction exists in a block without downloading the whole block.

B. Digital Cash (E-Cash)

  • David Chaum (1983): Introduced blind signatures for anonymous digital payments.
  • The Double Spending Problem: The fundamental challenge of digital cash. Unlike physical cash, digital tokens are files that can be copied (spent twice).
  • Pre-Bitcoin Solution: Relied on a central mint to check serial numbers.
  • Bitcoin Solution: Relied on the public ledger (blockchain) to verify a coin hasn't been spent yet.

C. Public Key Cryptography (Asymmetric)

  • Uses a pair of keys: Private Key (for signing transactions) and Public Key (derived from the private key, serves as the address/identity).
  • Ensures that only the owner of the funds can initiate a transaction.

6. Nakamoto’s Concept: Blockchain-based Cryptocurrency

In 2008, Satoshi Nakamoto published the Bitcoin whitepaper, proposing a peer-to-peer electronic cash system.

Core Innovation: The Chain of Blocks

Nakamoto proposed linking blocks using hash pointers. Each block contains:

  1. Data: A list of transactions.
  2. Metadata: Timestamp, Nonce (number used once).
  3. Previous Hash: The hash of the entire previous block header.

This creates a dependency chain. To change an entry in Block 5, an attacker must recompute the hash for Block 5, which changes the input for Block 6, requiring a rehash of Block 6, and so on, all the way to the current block.

A detailed block diagram showing the structure of a Blockchain. Display three sequential blocks: Blo...
AI-generated image — may contain inaccuracies

Solving Double Spending via Consensus

Nakamoto replaced the central bank with the Longest Chain Rule (or Heaviest Chain Rule):

  1. Transactions are broadcast to all nodes.
  2. Nodes collect transactions into a block.
  3. Nodes work on finding a difficult proof-of-work (mining) for their block.
  4. When a node finds a proof, it broadcasts the block.
  5. Nodes accept the block only if all transactions in it are valid and not already spent.
  6. Nodes express acceptance by working on the next block in the chain, using the hash of the accepted block as the previous hash.

If two blocks are found simultaneously (a fork), nodes work on the first one they receive, but save the other branch. The tie is broken when the next proof-of-work is found on one of the branches, making it the longest chain. The network converges on this longest chain as the source of truth.