Unit 3: Swarm Intelligence Techniques
1. Swarm Intelligence (SI) Concepts
Swarm Intelligence is a subfield of Artificial Intelligence based on the collective behavior of decentralized, self-organized systems. The term was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems. SI systems are typically composed of a population of simple agents that interact locally with one another and with their environment. Although there is no centralized control structure dictating how individual agents should behave, local interactions between such agents often lead to the emergence of complex, intelligent global behavior.
Natural Inspirations
- Ant Colonies: Ants communicate indirectly through pheromones to find the shortest path to food sources.
- Bird Flocking / Fish Schooling: Individuals coordinate their movement without a leader, enabling complex maneuvers for navigation and predator avoidance.
- Honey Bee Colonies: Bees perform a "waggle dance" to communicate the direction and distance of high-quality food sources to their hive mates.
Key Principles of Swarm Intelligence
Five fundamental principles enable the emergence of intelligent behavior from a swarm:
- Proximity Principle: The swarm should be able to carry out simple space and time computations.
- Quality Principle: The swarm should be able to respond to quality factors in the environment (e.g., food gradients).
- Principle of Diverse Response: The swarm should not commit its activities to a narrow, uniform path. Randomness is a key element.
- Principle of Stability: The swarm should not change its behavior every time the environment changes.
- Principle of Adaptability: The swarm must be able to change its behavior when the environmental change is significant.
Core Characteristics
- Decentralization: No single agent is in control. The global behavior is an emergent property of local interactions.
- Self-Organization: Complex patterns and behaviors arise from simple rules followed by individual agents without external guidance.
- Indirect Communication (Stigmergy): Agents often communicate indirectly by modifying their environment (e.g., laying pheromones), which influences the behavior of other agents later.
- Scalability & Robustness: The system can easily scale by adding more agents. The failure of a few agents usually does not cause the entire system to fail.
2. Particle Swarm Optimization (PSO)
PSO is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking or fish schooling.
Core Idea & Analogy
Imagine a flock of birds searching for a single piece of food in a large area. None of the birds know where the food is, but they know how far they are from it in each iteration. A simple yet effective strategy is for the birds to follow the one who is currently closest to the food.
PSO uses this concept by initializing a population of random solutions, called "particles," and then searching for optima by updating generations.
Components and Mathematical Formulation
- Particle: A potential solution in the D-dimensional search space.
- Position (`x_i`): The current coordinates of particle `i` in the search space.
- Velocity (`v_i`): The rate of change of the position of particle `i`.
- Personal Best (`pbest_i`): The best position (yielding the best fitness value) found so far by particle `i`.
- Global Best (`gbest`): The best position found so far by any particle in the entire swarm.
The core of PSO lies in the velocity and position update equations for each particle i in each dimension d at time step t+1:
- Velocity Update:

```text
v_id(t+1) = w * v_id(t) + c1 * r1 * (pbest_id(t) - x_id(t)) + c2 * r2 * (gbest_d(t) - x_id(t))
```

  - `w`: Inertia weight. Controls the influence of the particle's previous velocity. A larger `w` encourages global exploration, while a smaller `w` encourages local exploitation.
  - `c1`, `c2`: Acceleration coefficients. `c1` is the cognitive coefficient (attraction towards the personal best), and `c2` is the social coefficient (attraction towards the global best).
  - `r1`, `r2`: Random numbers uniformly distributed in [0, 1], introducing stochasticity.

- Position Update:

```text
x_id(t+1) = x_id(t) + v_id(t+1)
```
PSO Algorithm Pseudocode
Initialize swarm of N particles with random positions (x) and velocities (v)
FOR each particle i:
    Evaluate fitness(x_i)
    pbest_i = x_i
gbest = position of the particle with the best initial fitness
WHILE termination condition is not met:
    FOR each particle i:
        // Update velocity
        FOR each dimension d:
            r1 = random(0,1), r2 = random(0,1)
            v_id = w * v_id + c1*r1*(pbest_id - x_id) + c2*r2*(gbest_d - x_id)
        // Update position
        x_i = x_i + v_i
        // Evaluate fitness
        current_fitness = fitness(x_i)
        // Update pbest
        IF current_fitness is better than fitness(pbest_i):
            pbest_i = x_i
        // Update gbest
        IF current_fitness is better than fitness(gbest):
            gbest = x_i
RETURN gbest
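The pseudocode above translates almost line-for-line into NumPy. The sketch below is a minimal PSO minimizing the sphere function `f(x) = Σ x_i²`; the parameter values (`w = 0.7`, `c1 = c2 = 1.5`, 30 particles) are conventional choices, not prescribed by the algorithm:

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimal PSO minimizer following the pseudocode above."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))               # positions
    v = rng.uniform(-(hi - lo), hi - lo, (n_particles, dim))  # velocities
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity and position updates, vectorized over the whole swarm
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val            # update personal bests
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))          # update global best
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

sphere = lambda p: float(np.sum(p ** 2))
best, best_val = pso(sphere, dim=5)
```

The `np.clip` keeps particles inside the search bounds, a common practical addition that the pseudocode leaves implicit.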
3. Ant Colony Optimization (ACO)
ACO is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. It was introduced by Marco Dorigo in his Ph.D. thesis (1992) and was initially applied to the Traveling Salesperson Problem (TSP).
Core Idea & Analogy
ACO is inspired by the foraging behavior of ants. Ants initially wander randomly in search of food. When an ant finds a food source, it returns to the colony, laying down a chemical trail of pheromones. Other ants, when encountering a pheromone trail, are likely to follow it. Shorter paths get reinforced more quickly because ants complete the round trip faster, leading to a higher concentration of pheromones. This positive feedback loop eventually leads the colony to converge on the shortest path.
Components and Mathematical Formulation
ACO is particularly well-suited for discrete optimization problems.
- Pheromone Trail (`τ_ij`): A value associated with the edge `(i, j)` that represents its desirability. Higher values indicate a better path.
- Heuristic Information (`η_ij`): A priori information about the edge `(i, j)`. For TSP, this is typically the inverse of the distance (`1/d_ij`).
- Transition Probability: The probability of an ant `k` at node `i` choosing to move to node `j` is given by:

```text
p_ij^k = (τ_ij^α * η_ij^β) / Σ_{l ∈ N_i^k} (τ_il^α * η_il^β)
```

  - `N_i^k`: The set of unvisited neighboring nodes for ant `k` at node `i`.
  - `α` (alpha): Parameter controlling the influence of the pheromone trail.
  - `β` (beta): Parameter controlling the influence of the heuristic information.
- Pheromone Update: Pheromones are updated after all ants have completed a tour, in two phases:
- Evaporation: All pheromone trails are reduced to simulate natural decay:

```text
τ_ij = (1 - ρ) * τ_ij
```

  - `ρ` (rho): The pheromone evaporation rate (0 < ρ < 1).

- Deposition: Ants that found good solutions deposit pheromones on the edges they traversed:

```text
τ_ij = τ_ij + Δτ_ij
```

  - `Δτ_ij`: Amount of pheromone deposited, usually proportional to the quality of the solution found. For an ant `k`, `Δτ_ij^k = Q / L_k` if edge `(i, j)` is in its tour, where `Q` is a constant and `L_k` is the tour length.
ACO Algorithm Pseudocode (for TSP)
Initialize pheromone trails (τ_ij) on all edges
WHILE termination condition is not met:
    FOR each ant k:
        // Construct a solution (a tour)
        Place ant k on a random starting city
        WHILE ant k has not visited all cities:
            Select next city j based on the transition probability formula
            Move ant k to city j
            Add city j to the ant's visited list
        Complete the tour by returning to the starting city
        Calculate the tour length (L_k)
    // Update pheromones globally
    FOR each edge (i, j):
        // Evaporation
        τ_ij = (1 - ρ) * τ_ij
        // Deposition
        FOR each ant k:
            IF ant k used edge (i, j):
                Δτ_ij^k = Q / L_k
                τ_ij = τ_ij + Δτ_ij^k
    Update the best-so-far solution
RETURN best-so-far solution
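As a concrete illustration of the pseudocode, here is a compact Ant System for a toy TSP instance. The six-city layout (points on a circle) and the parameter values (`α = 1`, `β = 2`, `ρ = 0.5`) are illustrative choices:

```python
import numpy as np

def aco_tsp(dist, n_ants=20, iters=100, alpha=1.0, beta=2.0,
            rho=0.5, Q=1.0, seed=0):
    """Compact Ant System for the TSP, following the pseudocode above."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    eta = 1.0 / (dist + np.eye(n))   # heuristic info; eye avoids 0-division on diagonal
    tau = np.ones((n, n))            # uniform initial pheromone
    best_tour, best_len = None, np.inf

    for _ in range(iters):
        tours, lengths = [], []
        for _k in range(n_ants):
            start = int(rng.integers(n))
            tour, visited = [start], {start}
            while len(tour) < n:
                i = tour[-1]
                cand = [j for j in range(n) if j not in visited]
                # Transition probability rule: tau^alpha * eta^beta, normalized
                weights = np.array([tau[i, j] ** alpha * eta[i, j] ** beta for j in cand])
                j = int(rng.choice(cand, p=weights / weights.sum()))
                tour.append(j)
                visited.add(j)
            L = sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))
            tours.append(tour)
            lengths.append(L)
            if L < best_len:
                best_tour, best_len = tour, L
        tau *= (1.0 - rho)                   # evaporation
        for tour, L in zip(tours, lengths):  # deposition, symmetric edges
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                tau[a, b] += Q / L
                tau[b, a] += Q / L
    return best_tour, best_len

# Toy instance: 6 cities on a unit circle; the optimal tour visits them in order.
pts = np.array([[np.cos(t), np.sin(t)] for t in np.linspace(0, 2 * np.pi, 7)[:-1]])
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
tour, length = aco_tsp(dist)
```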
4. Artificial Bee Colony (ABC) and Cuckoo Search (CS)
Artificial Bee Colony (ABC)
ABC is an optimization algorithm based on the intelligent foraging behavior of a honey bee swarm, proposed by Dervis Karaboga in 2005.
Core Idea & Components
The swarm consists of three groups of bees:
- Employed Bees: Associated with a specific food source (a solution). They exploit their current source and share its information with onlooker bees.
- Onlooker Bees: Watch the "waggle dances" of the employed bees and choose a food source to exploit based on its quality (nectar amount/fitness). This introduces a probabilistic, quality-based selection mechanism.
- Scout Bees: If a food source is abandoned after a certain number of trials (the `limit` parameter), its employed bee becomes a scout, searching randomly for a new food source. This is the main exploration mechanism.
Mathematical Formulation
A potential solution x_i is a food source. Its quality is given by fitness(x_i).
- Employed Bee Phase: Each employed bee generates a new candidate solution `v_i` in the neighborhood of its current solution `x_i`:

```text
v_ij = x_ij + φ_ij * (x_ij - x_kj)
```

  - `j` is a randomly chosen dimension.
  - `k` is a randomly chosen solution different from `i`.
  - `φ_ij` is a random number in [-1, 1].

  A greedy selection is then performed between `v_i` and `x_i`.
- Onlooker Bee Phase: An onlooker bee chooses a food source `i` with a probability `p_i` based on its fitness:

```text
p_i = fit_i / Σ_{n=1..SN} fit_n
```

  - `fit_i` is the fitness value of solution `i`.
  - `SN` is the number of employed bees (or food sources).

  Once a source is chosen, the onlooker generates a new candidate solution in its neighborhood, just like an employed bee.
- Scout Bee Phase: If a solution `x_i` cannot be improved for a predetermined number of cycles (`limit`), it is abandoned. The corresponding employed bee becomes a scout and generates a new random solution for the swarm.
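The employed-bee update and greedy selection can be sketched as follows. The sphere objective and the loop of 50 passes are illustrative, and the `trials` counters are what the `limit` parameter would later be checked against:

```python
import numpy as np

def employed_bee_step(foods, costs, trials, objective, rng):
    """One employed-bee pass: v_ij = x_ij + phi * (x_ij - x_kj) in one random
    dimension, then greedy selection between candidate and current source."""
    sn, dim = foods.shape
    for i in range(sn):
        j = int(rng.integers(dim))                        # random dimension
        k = int(rng.choice([s for s in range(sn) if s != i]))  # random other source
        phi = rng.uniform(-1.0, 1.0)
        v = foods[i].copy()
        v[j] = foods[i, j] + phi * (foods[i, j] - foods[k, j])
        cost_v = objective(v)
        if cost_v < costs[i]:                             # greedy selection
            foods[i], costs[i], trials[i] = v, cost_v, 0
        else:
            trials[i] += 1                                # counts toward `limit`
    return foods, costs, trials

rng = np.random.default_rng(1)
objective = lambda x: float(np.sum(x ** 2))               # minimization
foods = rng.uniform(-5, 5, (10, 4))                       # 10 food sources, 4 dims
costs = np.array([objective(f) for f in foods])
trials = np.zeros(10, dtype=int)
start_best = costs.min()
for _ in range(50):
    foods, costs, trials = employed_bee_step(foods, costs, trials, objective, rng)
```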
Cuckoo Search (CS)
CS is an optimization algorithm developed by Xin-She Yang and Suash Deb in 2009, inspired by the brood parasitism of some cuckoo species. It is enhanced by the use of Lévy flights for exploration.
Core Idea & Components
- Brood Parasitism: Each cuckoo lays one egg at a time and dumps it in a randomly chosen nest. The best nests with high-quality eggs will carry over to the next generation.
- Host Nests: A host nest represents a solution in the search space.
- Cuckoo Egg: A new potential solution.
- Discovery Probability (`p_a`): A host bird can discover the alien cuckoo egg with probability `p_a`. If discovered, the egg is thrown away, and the host builds a new nest in a new location (a new solution is generated).
- Lévy Flights: Cuckoos find the next nest using a random walk. Lévy flights are a special type of random walk in which the step lengths follow a heavy-tailed probability distribution. This allows occasional long-distance jumps, significantly enhancing the algorithm's global search capability.
Mathematical Formulation
A new solution x^(t+1) for a cuckoo i is generated using a Lévy flight:
```text
x_i^(t+1) = x_i^(t) + α ⊕ Lévy(λ)
```

- `α > 0`: A step-size scaling factor.
- `⊕`: Entry-wise multiplication.
- `Lévy(λ)`: A random number drawn from a Lévy distribution, which generates the random-walk step length. A simple way to generate it is `step = u / |v|^(1/β)`, where `u` and `v` are drawn from normal distributions and `β` is a parameter (typically 1.5).
After generating new solutions, a fraction p_a of the worst nests are abandoned, and new ones are built at new locations.
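The `step = u / |v|^(1/β)` recipe above is Mantegna's algorithm; a sketch follows, where the `σ_u` expression is the standard scaling that makes the ratio follow a Lévy-stable distribution:

```python
import math
import numpy as np

def levy_step(size, beta=1.5, rng=None):
    """Draw Lévy-flight step lengths via Mantegna's algorithm:
    step = u / |v|^(1/beta), with u ~ N(0, sigma_u^2) and v ~ N(0, 1)."""
    rng = rng or np.random.default_rng()
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

rng = np.random.default_rng(0)
steps = levy_step(10000, rng=rng)
# Heavy tail: occasional steps far larger than the typical one drive exploration.
```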
5. Exploration vs. Exploitation Trade-off
This is a fundamental dilemma in nearly all optimization algorithms.
- Exploration: The process of visiting entirely new regions of the search space to find a promising new solution. It is about global search.
- Exploitation: The process of visiting regions within the neighborhood of a previously visited good solution to see if a better solution can be found. It is about local search and refinement.
An effective algorithm must strike a balance between these two. Too much exploration can prevent convergence, while too much exploitation can lead to getting stuck in a local optimum (premature convergence).
Balance in Swarm Algorithms
- PSO:
  - Exploration: High inertia weight (`w`), random factors (`r1`, `r2`), cognitive component (`c1`).
  - Exploitation: Low inertia weight (`w`), social component (`c2`) pulling particles towards the known `gbest`.
  - Balancing: A common strategy is to use a high `w` initially to encourage exploration and gradually decrease it to shift focus to exploitation (linearly decreasing inertia weight).
- ACO:
  - Exploration: Pheromone evaporation (`ρ`) prevents trails from becoming too strong, allowing ants to explore other paths. The randomness in the probabilistic path selection (`α`, `β` parameters) also contributes.
  - Exploitation: Pheromone deposition reinforces good paths, encouraging more ants to follow them.
- ABC:
  - Exploration: The scout bee mechanism is a pure exploration operator, introducing new random solutions into the population.
  - Exploitation: Employed and onlooker bees search in the vicinity of known good solutions.
- CS:
  - Exploration: The long jumps characteristic of Lévy flights allow the algorithm to escape local optima and explore distant regions of the search space.
  - Exploitation: The `p_a` parameter, which replaces a fraction of solutions, and the local random walks inherent in the Lévy flight formula provide local search capability.
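The linearly decreasing inertia weight mentioned for PSO reduces to a one-line schedule; the bounds `w_max = 0.9`, `w_min = 0.4` below are conventional defaults, not fixed by the method:

```python
def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decrease w from w_max (exploration) to w_min (exploitation)."""
    return w_max - (w_max - w_min) * t / t_max

# Early iterations favor exploration, late ones exploitation.
w_start = inertia_weight(0, 100)
w_end = inertia_weight(100, 100)
```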
6. Parameter Sensitivity and Tuning in Swarm Algorithms
The performance of swarm algorithms is highly sensitive to their control parameters. Proper tuning is crucial for achieving good results.
| Algorithm | Key Parameters | Impact and Tuning Considerations |
|---|---|---|
| PSO | `w` (inertia weight); `c1`, `c2` (acceleration coefficients); swarm size | `w` balances global/local search. `c1` and `c2` control the influence of personal vs. swarm knowledge; `c1 = c2 ≈ 2.0` is a common setting. Swarm size (e.g., 20-50) affects diversity and convergence speed. |
| ACO | `α`, `β` (influence parameters); `ρ` (evaporation rate); `Q` (pheromone deposit); number of ants | `α` and `β` determine the relative importance of pheromones vs. heuristics. `ρ` controls convergence speed; a high `ρ` increases exploration. `Q` scales the pheromone values. |
| ABC | colony size; `limit` | Colony size (employed + onlooker bees) affects diversity. `limit` determines when a food source is abandoned; a small `limit` encourages exploration, a large `limit` favors exploitation. |
| CS | `p_a` (discovery probability); `α` (step size); population size | `p_a` controls the rate at which worse solutions are replaced, influencing convergence and diversity. `α` scales the search steps. |
Tuning Strategies
- Empirical/Manual Tuning: Adjusting parameters based on experience and trial-and-error.
- Grid Search/Random Search: Systematically or randomly searching a predefined range of parameter values.
- Parameter Adaptation: Designing rules to change parameter values dynamically during the run (e.g., decreasing inertia weight in PSO).
- Meta-optimization: Using another optimization algorithm to find the best parameter settings for the primary algorithm.
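Of these strategies, random search is the simplest to sketch. The parameter space and the surrogate `score` function below are illustrative stand-ins; in practice `score` would run the swarm algorithm with those parameters and return its solution quality:

```python
import numpy as np

def random_search(score, space, n_trials=50, seed=0):
    """Randomly sample parameter settings from `space` and keep the best score."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, -np.inf
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Surrogate score peaking at w = 0.7, c = 1.5 (a stand-in for real algorithm runs).
score = lambda p: -(p["w"] - 0.7) ** 2 - (p["c"] - 1.5) ** 2
space = {"w": (0.0, 1.0), "c": (0.0, 3.0)}
params, s = random_search(score, space)
```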
7. Swarm Stagnation and Mitigation Strategies
Stagnation (or premature convergence) occurs when the search process slows down or stops progressing towards the global optimum because the swarm's diversity has been lost. All agents converge to a single, often suboptimal, region of the search space.
Causes
- Loss of Diversity: All particles/agents cluster in a small area.
- Over-Exploitation: The algorithm focuses too heavily on a known good solution, ignoring other potentially better regions.
- Poor Parameter Settings: An incorrect balance of exploration and exploitation can lead to rapid convergence.
Mitigation Strategies
- Parameter Adaptation:
  - PSO: Use a time-varying inertia weight (`w`) or acceleration coefficients (`c1`, `c2`).
- Topology Modification (PSO):
  - Instead of a single `gbest` (star topology), use a local best (`lbest`). Each particle is only influenced by its immediate neighbors in a defined topology (e.g., a ring). This slows down information propagation and helps maintain diversity.
- Diversity Introduction:
  - Re-initialization: If stagnation is detected (e.g., no improvement in `gbest` for several iterations), randomly re-initialize the positions and velocities of some or all particles.
  - Mutation: Apply a small, random perturbation to particles' positions, similar to genetic algorithms.
- Hybridization:
  - Combine the swarm algorithm with a local search algorithm (e.g., hill climbing). The swarm algorithm performs global exploration, and the local searcher refines the promising solutions found.
- Multi-Swarm Approaches:
  - Use multiple swarms that explore the search space independently and periodically exchange information (best solutions). This is also known as the "island model."
8. Parallel and Distributed Nature of Swarm Optimization
Swarm intelligence algorithms are inherently parallel, making them well-suited for implementation on parallel and distributed computing architectures.
Inherent Parallelism
The most computationally expensive step in most SI algorithms is the fitness evaluation. In a single generation/iteration, the fitness of each agent (particle, ant, bee) can be calculated independently of all other agents. This is a classic example of an "embarrassingly parallel" problem.
If you have N agents and P processing cores, you can evaluate the fitness of P agents simultaneously, leading to a potential speedup of up to P times.
Parallelization Models
- Master-Slave Model (Global Parallelism):
  - A single master process manages the main algorithm loop and global information (e.g., `gbest` in PSO, the pheromone table in ACO).
  - The master distributes the task of fitness evaluation to multiple slave processes.
  - Slaves evaluate the fitness of the agents assigned to them and return the results to the master.
  - The master then updates the global state and starts the next iteration.
  - Benefit: Simple to implement.
  - Drawback: Can suffer a communication bottleneck at the master node.
- Island Model (Multi-Swarm / Coarse-Grained Parallelism):
  - The entire population is divided into several sub-populations (swarms or "islands").
  - Each island runs the SI algorithm independently on a separate processor for a certain number of generations (an "epoch").
  - After each epoch, a migration phase occurs in which islands exchange some of their best individuals.
  - Benefits: Reduces communication overhead compared to the master-slave model. The isolation of sub-populations naturally promotes diversity, potentially leading to better exploration of the search space and avoiding premature convergence.
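The master-slave model amounts to parallelizing the fitness loop. Here is a sketch using the standard library; a thread pool is used only to keep the example self-contained, and the trivial sphere function stands in for an expensive fitness evaluation:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def sphere(x):
    """Stand-in fitness function; real workloads are far more expensive."""
    return float(np.sum(x ** 2))

def evaluate_swarm(positions, fitness, workers=4):
    """Master-slave step: the master farms out fitness evaluations, the
    workers compute them independently, and the master collects the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.array(list(pool.map(fitness, positions)))

rng = np.random.default_rng(0)
swarm = rng.uniform(-5, 5, (32, 10))   # 32 particles, 10 dimensions
fits = evaluate_swarm(swarm, sphere)
```

For genuinely CPU-bound fitness functions, `ProcessPoolExecutor` (or MPI across machines) is the realistic choice, since Python threads share the interpreter lock.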
9. Swarm-based Optimization for Machine Learning Problems
Swarm intelligence provides a powerful, gradient-free approach to solving complex optimization problems in machine learning.
A. Feature Selection
- Goal: Select the most relevant subset of features to improve model accuracy, reduce overfitting, and decrease training time.
- Mapping to SI:
  - Representation: A solution (e.g., a particle's position) is represented as a binary vector of length `D`, where `D` is the total number of features. A `1` in the `j`-th dimension means feature `j` is selected; a `0` means it is not. Some variants use continuous values in [0, 1] that are then thresholded.
  - Fitness Function: The fitness of a feature subset is evaluated by training and testing an ML model (e.g., k-NN, SVM, decision tree) on that subset. The fitness is typically the classification accuracy from a cross-validation procedure. A penalty term is often added to favor smaller feature subsets:

```text
Fitness = α * (Accuracy) - (1 - α) * (Number of Selected Features / Total Features)
```
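The penalized fitness above can be sketched directly. The leave-one-out 1-NN classifier and the two-feature synthetic data below are stand-in assumptions for a real model and cross-validation procedure:

```python
import numpy as np

def subset_fitness(mask, X, y, alpha=0.9):
    """Fitness = alpha*accuracy - (1-alpha)*(selected/total), per the formula
    above. Accuracy here comes from leave-one-out 1-NN on selected features."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask.astype(bool)]
    d = np.linalg.norm(Xs[:, None] - Xs[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # leave-one-out: ignore self-distance
    acc = float(np.mean(y[d.argmin(axis=1)] == y))
    return alpha * acc - (1 - alpha) * mask.sum() / mask.size

# Synthetic data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([y * 4.0 + rng.normal(0, 0.5, 40), rng.normal(0, 1, 40)])
fit_good = subset_fitness(np.array([1, 0]), X, y)  # informative feature only
fit_none = subset_fitness(np.array([0, 0]), X, y)  # empty subset
```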
B. Hyperparameter Tuning
- Goal: Find the optimal set of hyperparameters for a machine learning model (e.g., `C` and `gamma` for an SVM; learning rate, number of layers, and neurons per layer for a neural network).
- Mapping to SI:
- Representation: A solution's position is a vector where each dimension corresponds to a hyperparameter. The search space for each dimension is bounded by the valid range of the corresponding hyperparameter.
- Fitness Function: The fitness is the performance of the ML model evaluated with that specific set of hyperparameters, typically measured by cross-validation accuracy on a validation set. The goal is usually to maximize accuracy or minimize error.
C. Neural Network Training (Neuroevolution)
- Goal: Optimize the weights and biases of a neural network, as an alternative to gradient-based methods like backpropagation.
- Mapping to SI:
- Representation: A solution is a single long vector containing all the weights and biases of the network, unrolled. The dimensionality of the search space can be very large.
- Fitness Function: The fitness is typically the inverse of the loss function (e.g., Mean Squared Error or Cross-Entropy) evaluated on the training dataset.
D. Clustering
- Goal: Partition a dataset into `k` clusters such that data points in the same cluster are similar. This involves finding the optimal cluster centroids.
- Mapping to SI (e.g., PSO-based clustering):
  - Representation: A particle's position represents the coordinates of all `k` cluster centroids. If the data has `D` dimensions, the particle's position vector has length `k * D`.
  - Fitness Function: The fitness is a measure of clustering quality. A common choice is to minimize the intra-cluster distance, such as the Sum of Squared Errors (SSE); the fitness function can then be `1 / SSE`. Each particle's fitness is calculated by assigning all data points to their nearest centroid (as defined by the particle's position) and then computing the SSE.
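The decoding (a length `k * D` particle vector into `k` centroids) and the SSE fitness can be sketched as follows; the two-blob toy data and the hand-picked centroid vectors are illustrative:

```python
import numpy as np

def clustering_sse(position, X, k):
    """Decode a particle (length k*D) into k centroids and compute the SSE
    after assigning each point to its nearest centroid."""
    d = X.shape[1]
    centroids = position.reshape(k, d)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    return float(sum(np.sum((X[nearest == c] - centroids[c]) ** 2)
                     for c in range(k)))

# Toy data: two tight blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
good = clustering_sse(np.array([0.0, 0.0, 5.0, 5.0]), X, k=2)  # centroids on blobs
bad = clustering_sse(np.array([2.5, 2.5, 2.6, 2.6]), X, k=2)   # centroids off-blob
fitness_good, fitness_bad = 1.0 / good, 1.0 / bad
```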