Unit 5 - Notes

INT347

Unit 5: Introduction to Intelligent Agents

1. Agent Fundamentals

An Intelligent Agent is a software entity situated within an environment: it perceives its surroundings through sensors and acts on that environment through actuators to achieve specific objectives. In modern software bots and AI systems, an agent uses a Large Language Model (LLM) or another machine learning model as its core reasoning engine to make decisions, execute tasks, and interact with external systems.

Core Components of an Agent:

Environment: The domain in which the agent operates (e.g., a web browser, a database, a customer service portal).
Sensors (Inputs): Mechanisms used to gather data from the environment (e.g., API payloads, user text inputs, web scraping).
Actuators (Outputs): Mechanisms used to manipulate the environment (e.g., sending an email, writing to a database, generating a text response).
Brain (Reasoning Engine): The logic center (usually an LLM) that processes inputs, formulates plans, and determines actions.

2. Agent Types

Agents are generally categorized based on their level of independence and their relationship with human users.

Assistive Agents

Definition: Agents designed to operate alongside humans, providing support, recommendations, and streamlining tasks. They require a "human-in-the-loop."
Function: They wait for human prompts, suggest completions, or summarize data. The human makes the final decision.
Examples: GitHub Copilot, conversational chatbots, virtual assistants (Siri, Alexa).

Augmented (Autonomous) Agents

Definition: Agents designed to operate independently to achieve a complex, high-level goal. They extend human capabilities by handling end-to-end task execution.
Function: Given a broad objective, they break it down into sub-tasks, execute them sequentially, self-correct if they encounter errors, and present the final result.
Examples: AutoGPT, autonomous trading bots, automated QA testing agents.

3. Agent Characteristics

To be classified as "intelligent," an agent must exhibit three primary characteristics:

Autonomy: The ability to operate without direct human intervention. The agent controls its internal state and actions, making independent decisions based on its programming and the current context.
Reactivity: The ability to perceive changes in the environment and respond in a timely fashion. For example, if an API endpoint goes down, a reactive agent will detect the failure and attempt an alternative route.
Goal-Oriented Behaviour (Proactiveness): The agent does not simply react to stimuli; it takes initiative to achieve its programmed objectives. It can formulate plans, anticipate future states, and take steps toward maximizing a specific reward or goal.

4. AI Agent Node Architecture

In modern AI development frameworks (such as LangGraph, from the LangChain ecosystem, or LlamaIndex), an agent is often structured as a "Node" within a larger graph or system. The architecture of a single AI Agent Node typically consists of:

System Prompt/Persona: The foundational instructions dictating the agent's role, constraints, and behavioral guidelines.
LLM Core (The Brain): The neural network responsible for natural language understanding and reasoning.
Memory Module: Storage for past interactions and context (both short-term and long-term).
Tool Registry: A list of executable functions (APIs, calculators, search engines) the agent is permitted to use.
Parser/Output Formatter: A mechanism to translate the LLM's raw text output into structured commands (e.g., JSON) that external systems can execute.
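
The five parts above can be sketched as a single Python class. This is a minimal illustration, not the structure of any particular framework: the names AgentNode, fake_llm, and the "add" tool are hypothetical, and the LLM is replaced by a stub that always returns the same tool call.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentNode:
    """Hypothetical container for the parts of an agent node."""
    system_prompt: str                                # persona and constraints
    llm: Callable[[str], str]                         # LLM core: prompt in, raw text out
    memory: list = field(default_factory=list)        # memory module (short-term only)
    tools: dict = field(default_factory=dict)         # tool registry: name -> function

    def build_prompt(self, user_input: str) -> str:
        # Combine persona, remembered context, and the new input.
        history = "\n".join(self.memory)
        return f"{self.system_prompt}\n{history}\nUser: {user_input}"

    def parse(self, raw: str) -> dict:
        # Parser: turn raw model text into a structured command.
        # Anything that is not a JSON object with a "name" is a plain reply.
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return {"type": "reply", "text": raw}
        if isinstance(parsed, dict) and "name" in parsed:
            return {
                "type": "tool_call",
                "name": parsed["name"],
                "arguments": parsed.get("arguments", {}),
            }
        return {"type": "reply", "text": raw}

    def step(self, user_input: str) -> dict:
        raw = self.llm(self.build_prompt(user_input))
        self.memory.append(f"User: {user_input}")
        self.memory.append(f"Agent: {raw}")
        return self.parse(raw)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always requests the "add" tool.
    return '{"name": "add", "arguments": {"a": 2, "b": 3}}'


node = AgentNode(
    system_prompt="You are a careful arithmetic assistant.",
    llm=fake_llm,
    tools={"add": lambda a, b: a + b},
)
command = node.step("What is 2 + 3?")
result = node.tools[command["name"]](**command["arguments"])
```

Keeping the parser separate from the LLM call means the host code, not the model, decides whether a given output is treated as a tool call or as a plain reply.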

5. Perception-Action Loops

The Perception-Action Loop (often compared to the OODA loop: Observe, Orient, Decide, Act) is the continuous cycle through which an agent interacts with its environment.

Perceive: The agent receives a user prompt or an environmental trigger.
Think/Plan: The agent's reasoning engine processes the input, accesses memory, and determines the best course of action.
Act: The agent executes a tool or generates a response.
Observe (Feedback): The agent observes the result of its action (e.g., reading the output of an API call). If the goal is not met, the loop repeats.
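
The four phases can be seen in a toy loop. The sketch below assumes a deliberately simple environment (an integer counter) and a rule-based "planner" in place of an LLM; run_loop and its parameters are hypothetical names chosen for illustration.

```python
def run_loop(goal: int, start: int, max_steps: int = 10) -> list:
    """Drive a counter toward `goal`, logging each pass through the loop."""
    state = start
    log = []
    for step in range(max_steps):
        # Perceive: read the current state of the environment.
        observed = state
        # Think/Plan: stop if the goal is met, otherwise pick a direction.
        if observed == goal:
            log.append(f"step {step}: goal reached at {observed}")
            break
        action = 1 if observed < goal else -1
        # Act: apply the chosen action to the environment.
        state += action
        # Observe: record the outcome; the next iteration perceives it.
        log.append(f"step {step}: {observed} -> {state}")
    return log


trace = run_loop(goal=3, start=0)
```

The max_steps bound matters even in this toy version: without it, a loop whose goal can never be met would run forever.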

6. Function Calling & Tool Integration Fundamentals

To interact with the outside world, agents must use tools. Function Calling is a feature supported by many modern LLMs that allows them to output a structured JSON object describing a call to a specific function, rather than generating conversational text. The model does not run the function itself; the host application reads the JSON and executes the call.

Tool Integration Workflow:

Tool Definition: The developer provides the agent with a JSON schema describing available tools (e.g., get_current_weather(location, unit)).
Inference: The user asks, "What's the weather in Paris?"
Function Call Generation: The LLM recognizes it needs external data and outputs: {"name": "get_current_weather", "arguments": {"location": "Paris", "unit": "celsius"}}.
Execution: The underlying software intercepts this JSON, executes the actual Python/Node.js function, and gets the result (e.g., {"temperature": 22, "condition": "Sunny"}).
Synthesis: The result is fed back to the LLM, which then formulates a natural language response for the user.
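
The workflow above can be sketched end to end in Python. This is an assumption-laden illustration: the schema follows a generic JSON Schema style rather than any specific provider's format, the weather function is a stub returning fixed data, and model_output is hard-coded to the JSON shown in the Function Call Generation step instead of coming from a real LLM.

```python
import json

# Tool definition in a generic JSON Schema style (illustrative only; each
# provider wraps tool definitions in its own envelope).
WEATHER_TOOL_SCHEMA = {
    "name": "get_current_weather",
    "description": "Get the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    # Stub: a real implementation would query a weather service.
    return {"temperature": 22, "condition": "Sunny", "location": location, "unit": unit}


# Registry mapping tool names the model may emit to real functions.
TOOLS = {"get_current_weather": get_current_weather}


def execute_tool_call(raw_call: str) -> dict:
    """Execution step: parse the model's JSON and run the named function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Hard-coded stand-in for what the LLM would emit.
model_output = '{"name": "get_current_weather", "arguments": {"location": "Paris", "unit": "celsius"}}'
tool_result = execute_tool_call(model_output)

# Synthesis step: the result is serialized and sent back to the model.
synthesis_prompt = f"Tool result: {json.dumps(tool_result)}\nAnswer the user's question."
```

Looking the name up in a registry, instead of calling whatever the model names, limits the agent to the functions it has been explicitly permitted to use.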

7. Memory Systems

An agent without memory is stateless, treating every interaction as its first. Memory systems give agents context and continuity.

Short-Term Context (Working Memory)

Concept: The immediate context provided within the LLM's context window (e.g., 8k, 32k, or 128k tokens).
Usage: Used for maintaining the state of the current task, tracking the steps of a current plan, and holding temporary variables.

Conversation History (Rolling Buffer)

Concept: A chronological log of the back-and-forth interactions between the user and the agent.
Usage: Allows the agent to handle follow-up questions (e.g., User: "Who is the CEO of Apple?" Agent: "Tim Cook." User: "How old is he?").
Implementation: Because context windows are finite, conversation history is often managed using strategies like Sliding Windows (keeping only the last $N$ messages) or Summarization (compressing older messages into a short summary).
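
A sliding window can be sketched with a bounded deque from the Python standard library. The class name SlidingWindowMemory and the window size of 3 are assumptions made for this example.

```python
from collections import deque


class SlidingWindowMemory:
    """Keeps only the most recent `max_messages` messages."""

    def __init__(self, max_messages: int):
        # A deque with maxlen discards the oldest item when full.
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def context(self) -> list:
        # Messages in chronological order, ready to place in a prompt.
        return list(self.messages)


memory = SlidingWindowMemory(max_messages=3)
memory.add("user", "Who is the CEO of Apple?")
memory.add("agent", "Tim Cook.")
memory.add("user", "How old is he?")
memory.add("agent", "Let me look that up.")
```

After the fourth message, the original question has been dropped from the window, so the word "Apple" is no longer in context. This is the trade-off that motivates summarization: a summary of older turns can preserve such referents at a small token cost.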

8. Task-Specific Agents

General-purpose agents can struggle with complex workflows. Consequently, developers design task-specific agents, each optimized for a single domain with its own goal, tools, and workflow.

Research Agents

Goal: To gather, synthesize, and cite information on a given topic.
Tools: Web search APIs (Tavily, Google Custom Search), web scrapers (BeautifulSoup), PDF parsers.
Workflow: Receives a topic $\rightarrow$ Generates search queries $\rightarrow$ Reads top articles $\rightarrow$ Extracts relevant facts $\rightarrow$ Compiles a well-structured report with citations.

Customer Support Agents

Goal: To resolve user queries, manage account issues, and route complex problems to human operators.
Tools: CRM databases (Salesforce, Zendesk), knowledge base search (Vector Databases/RAG), ticketing APIs.
Workflow: Receives user complaint $\rightarrow$ Retrieves user profile from CRM $\rightarrow$ Searches knowledge base for policy $\rightarrow$ Resolves issue or escalates to a human agent.
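
The customer support workflow can be sketched as a single function. Everything here is hypothetical: the CRM and knowledge base are in-memory dictionaries with made-up records, and a keyword match stands in for the vector search a real RAG system would use.

```python
# Hypothetical in-memory stand-ins for a CRM and a knowledge base.
CRM = {"u42": {"name": "Dana", "plan": "basic"}}
KNOWLEDGE_BASE = {
    "refund": "refunds are available within 30 days of purchase.",
}


def handle_complaint(user_id: str, message: str) -> dict:
    # Step 1: retrieve the user profile from the CRM.
    profile = CRM.get(user_id)
    if profile is None:
        return {"status": "escalated", "reason": "unknown user"}
    # Step 2: search the knowledge base (keyword match as a placeholder
    # for semantic search).
    for keyword, policy in KNOWLEDGE_BASE.items():
        if keyword in message.lower():
            # Step 3a: resolve the issue using the matching policy.
            return {"status": "resolved", "reply": f"Hi {profile['name']}, {policy}"}
    # Step 3b: no policy found, so hand over to a human agent.
    return {"status": "escalated", "reason": "no matching policy"}


resolved = handle_complaint("u42", "I would like a refund, please.")
escalated = handle_complaint("u42", "My device makes a strange noise.")
```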

9. Agent vs. Workflow Comparison

It is crucial to distinguish between traditional software workflows and intelligent agents.

Feature	Traditional Workflow	Intelligent Agent
Execution Path	Deterministic (Rule-based, IF/THEN logic).	Non-deterministic (Dynamic, inferred by LLM).
Adaptability	Rigid. Fails if unexpected edge cases occur.	Flexible. Can reason through anomalies and adapt.
Routing	Static routing defined by developers (e.g., Directed Acyclic Graphs).	Semantic routing decided by the agent based on context.
Best Use Case	Predictable, repetitive, well-structured tasks.	Unpredictable, complex tasks requiring cognitive reasoning.

10. Agent Workflow Design Patterns

To reliably orchestrate agents, developers use established design patterns:

ReAct (Reason + Act): The agent is prompted to alternate between reasoning about what to do next and taking action.
- Pattern: Thought -> Action -> Observation -> Thought...
Plan-and-Solve (Plan-and-Execute): A two-node pattern. The "Planner" agent breaks a complex prompt into a step-by-step checklist. The "Executor" agent completes the steps one by one.
Router Pattern: A gateway agent classifies the incoming prompt and routes it to specialized sub-agents (e.g., routing a billing question to a Billing Agent, and a tech question to an IT Agent).
Evaluator-Optimizer (Reflection): One agent generates an output, and a second "Critic" agent reviews it against a set of criteria. If it fails, the first agent tries again.
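
Two of these patterns, Router and Evaluator-Optimizer, can be sketched without a model. The code below is a simplified illustration under stated assumptions: a keyword classifier stands in for the gateway agent, and toy_generator and toy_critic are hypothetical stand-ins for the generating and critic agents.

```python
from typing import Callable, Optional, Tuple


def route(prompt: str) -> str:
    """Router pattern: classify the prompt and name the sub-agent to use."""
    text = prompt.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "password" in text or "error" in text:
        return "it"
    return "general"


def evaluate_and_retry(
    generate: Callable[[int], str],
    critic: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Evaluator-Optimizer: regenerate until the critic accepts or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        draft = generate(attempt)
        if critic(draft):
            return draft, attempt
    return None, max_attempts


def toy_generator(attempt: int) -> str:
    # Each attempt produces a longer draft than the last.
    return " ".join(["word"] * attempt)


def toy_critic(draft: str) -> bool:
    # Acceptance criterion: at least three words.
    return len(draft.split()) >= 3


destination = route("I was charged twice this month.")
final_draft, attempts_used = evaluate_and_retry(toy_generator, toy_critic)
```

The attempt limit in evaluate_and_retry is a design choice worth keeping in any real version: a critic that can never be satisfied would otherwise cause unbounded regeneration.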

11. Agent Error Management

Because agents operate non-deterministically and depend on external services, they are prone to errors (e.g., hallucinations, malformed tool calls, API timeouts). Robust error management is critical.

Self-Correction: If a tool returns an error (e.g., 404 Not Found or JSON Parse Error), the error message is fed back directly into the agent's prompt, asking it to fix its mistake and try again.
Retry Logic & Circuit Breakers: Failed API calls are retried with exponential backoff (the wait doubles after each failure), and "circuit breakers" stop the agent after a maximum number of steps to prevent infinite, costly loops.
Graceful Degradation: If an agent cannot complete a task autonomously, it should be programmed to gracefully halt and escalate the issue to a human, rather than guessing or hallucinating an answer.
Constrained Output (Guardrails): Using libraries (like Guidance or Outlines) that constrain the LLM to output only valid JSON or text that adheres to a predefined schema. This removes syntax errors from function calls, but it does not guarantee that the argument values themselves are correct.
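
Retry logic, a step limit, and self-correction can be combined in a short sketch. The names call_with_backoff and run_with_self_correction are hypothetical, the delays are shortened for demonstration, and both the flaky API and the LLM are simulated with stubs.

```python
import json
import time


def call_with_backoff(func, max_retries: int = 3, base_delay: float = 0.01):
    """Retry `func` on TimeoutError, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return func()
        except TimeoutError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))


def run_with_self_correction(llm, prompt: str, max_steps: int = 3) -> dict:
    """Feed parse errors back to the model; halt and escalate at the step limit."""
    for step in range(max_steps):
        raw = llm(prompt)
        try:
            return {"status": "ok", "result": json.loads(raw), "steps": step + 1}
        except json.JSONDecodeError as exc:
            # Self-correction: append the error so the model can fix it.
            prompt = f"{prompt}\nYour last output was invalid JSON ({exc.msg}). Return valid JSON only."
    # Graceful degradation: stop and hand over instead of guessing.
    return {"status": "escalated", "reason": "step limit reached", "steps": max_steps}


# Simulated API that times out twice, then succeeds.
attempts = {"count": 0}

def flaky_api():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

api_result = call_with_backoff(flaky_api)

# Simulated model that emits malformed JSON first, then a valid object.
replies = iter(['{"city": "Paris"', '{"city": "Paris"}'])

def fake_llm(prompt: str) -> str:
    return next(replies)

outcome = run_with_self_correction(fake_llm, "Return the city as JSON.")
```

Returning an explicit "escalated" status, instead of raising or returning a best guess, gives the calling code a clear signal to route the task to a human.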
