AlgoCoder · AI & LLM Engineering · Case AI-11

AI Agent Architecture for a Workflow Automation Use Case That Multi-Step LLM Calls Couldn't Handle

The workflow needed reasoning across multiple tools and sources. Chained prompts weren't getting there.

Pilot purgatory (extended into agent architecture)

Abstract

A team was building an internal automation tool to handle multi-step workflows: retrieving information, calling external systems, and producing structured outputs from the retrieved data. The initial versions used chained LLM calls; the chains worked for simple cases and failed for any case that required genuine reasoning across the workflow.

I. Problem Statement

Chained prompts couldn't recover from intermediate-step errors, couldn't dynamically adjust the workflow based on what intermediate steps returned, and couldn't combine information from multiple sources into a coherent output without producing brittle behavior. The team had concluded that the chain-of-prompts pattern was structurally insufficient and that an agent architecture — where the model actively decided what to do next based on what it had observed — was the appropriate replacement.

II. Methodology

The team replaced the prompt chain with an agent architecture for the workflow automation use case. Its components are described below.

The agent was given a defined tool set: functions it could call to retrieve information, query external systems, and produce structured outputs. Each tool was specified with clear inputs, outputs, and error behavior. The tool set was bounded; the agent operated within a defined surface rather than against an arbitrary action space.
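A bounded tool set of this kind can be sketched as a small registry. This is a minimal illustration, not the team's implementation: the names `Tool`, `ToolRegistry`, `ToolError`, and the `lookup_order` tool are all hypothetical, and the in-memory order table stands in for a real external system.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class ToolError(Exception):
    """Raised when a tool fails in a way the agent is expected to handle."""


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    input_schema: Dict[str, type]  # required argument names and their types
    output_type: type              # type the tool promises to return
    run: Callable[..., Any]


class ToolRegistry:
    """A closed set of tools; anything not registered cannot be called."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool: {tool.name}")
        self._tools[tool.name] = tool

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        # Requests outside the registered surface fail with a typed error.
        if name not in self._tools:
            raise ToolError(f"unknown tool: {name}")
        return self._tools[name].run(**kwargs)


# Hypothetical tool; the dict below stands in for an external system.
def lookup_order(order_id: str) -> dict:
    orders = {"A-1": {"status": "shipped"}}
    if order_id not in orders:
        raise ToolError(f"order not found: {order_id}")
    return orders[order_id]


registry = ToolRegistry()
registry.register(Tool(
    name="lookup_order",
    description="Fetch an order record by id.",
    input_schema={"order_id": str},
    output_type=dict,
    run=lookup_order,
))
```

Keeping the schema and error type next to each function means the agent's available actions can be listed, documented, and audited from one place.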

The agent's reasoning loop was structured around plan, act, observe, refine. The agent decomposed the user's request into a plan, executed steps from the plan, observed the results, and adjusted subsequent steps based on what it learned. The pattern handled cases where intermediate steps returned unexpected results without the brittleness chained prompts had produced.
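The plan, act, observe, refine cycle can be sketched as a short loop. This is an illustration under assumptions: `plan` and `refine` are plain functions standing in for model calls, and the `search` and `search_archive` tools are hypothetical. Step limits are left out here to keep the loop readable.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, dict]  # (tool name, keyword arguments)


def run_agent(
    request: str,
    plan: Callable[[str], List[Step]],
    refine: Callable[[List[Step], dict], List[Step]],
    tools: Dict[str, Callable[..., dict]],
) -> List[dict]:
    """Plan once, then act, observe, and refine until no steps remain."""
    remaining = plan(request)                       # plan
    observations: List[dict] = []
    while remaining:
        tool_name, args = remaining[0]
        remaining = remaining[1:]
        observation = tools[tool_name](**args)      # act
        observations.append(observation)            # observe
        remaining = refine(remaining, observation)  # refine
    return observations


# Stand-ins for model calls (hypothetical logic).
def plan(request: str) -> List[Step]:
    return [("search", {"query": request})]


def refine(remaining: List[Step], observation: dict) -> List[Step]:
    # An empty primary search inserts a fallback step ahead of the rest.
    if observation["source"] == "search" and not observation["hits"]:
        fallback: Step = ("search_archive", {"query": observation["query"]})
        return [fallback] + remaining
    return remaining


tools = {
    "search": lambda query: {"source": "search", "query": query, "hits": []},
    "search_archive": lambda query: {
        "source": "archive", "query": query, "hits": ["doc-7"],
    },
}

result = run_agent("refund policy", plan, refine, tools)
```

A fixed prompt chain would have passed the empty search result straight to the next stage. Here the remaining plan is rewritten after each observation, which is the behavior the paragraph above describes.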

Step-level validation was instrumented. Each tool invocation's inputs and outputs were validated against expectations; invalid invocations produced structured errors rather than propagating bad data into subsequent reasoning. The agent could recognize and recover from its own mistakes.
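One way to picture step-level validation is a wrapper that checks arguments before the call and the return value after it, and converts every failure into a structured result. The names `StepResult` and `validated_call`, the error codes, and the `get_balance` tool are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class StepResult:
    ok: bool
    value: Any = None
    error_code: Optional[str] = None
    error_detail: Optional[str] = None


def validated_call(
    fn: Callable[..., Any],
    args: Dict[str, Any],
    input_schema: Dict[str, type],
    output_check: Callable[[Any], bool],
) -> StepResult:
    """Validate inputs, run the tool, validate the output; never raise."""
    for name, expected in input_schema.items():
        if name not in args:
            return StepResult(False, error_code="missing_argument", error_detail=name)
        if not isinstance(args[name], expected):
            return StepResult(False, error_code="bad_argument_type", error_detail=name)
    extra = sorted(set(args) - set(input_schema))
    if extra:
        return StepResult(False, error_code="unexpected_argument",
                          error_detail=", ".join(extra))
    try:
        value = fn(**args)
    except Exception as exc:  # tool failures become data, not crashes
        return StepResult(False, error_code="tool_failure", error_detail=str(exc))
    if not output_check(value):
        return StepResult(False, error_code="bad_output", error_detail=repr(value))
    return StepResult(True, value=value)


# Hypothetical tool and expectations.
def get_balance(account_id: str) -> dict:
    return {"account_id": account_id, "balance": 120.0}


schema = {"account_id": str}


def check(out: Any) -> bool:
    return isinstance(out, dict) and isinstance(out.get("balance"), float)


good = validated_call(get_balance, {"account_id": "acct-9"}, schema, check)
bad = validated_call(get_balance, {"account_id": 9}, schema, check)
```

Because `bad` carries an error code instead of a wrong balance, the agent's next reasoning step sees the failure explicitly and can retry with corrected arguments.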

A maximum step count and budget were enforced. The agent couldn't loop indefinitely or consume unbounded resources on a single request; if a request exceeded the budget, the agent surfaced what it had accomplished and what remained, rather than running out of budget silently.
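A step and cost budget with an explicit partial-progress report might look like the following sketch. `Budget`, `RunReport`, and `run_with_budget` are hypothetical names, and the cost units are arbitrary; the source does not say how the team measured budget.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

PlannedStep = Tuple[str, float, Callable[[], None]]  # (name, cost, action)


@dataclass
class Budget:
    max_steps: int
    max_cost: float
    steps_used: int = 0
    cost_used: float = 0.0

    def can_afford(self, cost: float) -> bool:
        within_steps = self.steps_used < self.max_steps
        within_cost = self.cost_used + cost <= self.max_cost
        return within_steps and within_cost

    def charge(self, cost: float) -> None:
        self.steps_used += 1
        self.cost_used += cost


@dataclass
class RunReport:
    completed: List[str] = field(default_factory=list)
    remaining: List[str] = field(default_factory=list)
    budget_exhausted: bool = False


def run_with_budget(steps: List[PlannedStep], budget: Budget) -> RunReport:
    """Run steps in order; stop and report when the budget runs out."""
    report = RunReport()
    for index, (name, cost, action) in enumerate(steps):
        if not budget.can_afford(cost):
            # Surface what is done and what is left instead of failing silently.
            report.budget_exhausted = True
            report.remaining = [n for n, _, _ in steps[index:]]
            break
        action()
        budget.charge(cost)
        report.completed.append(name)
    return report


def noop() -> None:
    pass


steps: List[PlannedStep] = [
    ("fetch", 1.0, noop),
    ("summarize", 2.0, noop),
    ("file_ticket", 1.0, noop),
]
report = run_with_budget(steps, Budget(max_steps=2, max_cost=10.0))
```

In this run the step limit is reached after two steps, so `report` lists `fetch` and `summarize` as completed and `file_ticket` as remaining.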

The agent's reasoning was instrumented in detail. Every step it took, every tool it called, every observation it made was logged. The team could review agent behavior post-hoc to understand why a request was handled the way it was and identify patterns where the agent's reasoning needed improvement.
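Logging of this kind can be approximated with an append-only trace of structured events per request. The `Trace` class, the event kinds, and the field names below are assumptions for illustration; the source does not describe the team's log format.

```python
import json
import time
from typing import Any, Dict, List


class Trace:
    """Append-only record of one agent run, one event per step."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.events: List[Dict[str, Any]] = []

    def record(self, kind: str, **fields: Any) -> None:
        # kind is a free-form label such as "plan", "tool_call", "observation".
        self.events.append({
            "request_id": self.request_id,
            "seq": len(self.events),  # preserves ordering for later review
            "ts": time.time(),
            "kind": kind,
            **fields,
        })

    def to_jsonl(self) -> str:
        """Serialize as JSON Lines so events can be searched and aggregated."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)

    def tool_calls(self) -> List[str]:
        return [e["tool"] for e in self.events if e["kind"] == "tool_call"]


trace = Trace("req-001")
trace.record("plan", steps=["lookup_order", "draft_reply"])
trace.record("tool_call", tool="lookup_order", args={"order_id": "A-1"})
trace.record("observation", tool="lookup_order", result={"status": "shipped"})
```

With one trace per request, a post-hoc review can replay the sequence of plans, calls, and observations, and queries such as `trace.tool_calls()` make recurring patterns across requests easier to spot.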

A human-in-the-loop pattern was added for high-stakes outputs. The agent could complete its reasoning and present the result to a human reviewer for approval before the result was finalized. The pattern absorbed the cases where autonomous execution wasn't yet warranted while preserving the agent's value for the cases where it was.
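The approval gate can be sketched as a function that routes only high-stakes results to a reviewer. Everything named here is hypothetical: `Decision`, `Outcome`, `finalize`, the amount-based stakes rule, and the reviewer stub, which in practice would be a review queue or UI.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Outcome:
    result: dict
    finalized: bool
    reviewed_by_human: bool


def finalize(
    result: dict,
    is_high_stakes: Callable[[dict], bool],
    ask_reviewer: Callable[[dict], Decision],
) -> Outcome:
    """Finalize low-stakes results directly; gate the rest on a human."""
    if not is_high_stakes(result):
        return Outcome(result, finalized=True, reviewed_by_human=False)
    decision = ask_reviewer(result)
    return Outcome(
        result,
        finalized=(decision is Decision.APPROVED),
        reviewed_by_human=True,
    )


# Hypothetical policy: refunds above 500 need approval.
def is_high_stakes(result: dict) -> bool:
    return result.get("amount", 0) > 500


# Stub reviewer that rejects everything it is shown.
def reject_all(result: dict) -> Decision:
    return Decision.REJECTED


small = finalize({"action": "refund", "amount": 40}, is_high_stakes, reject_all)
large = finalize({"action": "refund", "amount": 900}, is_high_stakes, reject_all)
```

Keeping the stakes rule as a separate predicate means the share of requests handled autonomously can be widened by changing that one function, without touching the agent's reasoning loop.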

III. Results & Discussion

The workflow automation tool handled the multi-step cases the chained-prompt approach hadn't. Workflow completion rates on the harder cases improved substantially. The team gained an extensible architecture — adding new tools to the agent's tool set extended the workflows it could handle without rebuilding the agent itself. The human-in-the-loop pattern provided a graduated trust surface that allowed the team to expand autonomous handling as confidence grew.

— —

AI-11 · Case 11 of 12 in AI & LLM Engineering
