TLDR: Just like traditional software, if you are building automations that involve AI agents, you still need to involve your clients or end users early and often in feedback loops. In this blog, I talk about early client feedback I received on an AI agent’s research output, before full automation. That feedback uncovered oversights in location coverage, irrelevant self-published data, and missing key sources, and it was essential to refine the AI agent’s performance and ensure the final automation delivered valuable results.
In The Lean Startup, Eric Ries told a story about Zappos founder Nick Swinmurn. To validate his idea for a centralized online shoe store, Swinmurn went to local shoe stores, took photos of the shoes, and posted them online. When someone bought a pair, he’d go back to the store, buy the shoes, and ship them.
So even with essentially nothing - no product, no inventory system, no website - Swinmurn was able to get user feedback. He was able to interact with real customers and learn about their needs with little more than a camera and a hypothesis.
Having spent many years conducting early customer pilots for traditional software, I’ve seen first-hand the value of early user feedback. But what I like about the story of Zappos’ early days is that it reminds me that user feedback is possible at a very early stage, with very little.
Today, I like to apply this same principle to building workflow automations with AI agents. It’s tempting to believe that because this work involves complex, layered agentic workflows in tools like n8n - often without a traditional user interface - we can skip this step, but I believe it is just as crucial.
Don’t just show the wizardry
Building these agentic workflows is exciting because it is still new and evolving. With technology moving so quickly these days, it’s easy to fall into a familiar trap: the "black box" build. I know as builders we go deep into the logic: crafting the perfect prompts in n8n, tuning the model parameters, designing and engineering the workflow, and customizing nodes and integrations. (I still get a little thrill when my workflow actually works end-to-end!)
It’s tempting to proudly present the shiny finished results of the entire workflow - "ta da!" - to the client or end user as a fait accompli: "Here’s your daily report", "Here are your automated emails", "Here are your generated articles", "Take a look at your extracted PDF"...and relish the client’s gushes at your wizardry...and only then ask for feedback, leaving the nuts and bolts of the automation well and truly behind the scenes.

The problem with this is that the AI agents’ decision-making, which determines the whole workflow’s output, typically happens several steps back. And the quality, relevancy, and accuracy of that output are critical to the rest of the workflow and the final results you show to the client.
Real example: client feedback on web research automation
Recently I built an automation for a client that includes an AI agent - a "research assistant", if you will - that performs regular web research; the workflow then routes various actions or kicks off sub-workflows off the back of that research. So the quality, relevancy, and accuracy of the research are critical to the final results. Essentially, this research assistant is the "brain" of the workflow: it consumes information, processes it, and produces insights, and the rest of the workflow depends on the quality of those insights.
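To make "routes various actions" concrete: in n8n this dispatch would typically be a Switch or IF node sitting downstream of the agent, but the idea fits in a few lines of Python. The insight shape and sub-workflow names here are hypothetical, purely for illustration:

```python
# Hypothetical dispatch step: each research insight decides which
# sub-workflow runs next. In n8n this is usually a Switch node.
def route(insight: dict) -> str:
    if insight.get("category") == "event":
        return "draft_announcement"   # e.g. prepare a post for review
    if insight.get("category") == "funding":
        return "notify_team"          # e.g. alert the client's staff
    return "log_for_review"           # park anything unclassified

insights = [{"category": "event", "title": "Festival announced"}]
for i in insights:
    print(i["title"], "->", route(i))
```

If the "brain" misclassifies or misses something here, every downstream branch inherits the mistake, which is exactly why the raw research output is worth reviewing first.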
(When designing workflows, I always ask the client to provide initial context—any information they believe will be useful for me or the AI agent. However, securing this upfront information is often challenging, as we all make assumptions about what others already know. Therefore, this initial context, while useful, cannot replace early client feedback during development.)
Client feedback
When the client reviewed the raw output from the AI agent’s web research, these were the main issues identified:
Geographic Scope: Although the web research was initially scoped to specific geographic locations (cities and regions), the review revealed a deficiency in coverage. The scope subsequently required expansion to include additional towns, townlands, and larger villages. This omission could have persisted undetected in a production environment for an extended period.
Self-Referential Content: The AI agent output inadvertently included results that the client had published themselves. For this specific use case, a filter needed to be implemented to exclude web content originating directly from the client’s own platforms and socials; a minimal sketch of such a filter follows this list. (This is a bit of a "rookie mistake" to watch out for! Sometimes it is valid to include these results, but more often than not, you want to remove them.)
Missing Key Sources: Critical domain-specific sources were missing from the results. These sources, which were not initially provided by the client, were only identified and highlighted during the review of the raw output, underscoring the value of client domain expertise in the process.
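Here is a minimal sketch, in Python, of the kind of post-processing filter the first two issues call for. The result shape, domain list, and location names are hypothetical; in n8n this logic would typically sit in a Code or Filter node between the agent and the rest of the workflow:

```python
from urllib.parse import urlparse

# Hypothetical examples: the client's own domains and the agreed location scope.
CLIENT_DOMAINS = {"exampleclient.com", "blog.exampleclient.com"}
LOCATIONS_IN_SCOPE = {"galway", "tuam", "athenry", "oranmore"}

def keep_result(result: dict) -> bool:
    """Drop self-published items and anything outside the agreed locations."""
    domain = urlparse(result["url"]).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    if domain in CLIENT_DOMAINS:
        return False  # self-referential: the client published this themselves
    text = (result["title"] + " " + result["summary"]).lower()
    return any(loc in text for loc in LOCATIONS_IN_SCOPE)

results = [
    {"url": "https://www.exampleclient.com/news", "title": "Our own post", "summary": "..."},
    {"url": "https://localpaper.example/story", "title": "Festival in Tuam", "summary": "Announced today..."},
]
filtered = [r for r in results if keep_result(r)]  # keeps only the second item
```

A keyword filter like this is only a safety net, though; the real fix for the geographic gap was widening the research scope in the agent’s instructions, not just filtering afterwards.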
So showing these early, raw research results proved a valuable exercise for this scenario. This early user feedback loop is a lesson we know well from classic software development, whether it’s early mockups, clickable prototypes, or MVPs shown to your users. The same principle applies, perhaps even more critically, to agentic automations. In many cases, your AI agent is a prototype.
We are basically asking: before we fine-tune what happens next in our automation, what is the AI agent actually doing? Is this output, and the overall workflow, going to provide business value? Is it meeting client expectations? Is the client going to pay for this?

Pulling out nuance from user feedback
During one of the feedback sessions, the client commented on one entry in the Google Sheet: it needed to be disregarded because, even though it technically came under their remit, it would be seen as encroaching on a neighboring organization’s area or "turf", and they respect these invisible boundaries.
This was a good reminder for me of the limitations of AI agents, and of why these feedback loops are important.
The agent has no world view, and neither does the underlying model. It doesn’t understand the nuances of this client and their business, their day-to-day experience, or the internal dynamics of an organization. It doesn’t know the particular ecosystem made up of interactions, politics, history, gossip, lore, communities, tribes, behaviors, personalities, and so on.
Getting feedback on a "headless" automation
A common objection I hear is: "But this is just a workflow with an AI step. There's no UI. How do I 'demo' it, or get feedback? How would a non-technical client understand this if I showed it to them?"
The feedback process doesn’t need to be pretty, just functional! You don't need a polished front-end or a fancy database to get meaningful feedback. For the automation that I talked about above, I just shared the research output from the AI agent in Google Sheets, and the client reviewed it over a few iterations.
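If your tool doesn’t already have a built-in integration (n8n has a Google Sheets node), even a few lines of Python with the gspread library can push the agent’s raw output somewhere reviewable. A minimal sketch, assuming a Google service account credential and a hypothetical spreadsheet named "Research Review" shared with that account:

```python
import gspread

# Assumes a service account JSON key; the sheet name is hypothetical.
gc = gspread.service_account(filename="service_account.json")
sheet = gc.open("Research Review").sheet1

# The agent's raw research output; same shape as the earlier sketch.
results = [
    {"url": "https://localpaper.example/story", "title": "Festival in Tuam", "summary": "Announced today..."},
]

rows = [["URL", "Title", "Summary", "Client notes"]]
rows += [[r["url"], r["title"], r["summary"], ""] for r in results]

sheet.clear()            # start each review iteration from a clean sheet
sheet.append_rows(rows)
```

The blank "Client notes" column gives the client an obvious place to leave their verdicts, which keeps the feedback in one place for the next iteration.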
For clients who are more comfortable with automation tools, a co-creation session (pair building) might work: share your screen and run a few examples live with the client. Watch the agent work and talk through its reasoning (if it has chain-of-thought). This collaborative debugging can be effective for uncovering hidden requirements and gaps.

This is separate to AI evaluation
One important distinction that I want to make: this early user feedback cycle happens for us during the development and prompt-tuning phase, before we conduct any formal AI evaluations. We’re using the client's judgment and domain knowledge to help us define what "good" actually looks like. Their feedback on accuracy, relevancy, and value forms part of the foundational dataset; a sketch of one way to capture it follows the checklist below.
This pre-automation review is a collaborative quality gate. It's where the client's subject-matter expertise becomes an important prompt-tuning tool, checking:
Accuracy: Is the information factually correct? Did the agent misunderstand technical jargon?
Relevancy: Is it capturing what actually matters to the business?
Completeness: Are there glaring omissions? Is the depth of analysis sufficient? Is there domain knowledge missing?
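One way to make that feedback reusable is to record each client verdict against those three checks in a simple structure that can later seed a formal evaluation set. A minimal sketch, with hypothetical field names and file name:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class FeedbackRecord:
    """One client verdict on one agent output, reusable later as an eval example."""
    item_url: str
    accurate: bool   # factually correct, no misread jargon
    relevant: bool   # captures what matters to the business
    complete: bool   # no glaring omissions or missing domain knowledge
    notes: str       # free-text nuance, e.g. "out of bounds: neighboring org's turf"

records = [
    FeedbackRecord("https://localpaper.example/story", True, True, False,
                   "Missed a key local source for this event"),
]

# Persist the verdicts as a seed evaluation dataset.
with open("feedback_seed.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(asdict(rec)) + "\n")

# A quick pass-rate summary to track improvement across iterations.
passed = sum(r.accurate and r.relevant and r.complete for r in records)
print(f"{passed}/{len(records)} items passed all three checks")
```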
The more your client or end user is exposed to the workflows, the more potential they start to see. These feedback conversations can surface other requirements, or other automation possibilities. They don’t need to understand it all, but seeing how things work, and what the possibilities are, can often spark ideas.
The more involved you can get your users in the nuts and bolts of the process, the more they will understand where issues generally arise in AI agent behavior, and what to watch out for during their early reviews. This, in turn, will improve the value of your future feedback loops. They’ll also, hopefully, provide better upfront context when you build your next automation.
In conclusion: don't wait for the final ta-da moment!
Ultimately, the principle remains the same, whether you're validating a shoe store MVP or an AI agent-powered workflow: think of the AI agents as the prototype for your use case. By shifting the user feedback loop forward - from reviewing a final, polished deliverable to reviewing the raw, step-by-step decision-making - you transform a potential "black box" into a collaborative quality gate.
Moreover, these early, raw feedback sessions can unlock hidden benefits: the client's understanding of AI agent behavior deepens, leading them to provide better upfront context for future automations.
Most importantly, as they engage with the potential of the process, they start to see new requirements and automation possibilities you wouldn't have uncovered otherwise. This foundational work defines "good" before a single formal AI evaluation is run, ensuring your automation is built for true relevance, accuracy, and lasting business value. Don't wait for the final ta-da! Start iterating with users on the relevancy of your automation today.
Note: Comic illustrations in this blog were generated by Google Gemini
1 In this blog I use the term AI agent to describe a workflow-driven intelligence that uses a language model—large or small—together with structured prompts, contextual memory, and tool integrations to autonomously execute tasks. For example, in tools like n8n, an AI agent combines a system prompt, user input, stored context, and reasoning steps with the ability to call external tools (such as HTTP requests, databases, or custom functions) to interpret information, make decisions, and perform actions across different systems.
