Patching agent shortcomings is the future of knowledge work

As organizations become dependent on agentic systems, knowledge work becomes fixing things that agents get wrong

February 27, 2026

Knowledge work, as a thing, exists to increase the chances that good decisions are made. This can be easy to miss because it sure looks like knowledge workers exist to produce documents: Specs, presentations, product roadmaps, emails (that inevitably get filtered), etc. But those artifacts are only means, not the point.

We are moving toward a world where agents make most decisions. Perhaps we’ll never hit 100% — given the ceiling of moral and legal liability — but we’ll get close. (One might imagine that the new “uptime” is the percent of decisions agents make in your organization. Have you hit three nines yet?) This has already largely happened in software development. Source code is not itself the product; it is a formally documented set of decisions about what computers will do in the future (once the code is deployed). From this perspective, coding agents are currently making the majority of decisions in software development. This will happen with the rest of knowledge work as well.

Today, language model-powered technologies (chatbots, etc.) are used to help humans make decisions. However, bots searching through human documents to inform human users is not terribly efficient. Organizations that flip this around will have the competitive edge. Knowledge workers’ purpose will be to support their company’s agents in making good decisions on behalf of the company.

Of course, agents won’t always make good decisions, because… reality is hard. Agents will have shortcomings, partly due to the core agentic system itself (e.g. foundation model reasoning quality, harness reliability), but increasingly due to the quality of integration with a company’s own tooling, systems, data, and knowledge. Knowledge workers will, as a result, spend their day investigating and addressing the shortcomings of their company’s agents.

This work will need to be prioritized and tracked. That need will lead to a new class of software, which we might call Agentic Improvement Management Systems. This is conceptually related to modern project management or work item tracking software, but has its own dynamics. The inputs to the system are “misses”, which represent instances where an agent didn’t achieve its intended outcome; these are akin to incidents in modern software operations. As agentic operations scale, the list of misses will grow overwhelmingly long, so misses will be grouped into larger collections, which we’ll call “shortcomings”. These are analogous to modern software bugs in some sense, but bug has the wrong connotation: Whereas a bug means “we intended for the system to be able to do a specific thing, and it fails to do that specific thing”, a shortcoming means “we intended for the system to be able to do more or less anything, and it just turns out it doesn’t do this specific thing yet”. Shortcomings can then be prioritized based on frequency, severity, and the like.
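To make this concrete, here is a minimal sketch of what the core data model of such a system might look like. All names are hypothetical, and the priority heuristic is just one plausible choice combining frequency and severity:

```python
from dataclasses import dataclass, field

# A "miss": one instance where an agent didn't achieve its intended outcome.
@dataclass
class Miss:
    agent_id: str
    description: str
    severity: int  # e.g. 1 (minor) to 5 (critical)

# A "shortcoming": a group of related misses, tracked for human follow-up.
@dataclass
class Shortcoming:
    title: str
    misses: list[Miss] = field(default_factory=list)

    @property
    def priority(self) -> float:
        # One plausible heuristic: frequency weighted by worst observed severity.
        if not self.misses:
            return 0.0
        return len(self.misses) * max(m.severity for m in self.misses)

def triage(shortcomings: list[Shortcoming]) -> list[Shortcoming]:
    """Order shortcomings so the most pressing come first."""
    return sorted(shortcomings, key=lambda s: s.priority, reverse=True)
```

A real system would obviously weigh more dimensions (business impact, cost to fix, recency), but the basic shape — misses roll up into shortcomings, shortcomings get ranked — is the point.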

Shortcomings can also be assigned to humans — indeed, that’s why they’re tracked. Humans become responsible for taking the manual follow-up actions necessary to address the shortcomings. Agent shortcomings might fall into a handful of predictable classes:

  • Missing integration: The agent correctly determined what it needed to do, but it had not been given the ability to take that action. This may mean engineering work to make more systems available for agentic use, or creating an escalation path for an agent to request just-in-time approvals to sensitive systems.
  • Incorrect interpretation: The agent accessed the correct data, but misunderstood its meaning, for example treating a duration field measured in seconds as if it were milliseconds. In these cases, minor manual data annotations or similar clarifications may address the issue.
  • Undocumented knowledge: The agent was unable to achieve the correct outcome because critical company knowledge wasn’t written down. (Agents might be the motivation organizations need to implement formal knowledge management programs again.) In these cases, it may be necessary to interview subject matter experts, reminiscent of expert systems development.
  • Inadequate performance: The agent eventually got to the correct outcome, but it took too long — for example, a human completed the task first. (In this world, a human completing something before an agent is considered a failure.) This could also reflect other nonfunctional dimensions, such as untenably high inference costs. Improving agent performance might involve redesigning internal tool APIs or reorganizing entire knowledge bases.
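Treated as data, this taxonomy could drive triage routing. A sketch under the same hypothetical-names caveat as before — the classes mirror the list above, and the follow-up strings are illustrative:

```python
from enum import Enum, auto

# The four shortcoming classes described above (names are illustrative).
class ShortcomingClass(Enum):
    MISSING_INTEGRATION = auto()       # agent lacked the ability to act
    INCORRECT_INTERPRETATION = auto()  # right data, wrong meaning
    UNDOCUMENTED_KNOWLEDGE = auto()    # critical knowledge never written down
    INADEQUATE_PERFORMANCE = auto()    # correct outcome, but too slow or costly

# Hypothetical routing: the kind of human follow-up each class implies.
FOLLOW_UP = {
    ShortcomingClass.MISSING_INTEGRATION: "engineering: expose the system, or add an escalation path",
    ShortcomingClass.INCORRECT_INTERPRETATION: "data: annotate or clarify the fields involved",
    ShortcomingClass.UNDOCUMENTED_KNOWLEDGE: "knowledge: interview experts and write it down",
    ShortcomingClass.INADEQUATE_PERFORMANCE: "platform: redesign tool APIs or reorganize knowledge bases",
}
```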

These classes of shortcomings imply (at least part of) knowledge work’s future shape:

  • Documenting and organizing company knowledge
  • Clarifying and disambiguating company data
  • Configuring and operating tool integrations
  • Identifying and removing bottlenecks in all of the above

…all in support of the company’s agents. (Some of these things may benefit humans as well, especially improving documentation; but when there’s a tradeoff between optimizing something for agents versus humans, the former takes precedence.)

Marking shortcomings as resolved will (ideally) involve running verification processes, something akin to automated tests or evals, to ensure both that the identified shortcoming is resolved, and that other important regressions haven’t been introduced. This implies having sandboxed versions of effectively the company’s entire data and tool estate available for testing agents against, potentially alongside digital twins of external systems (e.g. social media networks). This is nontrivial new infrastructure.
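The verification step described above has a simple logical core: the fix must hold in the sandbox, and previously passing scenarios must still pass. A minimal sketch, with hypothetical types and an agent runner supplied by the caller:

```python
from dataclasses import dataclass
from typing import Callable

# A sandboxed scenario: a task plus a check judging the agent's output.
@dataclass
class Scenario:
    name: str
    task: str
    check: Callable[[str], bool]

def verify_resolution(
    run_agent: Callable[[str], str],  # runs the agent against the sandbox
    fixed: Scenario,                  # scenario reproducing the shortcoming
    regressions: list[Scenario],      # previously passing scenarios
) -> bool:
    # Resolved only if the fix holds AND no regressions were introduced.
    if not fixed.check(run_agent(fixed.task)):
        return False
    return all(s.check(run_agent(s.task)) for s in regressions)
```

Real evals would run scenarios many times (agents are nondeterministic) and report pass rates rather than booleans, but the two-sided check — fix confirmed, regressions absent — is the essential contract.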

In some cases, it might not be entirely clear what the “correct” behavior for an agent should’ve been. If it makes a factual error, that might be straightforward enough, but in more subjective territories (fairness comes to mind), someone may need to disambiguate. This is where a company’s upper management comes in. The executive suite has two main functions in the agentic era:

  1. Clarifying preferred decisions for the company’s agents to make, which then become codified in the above infrastructure.
  2. Making final decisions on matters where, for moral or legal reasons, human approval is still considered required. (Speaking personally, I hope this set of matters never becomes empty.)

Meetings will still exist, but they’ll serve a different purpose. The ideal meeting will involve not just discussing changes to address agent shortcomings, but indeed making those changes directly. Even in the cases of more substantial improvements that can’t be completed in an hour, committing the improvement plans directly into an agentically available knowledge store enables agents to potentially make some progress of their own.

The day-to-day practicalities of knowledge work will be quite different, but the basic nature won’t: Exerting cognitive effort to increase the quality of an organization’s decisions. There are some plausible dystopian futures of work instigated by artificial intelligence, but (to me at least) this one actually sounds pretty good. I’d rather end my day knowing that I committed knowledge updates used by my organization’s agents than that I sent emails I don’t really expect anybody to read. So, whenever you’re ready, go ahead and assign that first shortcoming to me.