The shape of software engineering is shifting fast. This post is my hypothesis for the shape we are headed toward. Perhaps it will be an ephemeral blip on the way to something else, or maybe it will be a new steady state. Either way, I think the challenges we are facing now, and the ones we will be facing soon, share common traits worth characterizing. I am focused here first and foremost on software engineers, but I suspect many of the same dynamics will apply to all professionals, as well as consumers, over time.
The essential question is: What exactly should software engineers produce? Historically, the answer was code. It is not code anymore. So what is it?
The driving force behind this shift is the emergence of agentic coding systems, and a key challenge today is getting these agents to continue working until their entire assigned task is complete. This situation will not last long, as we will continue tuning models and improving harnesses accordingly. Instead, not too far from now, the problem will be getting agents to stop. We will need to define the strict circumstances under which systems return control to the human user. The new essential primitive is the return-of-control condition.
There are multiple reasons we might want agentic systems to return control (sketched in code after the list):
- Completion. The first and most obvious is that the system has met its goal; in other words, it is done. Today’s agents will sometimes insist the task is finished even though the code does not compile; soon, though, agents may fall into the same trap real engineering teams do: scope creep (“I am absolutely right! I should add that feature.”).
- Ambiguity. Agents exist to pursue specific goals, and when a goal is critically underspecified — especially when the wrong interpretation can have severe consequences — the correct behavior is for the agent to elicit clarification from the human user.
- Security. The hot new thing for development is to run agents in YOLO mode, with full permission to do everything on the host computer. This approach might persist for personal side projects, but it will not scale to enterprise contexts where compliance and profits are at stake.
- Cost. Long-running agents can eat through tokens, electricity, hardware capacity, and money at an alarming rate. We are currently in a technologist’s honeymoon period with these new tools, but eventually business realities will hit and we will need to enforce spending limits.
- Infeasibility. As agentic systems become more relentless in pursuing their goals, they might start pursuing things that are simply infeasible with current and emerging technologies, or even things that are logically impossible. (That said, perhaps they would have a breakthrough and prove us wrong?)
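Taken together, the reasons above could be modeled as data that a harness acts on. Here is a minimal sketch, assuming a hypothetical agent harness; `ReturnReason` and `ReturnOfControl` are illustrative names, not any real library’s API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReturnReason(Enum):
    """Why an agent hands control back to a human."""
    COMPLETION = auto()     # the stated goal has been met
    AMBIGUITY = auto()      # the goal is underspecified; clarification is needed
    SECURITY = auto()       # the next action exceeds granted permissions
    COST = auto()           # a spending or resource budget has been reached
    INFEASIBILITY = auto()  # the goal appears unreachable with available means


@dataclass
class ReturnOfControl:
    """A request for human input, raised by the agent harness."""
    reason: ReturnReason
    summary: str  # what the agent was doing and why it stopped
```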
Viewed together, I think some themes emerge, the most important of which is testability. Return-of-control conditions may become the new system under test, rather than the software program itself. We want to know, with whatever degree of confidence is warranted for a particular project, that agentic systems will in fact stop when the programmed criteria are met. Testing is never free, but spending an hour or two verifying criteria in advance may be worth it if you expect the agent to run for days or weeks unattended. This is not to say that implemented conditions will necessarily look like traditional unit tests or linter rules; they could also use LLM-as-judge approaches or other techniques that have yet to be invented. What is important is that the conditions are trusted and not manual.
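To make that concrete, here is a minimal sketch of a return condition treated as the system under test. The `completion_condition` predicate is a stand-in for whatever check a project actually trusts; an LLM-as-judge variant would slot in the same way, with a model call behind the predicate.

```python
# A hypothetical completion condition: the agent may stop only when the test
# suite passes and no TODO markers remain in the files it changed.
def completion_condition(test_suite_passed: bool, todo_count: int) -> bool:
    return test_suite_passed and todo_count == 0


# Verifying the condition itself, before trusting it with days of unattended runtime.
def test_rejects_unfinished_work():
    assert not completion_condition(test_suite_passed=True, todo_count=3)
    assert not completion_condition(test_suite_passed=False, todo_count=0)


def test_accepts_finished_work():
    assert completion_condition(test_suite_passed=True, todo_count=0)
```

The point is less the tooling than the habit: the condition itself gets verified before the agent is left alone with it.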
Many return criteria are likely also distributable. In other words, vendors might provide, in one form or another, criteria that can be configured for your particular task and easily consumed by any agent. From a vendor’s perspective, this can be thought of as a customer success program, just with an agent serving as a proxy. Products could even have dedicated onboarding flows, including verified return conditions, specifically for agents.
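As a rough illustration of what distributable criteria might look like, here is a sketch of a vendor-published return condition that a customer configures for their own task. Everything in it is invented for the example: the `VendorCriterion` type, the migration scenario, and the placeholder URL.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VendorCriterion:
    """A return condition shipped by a vendor and configured by the customer."""
    name: str
    check: Callable[[dict], bool]  # consumes the agent's reported state
    docs_url: str                  # where a human can read what it verifies


def migration_complete(max_failed_rows: int) -> VendorCriterion:
    """One configured instance: 'the data migration is done' as the vendor defines it."""
    return VendorCriterion(
        name=f"migration-complete(max_failed_rows={max_failed_rows})",
        check=lambda state: (
            state.get("failed_rows", 0) <= max_failed_rows
            and state.get("schema_version") == state.get("target_schema_version")
        ),
        docs_url="https://example.com/agent-onboarding",  # placeholder, not a real page
    )
```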
The idea of returning can sound absolute and complete, but return of control can also be partial. A complete return means an agentic system cannot proceed whatsoever without human decisions; it is paused. A partial return means the system can continue on some tasks or workstreams, but will be unable to complete its specified goal without human input. This distinction is primarily of interest as a UX design challenge: How do we notify users that input is required, and how do we prioritize the most important pieces of input as the quantity of agentic activity rises?
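One way to picture the distinction, and the prioritization question it raises, is a small triage sketch. The types and the sort key below are assumptions for illustration, not a proposal.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReturnScope(Enum):
    COMPLETE = auto()  # the agent is fully paused pending human input
    PARTIAL = auto()   # other workstreams continue; the overall goal is blocked


@dataclass
class PendingInput:
    scope: ReturnScope
    question: str
    blocked_workstreams: int  # rough proxy for how much work is waiting on this


def triage(queue: list[PendingInput]) -> list[PendingInput]:
    """Naive prioritization: complete returns first, then by how much they block."""
    return sorted(
        queue,
        key=lambda p: (p.scope is not ReturnScope.COMPLETE, -p.blocked_workstreams),
    )
```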
In the other direction, this perspective also raises the possibility of returnless, or perpetual, agents. Are there conceivable agents whose goals simply have no terminus? In the coding realm, perhaps an agent tasked with keeping package dependencies up to date (should those continue to exist). In the broader sphere, perhaps an agent tasked with protecting the Global Seed Vault (or achieving global peace?). This brings us into fundamental AI safety territory — the question of whether AI can be uncontrolled and yet still aligned. This is far enough outside my expertise that I will not speculate here.
Some folks lament that agentic development systems are already leading to a loss of software craft. There is truth to this, and I sympathize with the concern. However, I do not think engineering will ever be entirely without craft; rather, what precisely we craft is changing. I think the new art form is the structure that agents operate within, and I suspect creating these structures will be no less challenging or rewarding than anything we’ve done before.
