hey, ishaan here (kartik's cofounder). this post came out of a lot of back-and-forth between us trying to pin down what people actually mean when they say "async agents."
the analogy that clicked for me was a turn-based telephone call—only one person can talk at a time. you ask, it answers, you wait. even if the task runs for an hour, you're waiting for your turn.
we kept circling until we started drawing parallels to what async actually means in programming. using that as the reference point made everything clearer: it's not about how long something runs or where it runs. it's about whether the caller blocks on it.
that's the user-facing definition, but the implementation distinction matters more.
"takes longer than you're willing to wait" describes the UX, not the architecture. the engineering question is: does the system actually free up the caller's compute/context to do other work, or is it just hiding a spinner?
most agent frameworks i've worked with are the latter - the orchestrator is still holding the full conversation context in memory, burning tokens on keep-alive, and can't actually multiplex. real async means the agent's state gets serialized, the caller reclaims its resources, and resumption happens via an event - same as the difference between a setTimeout polling loop and actual async/await on an event loop.
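a toy TypeScript sketch of that difference (the agent run is faked with setTimeout, and names like `runId` and `startAgent` are just illustrative, not any framework's real API):

```typescript
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

// fake a long-running agent: finishes after 2s and emits a completion event
function startAgent(runId: string): void {
  setTimeout(() => bus.emit(`done:${runId}`, "agent output"), 2000);
}

// "fake async": the caller never actually yields its work - it sits in a
// setTimeout polling loop, holding its whole context until the run finishes
async function pollingCaller(runId: string): Promise<string> {
  let result: string | undefined;
  bus.once(`done:${runId}`, (r: string) => (result = r));
  startAgent(runId);
  while (result === undefined) {
    await new Promise((r) => setTimeout(r, 250)); // a spinner, not progress
  }
  return result;
}

// real async: register a handler and return immediately; the caller's state
// is just this closure (in a real system, serialized agent state on disk)
function eventCaller(runId: string): void {
  bus.once(`done:${runId}`, (result: string) => {
    console.log(`resumed ${runId}:`, result); // resumption happens via event
  });
  startAgent(runId);
}

eventCaller("run-1"); // returns instantly; the event fires 2s later
```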
IMO this feels sorta like Simon Willison's definition of agents: "LLMs in a loop with a goal" seems super obvious in hindsight, but I'm not sure I would have described it that way myself.
Maybe, but that's what I thought while reading the "what actually is async?" part of the post, so I don't think I got biased towards the answer by that point.
One nuance that helps: “async” in the turn-based-telephone sense (you ask, it answers, you wait) is only one way agents can run.
Another is many turns inside a single LLM call — multiple agents (or voices) iterating and communicating dozens or hundreds of times in one epoch, with no API round-trips between them.
That’s “speed of light” vs “carrier pigeon”: no serialization across the boundary until you’re done. We wrote this up here: Speed of Light – MOOLLM (the README has the carrier-pigeon analogy and a 33-turn-in-one-call example).
Speed of Light vs Carrier Pigeon:
The fundamental architectural divide in AI agent systems.
The Core Insight: There are two ways to coordinate multiple AI agents:

|                       | Carrier Pigeon                | Speed of Light      |
|-----------------------|-------------------------------|---------------------|
| Where agents interact | between LLM calls             | during one LLM call |
| Latency               | 500 ms+ per hop               | instant             |
| Precision             | degrades each hop             | perfect             |
| Cost                  | high (re-tokenize everything) | low (one call)      |
MCP = Carrier Pigeon.
Each tool call: stop generation → wait for the external response → start a new completion.
N tool calls ⇒ N round-trips.
MOOLLM Skills and agents can run at the Speed of Light. Once loaded into context, skills iterate, recurse, compose, and simulate multiple agents — all within a single generation. No stopping. No serialization.
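A minimal TypeScript sketch of the two coordination styles, to make the round-trip count concrete. This is not MOOLLM's actual API; `callLLM` is a mock stand-in for any chat-completion endpoint:

```typescript
// mock stand-in for any chat-completion endpoint; a real client would go here
async function callLLM(prompt: string): Promise<string> {
  return ` [model output for a ${prompt.length}-char prompt]`;
}

// Carrier pigeon: every agent-to-agent hop is its own API round-trip,
// re-sending (and re-tokenizing) the whole growing transcript each time.
async function carrierPigeon(turns: number): Promise<string> {
  let transcript = "Agent A and Agent B debate a design.";
  for (let i = 0; i < turns; i++) {
    const speaker = i % 2 === 0 ? "Agent A" : "Agent B";
    transcript += `\n${speaker}:` + (await callLLM(transcript)); // one hop each
  }
  return transcript; // N turns => N round-trips, 500 ms+ apiece
}

// Speed of light: ask for the entire multi-agent exchange inside a single
// generation; the "agents" interact in-context, with no boundary crossings.
async function speedOfLight(turns: number): Promise<string> {
  return callLLM(
    `Simulate ${turns} alternating turns between Agent A and Agent B ` +
      `debating a design. Label every turn; don't stop before turn ${turns}.`
  ); // N turns => 1 round-trip
}
```

The cost asymmetry falls out directly: the carrier-pigeon loop pays latency and re-tokenization on every hop, while the single-generation version pays once.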
i just imagine it as the swap between "human watching agent while it runs"
vs "agent runs for a long time, tells the user over human interfaces when it's done" - e.g. sends a slack message, or something like gemini deep research.
an extension would be agents that are triggered by events and complete autonomously, touching human interfaces only when they get stuck.
there's a quality difference more than a strictly functional one: the agent mostly shouldn't need human interaction beyond a starting prompt and a notification of completion or stuckness. even if i'm not blocking on a result, it can't immediately need babying or i can't actually leave it alone.
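a minimal sketch of that fire-and-forget shape in TypeScript, assuming a Slack incoming webhook as the "tell the human" channel (the webhook URL is a placeholder, and `runAgent` is a hypothetical stand-in for the long-running work):

```typescript
// post a message to a Slack incoming webhook (placeholder URL)
async function notifySlack(text: string): Promise<void> {
  await fetch("https://hooks.slack.com/services/T000/B000/XXXX", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}

// hypothetical long-running agent; a real one might take hours
async function runAgent(goal: string): Promise<string> {
  return `finished: ${goal}`;
}

export function fireAndForget(goal: string): void {
  runAgent(goal)
    .then((result) => notifySlack(`agent done: ${result}`))
    .catch((err) => notifySlack(`agent stuck, needs a human: ${err}`));
  // no await: the human is free the moment this returns, and only
  // hears back on completion or stuckness
}
```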