hey, ishaan here (kartik's cofounder). this post came out of a lot of back-and-forth between us trying to pin down what people actually mean when they say "async agents."
the analogy that clicked for me was a turn-based telephone call—only one person can talk at a time. you ask, it answers, you wait. even if the task runs for an hour, you're waiting for your turn.
we kept circling until we started drawing parallels to what async actually means in programming. using that as the reference point made everything clearer: it's not about how long something runs or where it runs. it's about whether the caller blocks on it.
that's the user-facing definition, but the implementation distinction matters more.
"takes longer than you're willing to wait" describes the UX, not the architecture. the engineering question is: does the system actually free up the caller's compute/context to do other work, or is it just hiding a spinner?
most agent frameworks i've worked with are the latter - the orchestrator is still holding the full conversation context in memory, burning tokens on keep-alive, and can't actually multiplex. real async means the agent's state gets serialized, the caller reclaims its resources, and resumption happens via an event - same as the difference between a setTimeout polling loop and actual async/await on an event loop.
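a toy TypeScript sketch of that difference (the agent run is faked with setTimeout, and names like `runId` and `startAgent` are just illustrative, not any framework's real API):

```typescript
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

// fake a long-running agent: finishes after 2s and emits a completion event
function startAgent(runId: string): void {
  setTimeout(() => bus.emit(`done:${runId}`, "agent output"), 2000);
}

// "fake async": the caller never actually yields its work - it sits in a
// setTimeout polling loop, holding its whole context until the run finishes
async function pollingCaller(runId: string): Promise<string> {
  let result: string | undefined;
  bus.once(`done:${runId}`, (r: string) => (result = r));
  startAgent(runId);
  while (result === undefined) {
    await new Promise((r) => setTimeout(r, 250)); // a spinner, not progress
  }
  return result;
}

// real async: register a handler and return immediately; the caller's state
// is just this closure (in a real system, serialized agent state on disk)
function eventCaller(runId: string): void {
  bus.once(`done:${runId}`, (result: string) => {
    console.log(`resumed ${runId}:`, result); // resumption happens via event
  });
  startAgent(runId);
}

eventCaller("run-1"); // returns instantly; the event fires 2s later
```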
IMO this feels sorta like Simon Willison's definition of agents: "LLMs in a loop with a goal" seems super obvious in hindsight, but I'm not sure I would have described it that way myself.
Maybe, but that's what I thought while reading the "what actually is async?" part of the post, so I don't think I got biased towards the answer by that point.
One nuance that helps: “async” in the turn-based-telephone sense (you ask, it answers, you wait) is only one way agents can run.
Another is many turns inside a single LLM call — multiple agents (or voices) iterating and communicating dozens or hundreds of times in one epoch, with no API round-trips between them.
That’s “speed of light” vs “carrier pigeon”: no serialization across the boundary until you’re done. We wrote this up here: Speed of Light – MOOLLM (the README has the carrier-pigeon analogy and a 33-turn-in-one-call example).
Speed of Light vs Carrier Pigeon:
The fundamental architectural divide in AI agent systems.
The Core Insight: There are two ways to coordinate multiple AI agents:

|                       | Carrier Pigeon                | Speed of Light      |
|-----------------------|-------------------------------|---------------------|
| Where agents interact | between LLM calls             | during one LLM call |
| Latency               | 500 ms+ per hop               | instant             |
| Precision             | degrades each hop             | perfect             |
| Cost                  | high (re-tokenize everything) | low (one call)      |
MCP = Carrier Pigeon.
Each tool call: stop generation → wait for the external response → start a new completion.
N tool calls ⇒ N round-trips.
MOOLLM Skills and agents can run at the Speed of Light. Once loaded into context, skills iterate, recurse, compose, and simulate multiple agents — all within a single generation. No stopping. No serialization.
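A minimal TypeScript sketch of the two coordination styles, to make the round-trip count concrete. This is not MOOLLM's actual API; `callLLM` is a mock stand-in for any chat-completion endpoint:

```typescript
// mock stand-in for any chat-completion endpoint; a real client would go here
async function callLLM(prompt: string): Promise<string> {
  return ` [model output for a ${prompt.length}-char prompt]`;
}

// Carrier pigeon: every agent-to-agent hop is its own API round-trip,
// re-sending (and re-tokenizing) the whole growing transcript each time.
async function carrierPigeon(turns: number): Promise<string> {
  let transcript = "Agent A and Agent B debate a design.";
  for (let i = 0; i < turns; i++) {
    const speaker = i % 2 === 0 ? "Agent A" : "Agent B";
    transcript += `\n${speaker}:` + (await callLLM(transcript)); // one hop each
  }
  return transcript; // N turns => N round-trips, 500 ms+ apiece
}

// Speed of light: ask for the entire multi-agent exchange inside a single
// generation; the "agents" interact in-context, with no boundary crossings.
async function speedOfLight(turns: number): Promise<string> {
  return callLLM(
    `Simulate ${turns} alternating turns between Agent A and Agent B ` +
      `debating a design. Label every turn; don't stop before turn ${turns}.`
  ); // N turns => 1 round-trip
}
```

The cost asymmetry falls out directly: the carrier-pigeon loop pays latency and re-tokenization on every hop, while the single-generation version pays once.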
i just imagine it as the swap between "human watching agent while it runs"
vs "agent runs for a long time, tells the user over human interfaces when it's done" - e.g. sends a slack message, or something like gemini deep research.
an extension would be agents that are triggered by events and complete autonomously, touching human interfaces only when they get stuck.
there's a quality difference more than a strictly functional one: the agent mostly shouldn't need human interaction beyond a starting prompt and a notification of completion or stuckness. even if i'm not blocking on a result, it can't immediately need babying or i can't actually leave it alone.
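a minimal sketch of that fire-and-forget shape in TypeScript, assuming a Slack incoming webhook as the "tell the human" channel (the webhook URL is a placeholder, and `runAgent` is a hypothetical stand-in for the long-running work):

```typescript
// post a message to a Slack incoming webhook (placeholder URL)
async function notifySlack(text: string): Promise<void> {
  await fetch("https://hooks.slack.com/services/T000/B000/XXXX", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}

// hypothetical long-running agent; a real one might take hours
async function runAgent(goal: string): Promise<string> {
  return `finished: ${goal}`;
}

export function fireAndForget(goal: string): void {
  runAgent(goal)
    .then((result) => notifySlack(`agent done: ${result}`))
    .catch((err) => notifySlack(`agent stuck, needs a human: ${err}`));
  // no await: the human is free the moment this returns, and only
  // hears back on completion or stuckness
}
```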