
ADR 0014: Prefer eventloop.setTimeout Over os.sleep In Application Code

Status

Accepted

Date

2026-06-09

Context

ADR-0002 made apis/eventloop.lua the default substrate for async behavior. The eventloop drives a single os.pullEventRaw loop and dispatches every event to registered handlers and timer callbacks.
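
The dispatch model can be pictured with a small sketch. This is a hypothetical illustration, not the actual apis/eventloop.lua. Only setTimeout and runLoop are named in this ADR; newEventloop, on and stop are assumed names. pullEventRaw and startTimer are passed in so the sketch also runs outside CC:Tweaked, where you would pass os.pullEventRaw and os.startTimer.

```lua
-- Hypothetical single-loop dispatcher in the spirit of ADR-0002.
-- Not the real apis/eventloop.lua: every name here is illustrative.
-- pullEventRaw and startTimer are injected; in CC:Tweaked you would
-- pass os.pullEventRaw and os.startTimer.
local function newEventloop(pullEventRaw, startTimer)
  local loop = { handlers = {}, timers = {}, running = false }

  -- Register fn for every event with this name.
  function loop.on(name, fn)
    loop.handlers[name] = loop.handlers[name] or {}
    table.insert(loop.handlers[name], fn)
  end

  -- Run fn once after `seconds`; the timer ID keys the callback.
  function loop.setTimeout(fn, seconds)
    loop.timers[startTimer(seconds)] = fn
  end

  function loop.stop()
    loop.running = false
  end

  -- The only place events are pulled: every event passes through here.
  function loop.runLoop()
    loop.running = true
    while loop.running do
      local ev = table.pack(pullEventRaw())
      local name, id = ev[1], ev[2]
      -- A timer event first resolves to its one-shot setTimeout callback.
      if name == "timer" and loop.timers[id] then
        local fn = loop.timers[id]
        loop.timers[id] = nil
        fn()
      end
      -- Then every handler registered for this event name sees it.
      for _, fn in ipairs(loop.handlers[name] or {}) do
        fn(table.unpack(ev, 2, ev.n))
      end
    end
  end

  return loop
end

-- Usage with a scripted event queue instead of real CC events.
local queue = { { "key", 28 }, { "timer", 1 }, { "char", "a" } }
local nextTimer = 0
local loop = newEventloop(
  function() return table.unpack(table.remove(queue, 1)) end,
  function() nextTimer = nextTimer + 1; return nextTimer end)

local seen = {}
loop.on("key", function(code) seen[#seen + 1] = "key:" .. code end)
loop.setTimeout(function() seen[#seen + 1] = "timeout" end, 0.5)
loop.on("char", function(c) seen[#seen + 1] = "char:" .. c; loop.stop() end)
loop.runLoop()
-- seen is now { "key:28", "timeout", "char:a" }
```

Because one loop owns the pull, anything that pulls events on its own while a handler runs competes with this loop for events.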

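os.sleep looks innocent but breaks that contract when it runs inside a handler. In CC:Tweaked it starts a timer and then loops on os.pullEvent("timer") until its own timer ID comes back, so the handler's coroutine yields with a "timer" filter. While the sleep is in flight: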

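  • The eventloop's os.pullEventRaw loop is suspended: the handler that called sleep has not returned, so nothing else runs in that coroutine.
  • Non-timer events are never delivered. A coroutine that yields with the "timer" filter is resumed only for timer (and terminate) events, so handlers registered through the eventloop miss everything else entirely.
  • eventloop.setTimeout callbacks scheduled before the sleep cannot fire while it runs. Worse, sleep's wait loop pulls their timer events and discards every ID but its own, so a callback whose timer expires mid-sleep never receives its event.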
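
The event loss above follows from how filtered yields are scheduled. The toy model below is a plain-Lua simulation, not CC:Tweaked code. run stands in for the scheduler: it resumes a coroutine only with events that match the filter it yielded, or with "terminate". fakeSleep mimics the shape of os.sleep by waiting for one specific timer ID, which is passed in here rather than obtained from os.startTimer. Both names are hypothetical.

```lua
-- Toy model of filtered yields; a plain-Lua simulation, not CC code.
-- run() plays the scheduler: a coroutine that yielded a filter is resumed
-- only with events of that name (or "terminate"); others never reach it.
local function run(events, body)
  local co = coroutine.create(body)
  local ok, filter = coroutine.resume(co) -- run body up to its first yield
  assert(ok, filter)
  for _, ev in ipairs(events) do
    local name = ev[1]
    if filter == nil or filter == name or name == "terminate" then
      ok, filter = coroutine.resume(co, name, ev[2])
      assert(ok, filter)
    end
    -- otherwise the event is dropped for this coroutine
  end
end

-- Same shape as os.sleep: wait on "timer" until our own ID comes back.
-- The ID is passed in here instead of coming from os.startTimer.
local function fakeSleep(timerId)
  repeat
    local _, id = coroutine.yield("timer")
  until id == timerId
end

local seen = {}
run({ { "key", 1 }, { "key", 2 }, { "timer", 99 }, { "timer", 7 }, { "key", 3 } },
  function()
    while true do
      local name, p1 = coroutine.yield() -- unfiltered, like os.pullEventRaw()
      if name == "key" or name == "timer" then
        seen[#seen + 1] = name .. " " .. p1
      end
      if name == "key" and p1 == 1 then
        fakeSleep(7) -- a handler that sleeps mid-dispatch
      end
    end
  end)
-- seen is now { "key 1", "key 3" }: "key 2" and "timer 99" were lost.
```

The second key press is filtered out by the scheduler. Timer 99, which could have belonged to a setTimeout callback, is pulled and thrown away by the sleep loop itself.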

This bit apis/libai.lua pollMessage, which used a sleep-based throttle between HTTP polls. The function looked synchronous and standalone. As soon as a caller invoked libai.ask from inside a handler (exactly the composition ADR-0002 encourages), the whole event loop froze for every poll interval.

apis/net.lua sendRequest already shows the right pattern: create a private eventloop, schedule the wait through setTimeout, then call runLoop until the work resolves. From the caller's perspective the function is still synchronous. Internally, timer events keep being dispatched, and other handlers can be composed around the call via parallel.waitForAll.
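
A minimal sketch of that pattern follows. It assumes an eventloop factory whose loops expose setTimeout(fn, seconds), runLoop() and stop(). That interface and the names pollUntil and fakeEventloop are hypothetical; this is not the real apis/net.lua or apis/eventloop.lua code. The fake advances a virtual clock so the example runs in plain Lua. In CC:Tweaked the factory would instead return a real loop driven by timer events.

```lua
-- pollUntil: hypothetical synchronous-looking helper. It builds a private
-- loop from the injected factory, schedules every attempt through
-- setTimeout, and runs the loop until finish() stops it.
local function pollUntil(newEventloop, check, interval, maxAttempts)
  local loop = newEventloop()
  local result, attempts = nil, 0

  local function finish(value)
    result = value
    loop.stop()
  end

  local function attempt()
    attempts = attempts + 1
    local value = check(attempts)
    if value ~= nil then return finish(value) end
    if attempts >= maxAttempts then return finish(nil) end
    loop.setTimeout(attempt, interval) -- wait without blocking dispatch
  end

  loop.setTimeout(attempt, 0) -- even the first attempt runs inside the loop
  loop.runLoop()
  return result, attempts
end

-- Deterministic fake: timers fire in due-time order on a virtual clock;
-- seq breaks ties so equal due times keep scheduling order.
local function fakeEventloop()
  local loop = { now = 0, seq = 0, pending = {}, stopped = false }
  function loop.setTimeout(fn, seconds)
    loop.seq = loop.seq + 1
    table.insert(loop.pending, { at = loop.now + seconds, seq = loop.seq, fn = fn })
  end
  function loop.stop() loop.stopped = true end
  function loop.runLoop()
    while not loop.stopped and #loop.pending > 0 do
      table.sort(loop.pending, function(a, b)
        if a.at ~= b.at then return a.at < b.at end
        return a.seq < b.seq
      end)
      local timer = table.remove(loop.pending, 1)
      loop.now = timer.at
      timer.fn()
    end
  end
  return loop
end

-- Usage: the check succeeds on the third attempt.
local result, attempts = pollUntil(fakeEventloop, function(n)
  if n == 3 then return "done" end
end, 2, 5)
-- result == "done", attempts == 3
```

The first attempt is scheduled through setTimeout rather than called directly. That way finish() only ever runs while runLoop is active, and the helper does not depend on how a particular loop treats a stop() issued before it starts.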

Decision

In library, server, and program code that may run inside an eventloop (directly or transitively), use eventloop.setTimeout for any waiting, throttling, polling, or retry-with-delay. Libraries that need to wait must take an eventloop factory through their constructor (the way apis/net does) rather than hardcoding a sleep call.

os.sleep remains acceptable only in narrow cases:

  1. One-shot programs that are purely sequential and register no event handlers — a programs/foo.lua that prints, sleeps, prints again, and exits.
  2. parallel.waitForAny(task, function() sleep(t); end) used as an isolated guard to bound an inner task (e.g. the AI Lua-exec sandbox in apis/libai.lua and the parallel.waitForAny-driven per-case timer in apis/libtest.lua). The guard sleep is private to its own coroutine group; it does not block anything external.
  3. Tests that are themselves driven by libtest's per-case timeout (see ADR-0009).
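
Exception 2 can be wrapped in a small helper that also reports whether the task finished. This is a sketch for CC:Tweaked, where parallel.waitForAny and sleep are globals. withTimeout is a hypothetical name. A finished flag is used instead of relying on waitForAny's return value.

```lua
-- Hypothetical guard helper for exception 2. The sleep lives in its own
-- coroutine inside parallel.waitForAny, so its "timer" filter only affects
-- that coroutine; the task coroutine still receives its own events.
local function withTimeout(task, seconds)
  local finished, result = false, nil
  parallel.waitForAny(
    function()
      result = task()
      finished = true
    end,
    function()
      sleep(seconds) -- private guard: ends the group if the task overruns
    end)
  return finished, result
end
```

If the guard wins, finished stays false and the caller can treat the run as timed out. Errors raised by the task still propagate out of parallel.waitForAny.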

New code must not expose a sleep injection point on its constructor. If a wait is needed, accept an eventloop factory and schedule through setTimeout. Tests substitute a synchronous deterministic eventloop fake the same way they substitute http or settings.
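
A sketch of what this rule asks for, under stated assumptions:

- Client, its fetch, newEventloop and retryDelay options, and the loop interface (setTimeout, runLoop, stop) are hypothetical.
- fetch stands in for the http substitution mentioned above.
- The fake runs callbacks immediately in FIFO order and keeps every loop it creates. A test can then assert that nothing was left pending and that stop was called.

```lua
-- Hypothetical client: the constructor takes an eventloop factory (and a
-- fetch function standing in for http) instead of a sleep function.
local Client = {}
Client.__index = Client

function Client.new(opts)
  return setmetatable({
    fetch = assert(opts.fetch, "fetch required"),
    newEventloop = assert(opts.newEventloop, "newEventloop required"),
    retryDelay = opts.retryDelay or 1,
  }, Client)
end

-- Retry until fetch returns a value or attempts run out.
function Client:get(maxAttempts)
  local loop = self.newEventloop()
  local value, attempts = nil, 0
  local function attempt()
    attempts = attempts + 1
    value = self.fetch()
    if value ~= nil or attempts >= maxAttempts then
      loop.stop()
    else
      loop.setTimeout(attempt, self.retryDelay)
    end
  end
  loop.setTimeout(attempt, 0)
  loop.runLoop()
  return value
end

-- Synchronous, deterministic fake: no real time passes. Callbacks run in
-- FIFO order (adequate when at most one timer is pending, as here), the
-- requested delays are recorded, and every created loop is kept.
local createdLoops = {}
local function newFakeEventloop()
  local loop = { pending = {}, delays = {}, stopped = false }
  function loop.setTimeout(fn, seconds)
    table.insert(loop.pending, fn)
    table.insert(loop.delays, seconds)
  end
  function loop.stop() loop.stopped = true end
  function loop.runLoop()
    while not loop.stopped and #loop.pending > 0 do
      local fn = table.remove(loop.pending, 1)
      fn()
    end
  end
  createdLoops[#createdLoops + 1] = loop
  return loop
end

-- Usage in a test: fetch succeeds on the third call.
local calls = 0
local client = Client.new({
  fetch = function()
    calls = calls + 1
    if calls == 3 then return "ok" end
  end,
  newEventloop = newFakeEventloop,
  retryDelay = 2,
})
local value = client:get(5)
local loop = createdLoops[1]
assert(value == "ok")
assert(loop.stopped and #loop.pending == 0, "loop leaked a timer or never stopped")
```

A sleep stub could only count calls. The fake also exposes the requested delays and whether the loop ended cleanly.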

Consequences

  • Slightly more ceremony in "synchronous-looking" functions that wait: a private eventloop plus a small attempt/finish pair. The benefit is that the function composes cleanly with any caller's eventloop.
  • Test fakes shift from a sleep stub to a synchronous eventloop double. Ergonomics are comparable, and the eventloop fake additionally lets tests observe pending timers and stopped state, catching leaks the sleep stub would have missed.
  • Existing call sites are migrated opportunistically when they cause observable bugs. The first migration is apis/libai.lua.

References

  • ADR-0002 — use eventloop for async code.
  • ADR-0009 — layered test timeouts (the parallel.waitForAny guard exception).
  • apis/net.lua sendRequest — canonical private-eventloop pattern.
  • apis/libai.lua pollMessage — first migration.