# ADR 0014: Prefer `eventloop.setTimeout` Over `os.sleep` In Application Code

## Status

Accepted

## Date

2026-06-09

## Context

[ADR-0002](adr-0002-use-eventloop-for-async-code.md) made [`apis/eventloop.lua`](../../apis/eventloop.lua) the
default substrate for async behavior. The eventloop drives a single
`os.pullEventRaw` loop and dispatches every event to registered handlers and
timer callbacks.

`os.sleep` looks innocent but breaks that contract. Its CC:Tweaked
implementation starts a timer and then yields the coroutine via
`os.pullEvent("timer")` until that timer's event arrives. While the sleep is
in flight:

- The eventloop's own `os.pullEventRaw` call is not running; nothing else
  runs in that coroutine.
- Non-`timer` events that arrive are silently discarded by the filtered
  `os.pullEvent`, so handlers registered through the eventloop miss them
  entirely.
- Even `eventloop.setTimeout` callbacks scheduled before the sleep cannot
  fire while it runs, because their timer events are consumed by the sleep's
  `timer` filter instead of reaching the eventloop.
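
The event loss described above can be reproduced with a small plain-Lua model. This is only a sketch: `queue`, `pullEvent`, and `sleepModel` are hypothetical stand-ins for the CC:Tweaked event system, not its real implementation, and the real `os.pullEvent` also handles `terminate`, which the model ignores.

```lua
-- Simulated event queue standing in for the CC:Tweaked event system.
-- Everything here is a plain-Lua model, not the real os API.
local queue = {
  { "rednet_message", "hello" },
  { "timer", 7 },   -- timer an eventloop callback was waiting on
  { "key", 65 },
  { "timer", 42 },  -- timer owned by the sleep itself
}

local dropped = {}

-- Model of a filtered pull: returns the next event whose name matches
-- `filter` and discards every event that arrives before it.
local function pullEvent(filter)
  while #queue > 0 do
    local event = table.remove(queue, 1)
    if event[1] == filter then
      return event[1], event[2]
    end
    dropped[#dropped + 1] = event[1]
  end
  error("queue exhausted")
end

-- Model of a sleep built on the filtered pull: it waits for its own
-- timer id and throws away every other timer event it sees.
local function sleepModel(timerId)
  repeat
    local _, id = pullEvent("timer")
    if id ~= timerId then
      dropped[#dropped + 1] = "timer:" .. id
    end
  until id == timerId
end

sleepModel(42)
print(table.concat(dropped, ", "))  -- rednet_message, timer:7, key
```

In this model nothing the sleep consumed is left in the queue afterwards, so a handler or timer callback that depended on those three events would not see them.
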

This bug surfaced in `apis/libai.lua` `pollMessage`, which used a
sleep-based throttle between HTTP polls. The function looked synchronous and
stand-alone, but the moment a caller invoked `libai.ask` from inside a
handler (exactly the composition
[ADR-0002](adr-0002-use-eventloop-for-async-code.md) encourages), the whole
event loop froze for the duration of every poll interval.

[`apis/net.lua`](../../apis/net.lua) `sendRequest` already shows the right
pattern: create a private eventloop, schedule the wait through
`setTimeout`, then `runLoop` until the work resolves. From the caller's
perspective the function is still synchronous; internally, the timer-event
dispatcher stays alive, and arbitrary other handlers can be composed
around it via `parallel.waitForAll`.
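
A minimal sketch of that private-eventloop pattern, runnable in plain Lua. The toy `newLoop` with its virtual clock, the `stopLoop` name, the `setTimeout(callback, delay)` argument order, and `pollUntilReady` are all assumptions made for illustration; only `setTimeout` and `runLoop` are named in this ADR, and the real `apis/eventloop.lua` and `apis/net.lua` signatures may differ.

```lua
-- Toy eventloop with a virtual clock so the sketch runs outside CC:Tweaked.
-- Hypothetical stand-in for apis/eventloop.lua; the real API may differ.
local function newLoop()
  local loop = { now = 0, timers = {}, stopped = false }

  function loop.setTimeout(callback, delay)
    table.insert(loop.timers, { at = loop.now + delay, callback = callback })
  end

  function loop.stopLoop()
    loop.stopped = true
  end

  -- Fire timers in due-time order until stopped or nothing is pending.
  function loop.runLoop()
    while not loop.stopped and #loop.timers > 0 do
      table.sort(loop.timers, function(a, b) return a.at < b.at end)
      local timer = table.remove(loop.timers, 1)
      loop.now = timer.at
      timer.callback()
    end
  end

  return loop
end

-- Synchronous-looking poll: returns once `fetch` yields a value or
-- `maxAttempts` is reached, waiting `interval` between attempts.
local function pollUntilReady(createLoop, fetch, interval, maxAttempts)
  local loop = createLoop()
  local result, attempts = nil, 0

  local function finish(value)
    result = value
    loop.stopLoop()
  end

  local function attempt()
    attempts = attempts + 1
    local value = fetch()
    if value ~= nil or attempts >= maxAttempts then
      finish(value)
    else
      loop.setTimeout(attempt, interval)  -- replaces a sleep(interval)
    end
  end

  loop.setTimeout(attempt, 0)
  loop.runLoop()
  return result, attempts
end

-- Usage: a fake fetch that only succeeds on its third call.
local calls = 0
local reply, tries = pollUntilReady(newLoop, function()
  calls = calls + 1
  if calls == 3 then return "done" end
end, 0.5, 10)
print(reply, tries)  -- done    3
```

The caller still sees a blocking function, but every wait goes through the loop's timer dispatch rather than a filtered pull.
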

## Decision

In library, server, and program code that may run inside an eventloop
(directly or transitively), use `eventloop.setTimeout` for any waiting,
throttling, polling, or retry-with-delay. Libraries that need to wait
must take an eventloop factory through their constructor (the way
`apis/net` does) rather than baking in a hardcoded sleep call.

`os.sleep` remains acceptable only in narrow cases:

1. One-shot programs that are purely sequential and register no event
   handlers, such as a `programs/foo.lua` that prints, sleeps, prints
   again, and exits.
2. `parallel.waitForAny(task, function() sleep(t); end)` used as an
   isolated guard to bound an inner task (e.g. the AI Lua-exec sandbox in
   `apis/libai.lua` and the `parallel.waitForAny`-driven per-case timer in
   `apis/libtest.lua`). The guard sleep is private to its own coroutine
   group; it does not block anything external.
3. Tests that are themselves driven by `libtest`'s per-case timeout (see
   [ADR-0009](adr-0009-layered-test-timeouts.md)).

New code must not expose a `sleep` injection point on its constructor. If
a wait is needed, accept an `eventloop` factory and schedule through
`setTimeout`. Tests substitute a synchronous deterministic eventloop fake
the same way they substitute `http` or `settings`.
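
As a rough illustration of constructor injection with a deterministic fake, the sketch below passes an eventloop factory into a hypothetical `newRetrier` library and drives it from a test by advancing a virtual clock. None of these names come from the repository: `newFakeLoop`, `advance`, `pending`, `stopLoop`, and `newRetrier` are assumptions, and the project's actual fake may expose a different surface.

```lua
-- Deterministic eventloop fake: time only moves when the test calls
-- advance(). All names here are hypothetical.
local function newFakeLoop()
  local fake = { now = 0, timers = {}, stopped = false }

  function fake.setTimeout(callback, delay)
    table.insert(fake.timers, { at = fake.now + delay, callback = callback })
  end

  function fake.stopLoop()
    fake.stopped = true
  end

  function fake.pending()
    return #fake.timers
  end

  -- Move the virtual clock forward and fire every timer that became due.
  function fake.advance(seconds)
    fake.now = fake.now + seconds
    local i = 1
    while i <= #fake.timers do
      local timer = fake.timers[i]
      if timer.at <= fake.now then
        table.remove(fake.timers, i)
        timer.callback()
      else
        i = i + 1
      end
    end
  end

  return fake
end

-- Hypothetical library: retries `send` with a delay. It receives an
-- eventloop factory through its constructor, not a sleep function.
local function newRetrier(options)
  local loop = options.eventloop()
  local self = { attempts = 0, done = false }

  function self.start(send, delay, maxAttempts)
    local function attempt()
      self.attempts = self.attempts + 1
      if send() or self.attempts >= maxAttempts then
        self.done = true
        loop.stopLoop()
      else
        loop.setTimeout(attempt, delay)
      end
    end
    loop.setTimeout(attempt, 0)
  end

  return self
end

-- Test-style usage: the send succeeds on its third call.
local fake = newFakeLoop()
local outcomes = { false, false, true }
local retrier = newRetrier({ eventloop = function() return fake end })
retrier.start(function() return table.remove(outcomes, 1) end, 2, 5)

fake.advance(0)  -- first attempt fails, a retry is scheduled
local pendingAfterFirst = fake.pending()
fake.advance(2)  -- second attempt fails
fake.advance(2)  -- third attempt succeeds and stops the loop
print(retrier.attempts, fake.pending(), fake.stopped)
```

Because the test owns the clock, it can check between steps that exactly one retry is pending, and at the end that no timer leaked and the loop was stopped.
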

## Consequences

- Slightly more ceremony in "synchronous-looking" functions that wait: a
  private eventloop plus a small `attempt`/`finish` pair. The benefit is
  that the function composes cleanly with any caller's eventloop.
- Test fakes shift from a `sleep` stub to a synchronous eventloop double.
  Ergonomics are comparable; the eventloop fake additionally lets tests
  observe `pending` and `stopped` state, catching leaks the sleep stub
  would have missed.
- Existing call sites are migrated opportunistically when they cause
  observable bugs. The first migration is `apis/libai.lua`.

## References

- [ADR-0002](adr-0002-use-eventloop-for-async-code.md) — use eventloop for async code.
- [ADR-0009](adr-0009-layered-test-timeouts.md) — layered test timeouts (the `parallel.waitForAny` guard exception).
- [`apis/net.lua`](../../apis/net.lua) `sendRequest` — canonical private-eventloop pattern.
- [`apis/libai.lua`](../../apis/libai.lua) `pollMessage` — first migration.