ADR 0002: Eventloop Substrate, Service Bus, and Async Discipline
Status
Accepted
Date
2026-06-07
Context
ComputerCraft is event-driven. Direct os.pullEvent loops are easy to write but hard to compose when multiple things need to happen at the same time. Without a single substrate the repo accumulated several distinct problems:
- Each long-lived process owned a private event loop, including the router (programs/router.lua was a hand-rolled while true / os.pullEvent). With N autostart servers, parallel.waitForAll ran N coroutines each pumping an independent os.pullEventRaw. Events were broadcast to every coroutine but only one would have a relevant handler, which was wasteful and conceptually awkward.
- _G.isRouterEnabled mutated send behavior across the codebase. apis/net.lua sendRaw switched its transmit path based on a global flag set by the router program, so the same function call meant different things depending on which machine ran it.
- Channel numbers leaked into every client. servers/ping-server.lua and programs/ping.lua both duplicated a PING_CHANNEL constant; there was no service registry. Adding a new service meant picking a free integer and replicating it on both ends.
- Label collision was a silent footgun. Two machines sharing the same label both accepted and rebroadcast packets addressed to that label, producing duplicate responses and duplicate retransmits.
- os.sleep looked innocent but broke the substrate. Its CC:Tweaked implementation yields via os.pullEvent("timer"). While the sleep is in flight, the enclosing eventloop's os.pullEventRaw is paused; non-timer events are silently discarded; even eventloop.setTimeout callbacks scheduled before the sleep cannot fire until it returns. This bit apis/libai.lua pollMessage, which used a sleep-based throttle and froze the whole loop the moment a caller invoked it from inside a handler.
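The event loss described in the last bullet can be reproduced outside ComputerCraft with a small simulation. The sketch below is not the real os API: queue, queueEvent, pullEvent, and fakeSleep are hypothetical stand-ins that only mimic the filter semantics, under the assumption that a filtered pull consumes and drops every non-matching event it encounters.

```lua
-- Stand-in for the ComputerCraft event queue; not the real os API.
local queue = {}

local function queueEvent(name, arg)
  queue[#queue + 1] = { name = name, arg = arg }
end

-- Mimics the filter semantics of os.pullEvent(filter): events that do not
-- match the filter are consumed and dropped, not put back on the queue.
-- Unlike the real call, this returns nil on an empty queue instead of yielding.
local function pullEvent(filter)
  while #queue > 0 do
    local ev = table.remove(queue, 1)
    if filter == nil or ev.name == filter then
      return ev.name, ev.arg
    end
  end
  return nil
end

-- Mimics a sleep: wait for the "timer" event that carries our timer id.
local function fakeSleep(timerId)
  repeat
    local _, id = pullEvent("timer")
  until id == timerId or id == nil
end

queueEvent("modem_message", "ping")  -- arrives while the sleep is in flight
queueEvent("timer", 1)               -- the timer that ends the sleep
queueEvent("key", "enter")           -- arrives after the sleep has finished

fakeSleep(1)

-- Collect whatever the enclosing loop would still be able to see.
local survivors = {}
local name = pullEvent()
while name do
  survivors[#survivors + 1] = name
  name = pullEvent()
end
```

Only the key event survives; the modem_message that arrived during the sleep is gone before any handler could see it.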
Net's blast radius at the time of the bus rewrite was small (only ping consumed it), so a clean break was cheaper than incremental patching.
Decision
1. Eventloop is the async substrate
New async code uses apis/eventloop.lua. Event handlers, timers, server listeners, and UI behavior compose through the eventloop instead of each feature owning its own blocking loop.
- Prefer eventloop.register, setTimeout, onStart, onStop, and startLoop for async behavior.
- APIs that listen for events accept an existing event loop as a constructor argument, the way apis/net.lua does. Do not create a private loop inside a module.
- Direct os.pullEvent loops should be rare and justified (CLI programs waiting for a single reply are the main exception).
- A handler that returns api.STOP auto-unregisters.
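As an illustration of the register and auto-unregister rules above, here is a minimal sketch of a dispatcher. It is not the code in apis/eventloop.lua: createLoop, dispatch, and the local STOP sentinel are hypothetical names, and the register signature is assumed.

```lua
local STOP = {}  -- sentinel standing in for api.STOP (assumed shape)

local function createLoop()
  local handlers = {}  -- event name -> list of handler functions
  local loop = {}

  function loop.register(event, fn)
    handlers[event] = handlers[event] or {}
    table.insert(handlers[event], fn)
  end

  -- Deliver one event to every handler registered for it.
  function loop.dispatch(event, ...)
    local list = handlers[event]
    if not list then return end
    -- Iterate over a snapshot so removals do not disturb the iteration.
    local snapshot = {}
    for i, fn in ipairs(list) do snapshot[i] = fn end
    for _, fn in ipairs(snapshot) do
      if fn(...) == STOP then
        -- The handler asked to be removed: drop it from the live list.
        for i, live in ipairs(list) do
          if live == fn then
            table.remove(list, i)
            break
          end
        end
      end
    end
  end

  return loop
end

local loop = createLoop()
local seen = {}

-- Fires once, then unregisters itself by returning STOP.
loop.register("modem_message", function(msg)
  seen[#seen + 1] = "once:" .. msg
  return STOP
end)

-- Stays registered because it never returns STOP.
loop.register("modem_message", function(msg)
  seen[#seen + 1] = "always:" .. msg
end)

loop.dispatch("modem_message", "a")
loop.dispatch("modem_message", "b")
```

After the two dispatches, seen holds once:a, always:a, always:b; the first handler is gone by the second event.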
2. One boot eventloop and a service-name bus
startup/servers.lua creates a single createEventLoop() instance, stores it at _G.bootEventLoop, runs autostart server files (which register handlers and return without blocking), then runs parallel.waitForAny(shellFn, eventLoopFn). The shell and the eventloop are the only two coroutines.
apis/net.lua exposes a service-name bus on a single channel:
- net.serve(name, handler): register a server handler (server-side).
- net.call(name, payload, opts): request/response with timeout (client-side).
- net.send(name, payload, opts): fire-and-forget (client-side).
- net.listen(name, handler): passive listener.
All traffic flows on channel 10 and is demultiplexed inside the packet body via a service field. Channel numbers stop being a public concept. require('/apis/net')() returns a singleton bound to _G.bootEventLoop when present, otherwise an ephemeral instance. CLI programs stay standalone: net.call internally uses os.pullEvent with a timer, so programs do not need the boot eventloop to receive a response.
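The demultiplexing step can be pictured as a table lookup on the packet's service field. The sketch below is an assumption-laden illustration, not the implementation in apis/net.lua: createBus, onPacket, and the payload field name are hypothetical; only the service field and channel 10 come from the text above.

```lua
local CHANNEL = 10  -- the single shared channel named in the text

local function createBus()
  local services = {}  -- service name -> handler
  local bus = {}

  function bus.serve(name, handler)
    services[name] = handler
  end

  -- Called for every received packet; returns the handler's reply, or nil
  -- when the packet is on another channel, malformed, or for an unknown service.
  function bus.onPacket(channel, packet)
    if channel ~= CHANNEL then return nil end
    if type(packet) ~= "table" then return nil end
    local handler = services[packet.service]
    if not handler then return nil end
    return handler(packet.payload)  -- "payload" is an assumed field name
  end

  return bus
end

local bus = createBus()
bus.serve("ping", function(payload)
  return "pong:" .. tostring(payload)
end)

local reply = bus.onPacket(10, { service = "ping", payload = "hi" })
local unknown = bus.onPacket(10, { service = "nope", payload = "hi" })
local wrongChannel = bus.onPacket(11, { service = "ping", payload = "hi" })
```

Because the lookup key is a string, adding a service never requires agreeing on a new integer between client and server.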
programs/router.lua registers handlers on the same boot eventloop everything else uses. It owns a TTL-based label map extracted into apis/librouter.lua for testability. Machines with a label autostart servers/net-registrar.lua, which periodically broadcasts (id, label) so the router can resolve label-addressed packets. Duplicate label registrations are rejected with a printed warning. _G.isRouterEnabled is gone; the router service flips a local flag via net.setRouter(true) instead.
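A TTL-based label map with duplicate rejection could look roughly like the following. This is a sketch under stated assumptions rather than the contents of apis/librouter.lua: createLabelMap, register, resolve, the 30-second TTL, and the rule that a re-registration from the same id refreshes the entry are all hypothetical. The clock is injected so the behavior can be tested without real time passing.

```lua
-- now: function returning the current time; ttl: entry lifetime in the same unit.
local function createLabelMap(now, ttl)
  local entries = {}  -- label -> { id = number, expires = number }
  local map = {}

  -- Returns true on success, or false plus a reason when a different
  -- machine still holds a live registration for the label.
  function map.register(id, label)
    local e = entries[label]
    if e and e.expires > now() and e.id ~= id then
      return false, "label '" .. label .. "' already held by " .. e.id
    end
    entries[label] = { id = id, expires = now() + ttl }
    return true
  end

  -- Returns the machine id for a label, or nil if unknown or expired.
  function map.resolve(label)
    local e = entries[label]
    if e and e.expires > now() then return e.id end
    entries[label] = nil  -- forget expired entries lazily
    return nil
  end

  return map
end

local t = 0
local map = createLabelMap(function() return t end, 30)

local first = map.register(1, "base")      -- accepted
local dup, why = map.register(2, "base")   -- rejected: label is live on id 1
local beforeExpiry = map.resolve("base")

t = 31                                     -- advance the injected clock past the TTL
local afterExpiry = map.resolve("base")
local takeover = map.register(2, "base")   -- accepted once the old entry expired
```

The periodic broadcast from the registrar is what would keep a live entry from expiring; a machine that goes silent frees its label after one TTL.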
3. os.sleep discipline
In library, server, and program code that may run inside an eventloop (directly or transitively), use eventloop.setTimeout for any waiting, throttling, polling, or retry-with-delay. Libraries that need to delay must take an eventloop factory through their constructor rather than hardcoding a sleep call. apis/net.lua sendRequest is the canonical private-eventloop pattern: create a private eventloop, schedule the wait through setTimeout, then runLoop until the work resolves. This is synchronous from the caller's perspective, but the dispatcher stays alive internally so handlers can compose around it via parallel.waitForAll.
os.sleep remains acceptable only in narrow cases:
- One-shot programs that are purely sequential and register no event handlers, for example a programs/foo.lua that prints, sleeps, prints again, and exits.
- parallel.waitForAny(task, function() sleep(t); end) used as an isolated guard to bound an inner task (e.g. the AI Lua-exec sandbox in apis/libai.lua and the parallel.waitForAny-driven per-case timer in apis/libtest.lua). The guard sleep is private to its own coroutine group; it does not block anything external.
- Tests that are themselves driven by libtest's per-case timeout (see ADR-0007).
New code must not expose a sleep injection point on its constructor. If a wait is needed, accept an eventloop factory and schedule through setTimeout. Tests substitute a synchronous deterministic eventloop fake the same way they substitute http or settings.
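The factory-injection rule can be sketched with a virtual-time eventloop double and a poller that retries through setTimeout. Everything here is hypothetical and only illustrates the shape: createFakeLoop, advance, pending, createPoller, and the setTimeout(fn, delay) argument order are assumptions, not the repo's actual API.

```lua
-- Deterministic eventloop double: time only moves when the test advances it.
local function createFakeLoop()
  local now, timers = 0, {}
  local loop = {}

  function loop.setTimeout(fn, delay)
    timers[#timers + 1] = { at = now + delay, fn = fn }
  end

  function loop.pending()
    return #timers
  end

  -- Advance virtual time by dt, firing due timers in chronological order.
  function loop.advance(dt)
    local target = now + dt
    while true do
      local best
      for i, t in ipairs(timers) do
        if t.at <= target and (not best or t.at < timers[best].at) then
          best = i
        end
      end
      if not best then break end
      local t = table.remove(timers, best)
      now = t.at
      t.fn()
    end
    now = target
  end

  return loop
end

-- A library that retries with a delay takes an eventloop factory
-- instead of calling sleep.
local function createPoller(loopFactory, fetch, delay, maxAttempts)
  local loop = loopFactory()
  local poller = { result = nil, attempts = 0 }

  local function attempt()
    poller.attempts = poller.attempts + 1
    local value = fetch()
    if value ~= nil then
      poller.result = value
    elseif poller.attempts < maxAttempts then
      loop.setTimeout(attempt, delay)  -- schedule the retry, do not block
    end
  end

  function poller.start() attempt() end
  return poller
end

local fake = createFakeLoop()
local calls = 0
local poller = createPoller(function() return fake end, function()
  calls = calls + 1
  if calls >= 3 then return "ready" end
end, 5, 10)

poller.start()
local pendingAfterStart = fake.pending()
fake.advance(5)
local resultAfterOneTick = poller.result
fake.advance(5)
```

The test drives the retries by advancing virtual time, and pending() exposes any timer the library forgot to let run or cancel.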
Consequences
- Adding a new networked service is now: write a servers/foo.lua that calls net.serve('foo', handler) and returns, then add it to a package's autostart. No channel allocation, no .start() blocking call.
- The router program returns immediately instead of blocking the shell. Users type router once on the chosen machine and continue using the shell.
- Label collisions are detected and rejected at registration time, with a clear warning, instead of causing silent duplicate delivery.
- A router must still be running somewhere on the network for cross-machine label-addressed packets; without one, non-router senders produce packets with routerId = nil and consumers drop them on receive.
- Programs that need to wait for events still work by direct os.pullEvent, but if a program registers a long-lived handler on _G.bootEventLoop and exits, the handler keeps firing with a stale closure. Programs should prefer call/send over serve/listen. This is documented in apis/net.lua but not enforced.
- Tests for the router state machine live in tests/router.lua and exercise apis/librouter.lua with an injected clock. Tests for the net packet shape and dispatch live in tests/net.lua with a fake modem.
- Slightly more ceremony in "synchronous-looking" library functions that wait: a private eventloop plus a small attempt/finish pair. The benefit is clean composition with any caller's eventloop.
- Test fakes shift from a sleep stub to a synchronous eventloop double. Ergonomics are comparable; the eventloop fake additionally lets tests observe pending and stopped state, catching leaks the sleep stub would have missed.
- Existing call sites are migrated opportunistically when they cause observable bugs. The first os.sleep migration is apis/libai.lua.
Out of Scope
- Multi-router topologies. The single-router assumption stays; a network is expected to run router on exactly one machine.
- Retry and acknowledgement primitives beyond the existing per-call timeout.
- Unifying libtui, libai, and tuidemo eventloops. They remain private; they are presentation/AI concerns, not network plumbing.