ADR 0015: Unified Boot Eventloop and Service-Name Bus
Status
Accepted
Date
2026-06-09
Context
Before this change, the TrapOS networking and boot model had accumulated several issues that were hard to fix incrementally:
- Label collision was a silent footgun. Two machines sharing the same label both accepted and rebroadcast packets addressed to that label, since `programs/router.lua:40` and `apis/net.lua:35` both matched `os.getComputerLabel()` independently. The sender received duplicate responses and the wire carried duplicate retransmits.
- `_G.isRouterEnabled` mutated send behavior across the codebase. `sendRaw` in `apis/net.lua` switched its transmit path based on a global flag set by the router program. This made the same function call mean different things depending on which machine ran it.
- Every autostart server ran its own private eventloop. Each server file called `net.start()`, which delegated to `el.startLoop()`. With N autostart entries, `parallel.waitForAll` ran N coroutines, each pumping an independent `os.pullEventRaw`. Wasteful and conceptually awkward: events were broadcast to every coroutine but only one would have a relevant handler.
- The router lived outside the eventloop entirely. `programs/router.lua` was a hand-rolled `while true / os.pullEvent` loop, structurally different from every other long-lived process in the repo.
- Channel numbers leaked into every client. `servers/ping-server.lua` and `programs/ping.lua` both duplicated a `PING_CHANNEL = 9` constant; there was no service registry. Adding a new service meant picking a free integer and replicating it on both ends.
Net's blast radius was small at this point (only `programs/ping.lua` and `servers/ping-server.lua` consumed it), so a clean break was cheaper than incremental patching.
Decision
Adopt three coordinated changes:
- One boot eventloop per machine. `startup/servers.lua` creates a single `createEventLoop()` instance, stores it at `_G.bootEventLoop`, runs autostart server files (which register handlers and return without blocking), then runs `parallel.waitForAny(shellFn, eventLoopFn)`. The shell and the eventloop are the only two coroutines.
- Service-name addressing on a single bus channel. `apis/net.lua` exposes `net.serve(name, handler)`, `net.call(name, payload, opts)`, `net.send(name, payload, opts)`, and `net.listen(name, handler)`. All traffic flows on channel `10` and is demultiplexed inside the packet body via a `service` field. Channel numbers stop being a public concept. `require('/apis/net')()` returns a singleton bound to `_G.bootEventLoop` when present, otherwise an ephemeral instance.
- Router as a service on the boot eventloop. `programs/router.lua` registers handlers on the same boot eventloop everything else uses. It owns a TTL-based label map (extracted into `apis/librouter.lua` for testability). Machines with a label autostart `servers/net-registrar.lua`, which periodically broadcasts `(id, label)` so the router can resolve label-addressed packets. Duplicate label registrations are rejected with a printed warning. `_G.isRouterEnabled` is gone; the router service flips a local flag via `net.setRouter(true)` instead.
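The service-name demultiplexing described above can be illustrated with a minimal sketch in plain Lua. This is not the code in `apis/net.lua`: only the channel number `10` and the `service` field come from this ADR, while `newBus`, `dispatch`, and the `payload` field are hypothetical names chosen for illustration.

```lua
-- Hypothetical sketch: one bus channel, handlers keyed by service name.
-- Only BUS_CHANNEL = 10 and the `service` field come from the ADR; every
-- other name here is an illustrative assumption.
local BUS_CHANNEL = 10

local function newBus()
  local handlers = {} -- service name -> handler function
  local bus = {}

  -- Register a handler for a service name (plays the role of net.serve).
  function bus.serve(name, handler)
    assert(type(name) == "string", "service name must be a string")
    assert(type(handler) == "function", "handler must be a function")
    handlers[name] = handler
  end

  -- Route one incoming packet. Returns true plus the handler's result when
  -- a handler ran, or false when the packet was ignored.
  function bus.dispatch(channel, packet)
    if channel ~= BUS_CHANNEL then return false end
    if type(packet) ~= "table" then return false end
    local handler = handlers[packet.service]
    if not handler then return false end
    return true, handler(packet.payload, packet)
  end

  return bus
end

local bus = newBus()
bus.serve("ping", function(payload) return "pong:" .. tostring(payload) end)

local handled, reply = bus.dispatch(BUS_CHANNEL, { service = "ping", payload = "hi" })
print(handled, reply)
```

Because routing happens on a string key inside the packet, adding a service in this sketch means adding one table entry rather than reserving a channel number on both ends.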
CLI programs stay standalone: `net.call` internally uses `os.pullEvent` with a timer, so programs do not need the boot eventloop to receive a response.
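A rough sketch of how a blocking call with a timer might look is shown below. It is an assumption about the general shape, not the actual `net.call`: the `env` table stands in for ComputerCraft's `os.startTimer` and `os.pullEvent` so the logic runs in plain Lua, and `transmit`, `id`, `replyTo`, and `fakeEnv` are hypothetical names.

```lua
-- Hypothetical sketch of a request/response call with a timeout.
-- `env` mimics the ComputerCraft `os` API (startTimer, pullEvent); the
-- packet fields `id` and `replyTo` are assumptions, not the real wire format.
local unpack = table.unpack or unpack -- Lua 5.1 / 5.2+ compatibility
local nextId = 0

local function call(env, transmit, service, payload, timeoutSeconds)
  nextId = nextId + 1
  local id = nextId
  transmit({ service = service, id = id, payload = payload })
  local timer = env.startTimer(timeoutSeconds)
  while true do
    -- modem_message carries: side, channel, replyChannel, message, distance
    local event, a, _, _, message = env.pullEvent()
    if event == "timer" and a == timer then
      return nil, "timeout"
    elseif event == "modem_message" and type(message) == "table"
        and message.service == service and message.replyTo == id then
      return message.payload
    end
    -- Any other event (or an unrelated reply) is ignored; keep waiting.
  end
end

-- Fake environment that replays a scripted list of events.
local function fakeEnv(events)
  local i = 0
  return {
    startTimer = function() return 1 end,
    pullEvent = function()
      i = i + 1
      return unpack(events[i])
    end,
  }
end

local sent = {}
local function transmit(packet) sent[#sent + 1] = packet end

-- First call: an unrelated reply arrives, then the matching one.
local reply = call(fakeEnv({
  { "modem_message", "top", 10, 10, { service = "ping", replyTo = 99, payload = "stale" } },
  { "modem_message", "top", 10, 10, { service = "ping", replyTo = 1, payload = "pong" } },
}), transmit, "ping", "hi", 2)

-- Second call: only the timer fires, so the call reports a timeout.
local timedOut, err = call(fakeEnv({ { "timer", 1 } }), transmit, "ping", "hi", 2)
print(reply, timedOut, err)
```

On a real machine the caller would pass `os` as `env`; the loop needs no registered handler, which is why a CLI program can use it without the boot eventloop.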
Consequences
- Adding a new networked service is now: write a `servers/foo.lua` that calls `net.serve('foo', handler)` and returns, then add it to a package's `autostart`. No channel allocation, no `.start()` blocking call.
- The router program returns immediately instead of blocking the shell. Users type `router` once on the chosen machine and continue using the shell.
- Label collisions are detected and rejected at registration time, with a clear warning, instead of causing silent duplicate delivery.
- The ping API surface changed (`net.sendRequest` → `net.call`, `net.listenRequest` → `net.serve`). Out-of-tree consumers, if any existed, would need to migrate. Inside the repo only ping needed migration.
- Programs that need to wait for events still work by direct `os.pullEvent`, but if a program registers a long-lived handler on `_G.bootEventLoop` and exits, the handler keeps firing with a stale closure. Programs should prefer `call`/`send` over `serve`/`listen`. This is documented in `apis/net.lua` but not enforced.
- Tests for the router state machine live in `tests/router.lua` and exercise `apis/librouter.lua` with an injected clock. Tests for the net packet shape and dispatch live in `tests/net.lua` with a fake modem.
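To make the injected-clock idea concrete, here is a minimal sketch of a TTL-based label map that rejects duplicate registrations. It is an illustration under stated assumptions, not the contents of `apis/librouter.lua`: `newLabelMap`, `register`, `resolve`, and the 30-second TTL are hypothetical.

```lua
-- Hypothetical sketch of a TTL-based label map with an injected clock.
-- `now` is any function returning the current time in seconds; passing a
-- fake makes expiry deterministic in tests.
local function newLabelMap(now, ttlSeconds)
  local entries = {} -- label -> { id = number, expires = number }
  local map = {}

  -- Record that computer `id` claims `label`. Returns true on success, or
  -- false plus a reason when a different, still-live id holds the label.
  function map.register(id, label)
    local t = now()
    local entry = entries[label]
    if entry and entry.expires > t and entry.id ~= id then
      return false, "label '" .. label .. "' already registered by " .. entry.id
    end
    entries[label] = { id = id, expires = t + ttlSeconds }
    return true
  end

  -- Resolve a label to an id, or nil if it is unknown or has expired.
  function map.resolve(label)
    local entry = entries[label]
    if entry and entry.expires > now() then return entry.id end
    entries[label] = nil
    return nil
  end

  return map
end

local clock = 0
local labels = newLabelMap(function() return clock end, 30)

local ok1 = labels.register(7, "storage")      -- accepted
local ok2, why = labels.register(9, "storage") -- rejected: 7 still holds it
clock = 31                                     -- advance past the TTL
local afterExpiry = labels.resolve("storage")  -- entry has expired
local ok3 = labels.register(9, "storage")      -- accepted after expiry
print(ok1, ok2, why, afterExpiry, ok3)
```

A periodic re-registration from the same `id` simply refreshes the expiry in this sketch, which matches the idea of a registrar that rebroadcasts `(id, label)` on a timer.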
Out of Scope
- Multi-router topologies. The single-router assumption stays; a network is expected to run `router` on exactly one machine.
- Retry and acknowledgement primitives beyond the existing per-call `timeout`.
- Unifying `libtui`, `libai`, and `tuidemo` eventloops. They remain private; they are presentation/AI concerns, not network plumbing.
- The `ccpm` package manager. It is recent, tested, and not in pain.