A CoffeeScript/Node.js ML pipeline system by James A. Hinds — comprising pipeline_runner, recipes, step scripts, and meta device infrastructure
The writeStory pipeline is a complete ML system for fine-tuning, retrieval, and emotionally directed narrative generation, built on a micro operating system for ML research called pipeline_runner.coffee. The system encompasses the runner itself, YAML recipes that define pipeline DAGs, step scripts that encapsulate research logic, and a meta device layer that handles transparent persistence to filesystem, YAML, JSON, and SQLite.
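A recipe can be pictured roughly as follows. This is a hypothetical sketch: the step names, script paths, and artifact keys are illustrative, but the `depends_on`, `needs`, and `makes` fields mirror the contract the runner documents.

```yaml
# Hypothetical recipe sketch; step names and keys are invented for illustration.
steps:
  - name: quantize
    script: steps/quantize.coffee
    makes: [out/model_q.json]
  - name: finetune
    depends_on: [quantize]
    needs: [out/model_q.json]
    makes: [out/adapter.yaml]
```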
Its central insight is that a research notebook — with its cells, shared variables, manual reruns, and implicit dependencies — can be translated step-for-step into a crash-recoverable, parallelism-capable, auditable production system. Each notebook cell becomes a pipeline step. Cell execution order becomes a DAG. Shared kernel variables become a promise-backed in-memory store. Manual reruns become restart_here. The result is infrastructure that lets a researcher move from proof-of-concept to reliable, unattended execution without rewriting their logic — only hardening it.
The Memo is a promise-backed key-value store with meta-rule dispatch. Writes to keys like out/result.yaml transparently serialize to disk. Reads block on a Promise until a value arrives. The entire data flow of the pipeline is expressed as key writes and reads — no polling, no explicit signaling.
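The core mechanism can be sketched in a few lines. This is a minimal sketch in JavaScript rather than CoffeeScript, with hypothetical method names (`get`/`put`): each key is backed by a Promise created on first touch, so a reader can await a value before any writer exists.

```javascript
// Minimal sketch of a promise-backed store (method names hypothetical).
// get() returns a Promise that pends until someone put()s the key.
class Memo {
  constructor() {
    this.cells = new Map(); // key -> { promise, resolve }
  }
  _cell(key) {
    if (!this.cells.has(key)) {
      let resolve;
      const promise = new Promise((r) => { resolve = r; });
      this.cells.set(key, { promise, resolve });
    }
    return this.cells.get(key);
  }
  get(key) {          // awaitable read; blocks (asynchronously) until written
    return this._cell(key).promise;
  }
  put(key, value) {   // resolves any readers already waiting on this key
    this._cell(key).resolve(value);
  }
}
```

Because the Promise is created lazily on first access, read-before-write and write-before-read both work, which is what lets the runner wire steps together without polling or explicit signaling.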
Steps declare depends_on, needs, and makes. The runner topologically sorts them, wires artifact resolution, and fires steps as their dependencies resolve. Parallel execution is free — no threading ceremony required.
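With a promise-backed store, the DAG scheduling largely falls out of awaiting inputs. The sketch below (JavaScript; the step shape `{needs, makes, run}` and the store interface are assumptions) fires every step immediately, but each one suspends on its needs, so execution order follows the data dependencies and independent steps overlap for free.

```javascript
// Sketch: dependency-ordered, parallel step firing over a promise-backed store.
// Assumed shapes: step = { needs: [key], makes: [key], run(...inputs) -> [outputs] },
// store = { get(key) -> Promise, put(key, value) }.
async function runStep(memo, step) {
  const inputs = await Promise.all(step.needs.map((k) => memo.get(k)));
  const outputs = await step.run(...inputs);   // research logic lives here
  step.makes.forEach((k, i) => memo.put(k, outputs[i]));
}

function runPipeline(memo, steps) {
  // Fire everything at once; the awaits impose the topological order.
  return Promise.all(steps.map((s) => runStep(memo, s)));
}
```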
Each step script receives a ledger object exposing param, need, peek, make, callMLX, done, and fail. Steps never touch the filesystem or Memo directly. The contract is explicit and enforceable.
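The ledger can be sketched as a thin facade over the store and the step's state record. The sketch below is JavaScript with assumed shapes (a store exposing `get`/`tryGet`/`put` and a status callback); `callMLX` is omitted, since it wraps the Python boundary the same way.

```javascript
// Sketch of the step-facing contract. The ledger is the only surface a step
// sees: no filesystem access, no store internals. Shapes are assumptions:
// memo = { get(k) -> Promise, tryGet(k) -> value | undefined, put(k, v) }.
function makeLedger(memo, params, onStatus) {
  return {
    param: (name) => params[name],               // static recipe parameter
    need:  (key) => memo.get(key),               // awaitable input artifact
    peek:  (key) => memo.tryGet(key),            // non-blocking read, may be undefined
    make:  (key, value) => memo.put(key, value), // publish an output artifact
    done:  () => onStatus("done"),               // terminal status transitions
    fail:  (err) => onStatus("failed", err),
  };
}
```

Because everything a step can do goes through this one object, the contract is both explicit and enforceable: a step that tried to touch storage directly would have nothing to touch.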
Step state lives as one JSON file per step in state/. Status is running, done, or failed. A restart_here marker, consumed at startup, clears all downstream state so stale completions cannot inhibit reruns. Crash recovery is structural, not bolted on.
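The downstream-clear step can be sketched as a reachability walk over the dependency graph. This is a JavaScript sketch with hypothetical shapes: `deps` maps each step to its upstream steps, and `state` is a plain object standing in for the state/*.json files.

```javascript
// Sketch: consuming restart_here means wiping the named step and everything
// downstream of it, so stale "done" records cannot suppress the rerun.
// deps: { step: [upstream steps] }; state: { step: statusRecord }.
function downstreamOf(deps, start) {
  const hit = new Set([start]);
  let grew = true;
  while (grew) {                  // fixed-point walk over reverse edges
    grew = false;
    for (const [step, ups] of Object.entries(deps)) {
      if (!hit.has(step) && ups.some((u) => hit.has(u))) {
        hit.add(step);
        grew = true;
      }
    }
  }
  return hit;
}

function restartHere(deps, state, start) {
  for (const step of downstreamOf(deps, start)) delete state[step];
}
```

Upstream and unrelated steps keep their completion records, so only the invalidated work reruns.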
Pattern-matched middleware intercepts Memo writes by key name. A write to out/foo.yaml serializes an object to YAML and writes the file. A write to a SQLite key persists to the database. Persistence is a side effect of data flow — steps remain unaware of storage.
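The meta device layer can be sketched as a rule table consulted on every write. In the JavaScript sketch below, the rule patterns, the `toYaml` stand-in, and the in-memory `saved` object are all hypothetical stand-ins for real serializers and filesystem writes.

```javascript
// Sketch of pattern-matched persistence middleware. A write whose key matches
// a rule triggers a storage side effect; the step only ever called put().
const saved = {};                            // stands in for the filesystem
const toYaml = (v) => JSON.stringify(v);     // stands in for a YAML serializer

const rules = [
  { match: /^out\/.*\.yaml$/, write: (k, v) => { saved[k] = toYaml(v); } },
  { match: /^out\/.*\.json$/, write: (k, v) => { saved[k] = JSON.stringify(v, null, 2); } },
];

function metaPut(memo, key, value) {
  for (const rule of rules) {
    if (rule.match.test(key)) rule.write(key, value);  // persistence side effect
  }
  memo.put(key, value);   // in-memory data flow continues regardless
}
```

Keys that match no rule stay purely in memory, which is how scratch values and persisted artifacts share one write path.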
All heavy computation — quantization, LoRA fine-tuning, inference — is handed to MLX via callMLX. The orchestration layer stays in V8. The boundary is clean: everything that can be CoffeeScript is; only what must be Python crosses over.
The restart_here and downstream-delete protocol shows genuine operational maturity. A failure downstream of a 40-minute quantization step is a non-event: the expensive artifact survives and only the invalidated steps rerun. In a notebook it is lost time.
The emotion vocabulary (joy, grief, anxiety, etc.) currently lives in four separate locations across the codebase. It is load-bearing for retrieval correctness and will drift. Consolidating it into a single authoritative source is the right fix once the design settles.
Chunk text is written to kag_entries at index time by the oracle step. Diary generation reads chunk_text from SQLite rather than recomputing group boundaries from raw story text. The divergence risk is eliminated and the retrieval path is simpler.