Note: This project is still experimental, and the API is subject to significant changes.
ASTro is an optimization framework based on Abstract Syntax Trees (ASTs). It provides a reusable infrastructure for generating optimized code fragments through partial evaluation of AST interpreters, emitting C source as the output of specialization and delegating native code generation to a mature C compiler (gcc/clang).
The companion tool ASTroGen (lib/astrogen.rb) automatically generates an interpreter (evaluator), a dispatcher, allocators, Merkle-tree hash functions, a partial evaluator (specializer), a dumper, and node-replacement helpers from a node.def file that defines node types and their behaviors in C code.
- Dispatcher / Evaluator separation. Evaluators (
EVAL_xxx) hold the user-written semantics; dispatchers (DISPATCH_xxx) are thin wrappers that unpack node fields and call the evaluator. Partial evaluation specializes only the dispatchers, so evaluator code is reused unchanged and the C compiler is free to inline the resulting call chain. - Merkle-tree hashing of AST nodes. Each node carries a hash derived from its kind and children, enabling content-addressable sharing of compiled code across processes and machines (functions are named
SD_<hash>). - C source as the IR. Specialized code is emitted as ordinary C, compiled with the host toolchain, and loaded via shared objects (
dlopen+dlsym). No custom backend. - Code Store (
runtime/astro_code_store.{h,c}). A small runtime library that manages<store>/SD_<hash>.{c,o}files plus an aggregatedall.so, and swaps a node's dispatcher to the specialized one onastro_cs_load.
The same node.def and the same generated infrastructure support:
- Plain interpreter — pure tree-walking via
DISPATCH→EVAL. - AOT compilation — specialize the whole AST offline, link, and run.
- Profile-guided compilation — collect profile on the first run, specialize before the second.
- JIT compilation — specialize and load
.sofiles at run time. A tiered design (in-process L0 thread, local L1 daemon, remote L2 compile farm) shares compiled code by hash.
See docs/idea.md for the design rationale and docs/usage.md for an ASTroGen tutorial.
lib/astrogen.rb ASTroGen core (Ruby) — generates C from node.def
runtime/ Reusable C runtime (Code Store)
sample/ Sample languages — see § Samples below
docs/ Design notes and papers
ASTro samples span a wide range of language families to exercise the framework against very different value representations, control-flow shapes, and runtime services. All share a uniform layout (node.def, Makefile, optional ASTroGen extension, per-sample docs/). Each sample's own README has the full language scope, build / run, benchmarks, and design notes; docs/samples.md is the cross-sample analysis. The entries below are one-liners with the most distinctive flagship result.
Tutorial.
calc— toy 6-node calculator REPL (num++/-/*///%), the smallest end-to-end ASTroGen example.
Ruby family.
naruby— not a Ruby: tiny integer-only Ruby subset (32 nodes). The original ASTro paper's vehicle and the only sample exercising all four execution modes (plain / AOT / PG / JIT) from a single binary.abruby— a bit Ruby: larger Ruby subset as a CRuby C extension (107 nodes), reusing CRuby'sVALUE/ Prism / GC. PGC-baked optcarrot 86.5 fps vs CRuby (no-JIT) 45.6 fps; integer-loop microbenches 4–8×.koruby— kind of Ruby: standalone (non-CRuby) Ruby with Boehm GC + GMP + Prism (119 nodes). Runs optcarrot end-to-end at ~100 fps (gcc-15 -O3, vs CRuby 42 fps ≈ 2.4×; YJIT 175 fps).
Other dynamic scripting.
luastro— Lua 5.4 (74 nodes) with tagged 8-byteLuaValue. Full pattern matcher, weak tables,__gcfinalizers, ucontext coroutines, self-written mark-sweep GC.jstro— JavaScript (broad ES2023 subset, 101 nodes) with V8-style hidden-class objects + shape-transition ICs. Profile-driven kind swap, longjmpthrow, mark-sweep GC; beats node v18 by 2–50× on several benches.pystro— Python 3 subset (95 nodes): GMP bignum + classes + dunder +try/except+ generators + comprehensions + f-strings + ~37 builtins. Beats CPython 3.12 on 13 / 14 benches (4 macro + 9/10 micro) at ~1 s scale (while_loop17×); also beats CPython 3.14+JIT on 12/14.
Functional / academic.
ascheme— R5RS Scheme (54 nodes) with the full numeric tower (fixnum / bignum / rational / flonum / complex via GMP),call/cc, multiple values, ports. Passes 179/179 of chibi'sr5rs-tests.scm; vs chibi 18/18 wins, vs guile 17/18 (up to 27×).astocaml— OCaml subset (91 nodes) with HM-lite type inference, ADTs, exceptions, single-inheritance classes, real functor instantiation, TCO. 35/35 tests; with-c, fib / ack / tak beatocamlcbytecode andocamltoplevel.asom— SOM Smalltalk dialect (80 nodes) with type-specialized sends, control-flow inlining, Boehm GC + GMP. Passes the full SOM TestSuite (221/221); PG mode beats SOM++ on all 12 AreWeFastYet benches.
Statically-typed imperative.
pascalast— Pascal subset (159 nodes), ISO 7185 + Free Pascal–style OO: variant records, sets,with,array of T,virtual/override,try/except, properties. Wins outright on tight constant-folding loops vsfpc -O3(collatz 0.4× / leibniz 0.6× / mandelbrot 0.8×).castro— C subset (101 nodes) with tree-sitter-c front-end, 8-byte slot VALUE, structs / function pointers /printf,gcc -Epreprocessing. AOT beatsgcc -O0on tight loops;crc32tiesgcc -O1.
Stack-based.
aforth— Forth subset (68 nodes) where every word is an AST NODE (no traditional threaded code); calls indirect through aword_id→NODE *table. Wins 8/9 against gforth 0.7.3 (gcd 13.9×, factorial 8.1×, collatz 7.2×; ack 0.95× the only loss).
Data / vector.
astr— R subset (46 nodes) with tagged 64-bitVALUE, libgc, vector/scalar broadcast,c()/paste/substr/1:nranges. AOT 4.1× on fib(36), 3.8× on ack(3,9).
Non-source / DSL.
wastro— WebAssembly 1.0+ (MVP) interpreter (212 nodes). Reads both.watand.wasm, runs the wasm spec-test.wastharness; AOT-cached within 3–6× ofgcc -O2on tight loops.astrogre— Ruby/Onigmo-compatible regex engine (53 nodes — the matcher itself is an AST) with agrep-style CLI (are); switchable backend (astrogre / Onigmo). AOT fuses the start-position scan loop and the regex chain into one SD (8/8 wins vs Onigmo, 3–15×).nuq—jq1.7-compatible clone (209 nodes) with full pipeline /try-catch/reduce/foreach/ module / call-by-name / 70+ builtins. Passes 524/526 of jq 1.7's official test suite; real-world 11/11 wins 2.6–5.0× vs jq (microupto50×,reverse16×).arjsv— JSON Schema validator (drafts 04 / 06 / 07 / 2020-12, 47 nodes), CRuby C extension,json_schemer-compatible API, per-schema SD with alloc-freevalid?. Test Suite (excl. IDNA / HTTP$ref/ dynamic$dynamicRef): 04 98.8% / 06 98.9% / 07 95.5% / 2020-12 94.6%. 4–11× vsrj_schema(Rust+RapidJSON FFI), 25–180× vsjson_schemer(pure Ruby).