Program-as-Weights: Compile-Once Adapter Paradigm Matches 32B Models at 1/50 the Memory

Research official 1 src. ~1 min

Researchers from the University of Waterloo introduce Program-as-Weights (PAW), where a 4B-parameter compiler generates small reusable adapter weights for tasks that resist rule-based solutions. A 0.6B Qwen3 interpreter guided by these adapters matches a 32B model while using 1/50th the inference memory and running at 30 tokens/second on a MacBook M3. The authors also release FuzzyBench, a 10-million-example training dataset.
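As a rough sanity check on the 1/50 figure, the parameter counts quoted above can be compared directly. This is a back-of-the-envelope sketch, not the authors' measurement: it assumes both models store weights at the same precision (2 bytes per parameter is an illustrative choice) and ignores adapter weights, KV cache, and activations.

```python
# Back-of-the-envelope comparison of weight memory for the two model sizes
# quoted in the summary. BYTES_PER_PARAM is an assumption (fp16/bf16),
# not a figure from the paper.
BYTES_PER_PARAM = 2

baseline_params = 32e9      # 32B comparison model
interpreter_params = 0.6e9  # 0.6B Qwen3 interpreter


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


baseline_gb = weight_memory_gb(baseline_params)
interpreter_gb = weight_memory_gb(interpreter_params)
param_ratio = baseline_params / interpreter_params

print(f"32B weights:  {baseline_gb:.1f} GB")
print(f"0.6B weights: {interpreter_gb:.1f} GB")
print(f"ratio:        {param_ratio:.1f}x")
```

At equal precision the ratio depends only on the parameter counts, and it comes out in the same range as the reported 50× reduction.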

Why it matters

PAW reframes foundation model usage from per-input inference to a compile-once, run-many pattern: the compiler produces an adapter once per task, and the small interpreter reuses it for every input. The 50× memory reduction brings 32B-class task performance to consumer hardware.
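The compile-once, run-many pattern can be illustrated with a toy sketch. Everything here is hypothetical: `ToyCompiler`, `ToyInterpreter`, and `Adapter` are stand-ins invented for illustration, and the "weights" are placeholder numbers, not the paper's method or any released API. The point is only the call structure: the expensive step runs once per task, and the cheap step runs once per input.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Adapter:
    """Hypothetical stand-in for the small reusable weights emitted per task."""
    task: str
    weights: Tuple[int, ...]


class ToyCompiler:
    """Hypothetical stand-in for the large compiler model (run once per task)."""

    def __init__(self) -> None:
        self.calls = 0

    def compile(self, task_description: str) -> Adapter:
        self.calls += 1
        # Placeholder "weights" derived from the task text; a real compiler
        # would generate adapter parameters for the interpreter model.
        weights = tuple(ord(ch) % 7 for ch in task_description[:8])
        return Adapter(task=task_description, weights=weights)


class ToyInterpreter:
    """Hypothetical stand-in for the small interpreter model (run per input)."""

    def __init__(self) -> None:
        self.calls = 0

    def run(self, adapter: Adapter, text: str) -> int:
        self.calls += 1
        # Placeholder computation: the output depends on both the adapter
        # and the input, mimicking an adapter-guided forward pass.
        return (sum(adapter.weights) + len(text)) % 2


compiler = ToyCompiler()
interpreter = ToyInterpreter()

# Compile once: the expensive step happens a single time for the task.
adapter = compiler.compile("flag rude messages")

# Run many: the same adapter is reused for every input.
inputs: List[str] = ["hello there", "go away", "thanks a lot"]
outputs = [interpreter.run(adapter, text) for text in inputs]

print(f"compiler calls: {compiler.calls}, interpreter calls: {interpreter.calls}")
```

In this sketch the compiler is invoked once however many inputs follow, which is the cost structure the summary describes.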

Importance: 3/5

92 Hugging Face Daily Papers upvotes (July 3, highest of the day); 50× memory reduction vs. a 32B model; 30 tokens/second on a MacBook M3

efficiency inference language-models on-device

Sources

official Program-as-Weights: A Programming Paradigm for Fuzzy Functions — arxiv