TIDE: cross-architecture distillation for diffusion LLMs
Peking University
TIDE is a distillation framework that transfers knowledge across architectures into diffusion LLMs. It comprises three components: TIDAL (distillation strength adapted per timestep), CompDemo (context provided via mask splitting), and Reverse CALM (a cross-tokenizer objective). The teachers are a dense 8B model and a 16B MoE; the student is a 0.6B diffusion model that scores 48.78 on HumanEval versus 32.3 for an autoregressive baseline of the same size.
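The digest does not specify TIDAL's actual schedule; as a minimal sketch of the general idea of timestep-adaptive distillation strength, one could weight the per-timestep KD loss by a hypothetical power-law schedule (the function names, the `alpha` parameter, and the choice to emphasize low-noise timesteps are all assumptions, not the paper's method):

```python
def tidal_weight(t: int, T: int, alpha: float = 2.0) -> float:
    """Hypothetical schedule: weight the distillation loss more heavily at
    low-noise (early-index) timesteps, decaying toward the fully-masked end.
    alpha controls how sharply the emphasis falls off; alpha=0 is uniform."""
    return (1.0 - t / T) ** alpha

def weighted_distill_loss(kd_losses: list[float], alpha: float = 2.0) -> float:
    """Combine per-timestep KD losses with timestep-adaptive weights.
    kd_losses[t] is the (already computed) distillation loss at timestep t."""
    T = len(kd_losses)
    weights = [tidal_weight(t, T, alpha) for t in range(T)]
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, kd_losses)) / total
```

With `alpha=0.0` the weights are uniform and this reduces to a plain mean over timesteps; raising `alpha` shifts the training signal toward the timesteps where teacher predictions are presumed most informative.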
Why it matters
Diffusion LLMs remain a marginal but actively growing alternative to autoregressive models. Distilling from both a dense teacher and an MoE teacher into a diffusion student is a rare combination, and the notable jump on code benchmarks at 0.6B parameters makes the idea practically interesting for on-device use.
Importance: 2/5
A narrow research direction, with no clear upvote signal on HF Daily.