TIDE: cross-architecture distillation for diffusion LLMs
Peking University
TIDE is a distillation framework that transfers knowledge across architectures into diffusion LLMs. It comprises three components: TIDAL (distillation strength adapted per timestep), CompDemo (context provided via mask splitting), and Reverse CALM (a cross-tokenizer objective). The teachers are a dense 8B model and a 16B MoE; the student is a 0.6B diffusion model that scores 48.78 on HumanEval versus 32.3 for an autoregressive baseline of the same size.
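The digest does not specify TIDAL's actual schedule; as a minimal sketch of the general idea of timestep-adaptive distillation strength, one could weight the per-timestep KD loss by a hypothetical power-law schedule (the function names, the `alpha` parameter, and the choice to emphasize low-noise timesteps are all assumptions, not the paper's method):

```python
def tidal_weight(t: int, T: int, alpha: float = 2.0) -> float:
    """Hypothetical schedule: weight the distillation loss more heavily at
    low-noise (early-index) timesteps, decaying toward the fully-masked end.
    alpha controls how sharply the emphasis falls off; alpha=0 is uniform."""
    return (1.0 - t / T) ** alpha

def weighted_distill_loss(kd_losses: list[float], alpha: float = 2.0) -> float:
    """Combine per-timestep KD losses with timestep-adaptive weights.
    kd_losses[t] is the (already computed) distillation loss at timestep t."""
    T = len(kd_losses)
    weights = [tidal_weight(t, T, alpha) for t in range(T)]
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, kd_losses)) / total
```

With `alpha=0.0` the weights are uniform and this reduces to a plain mean over timesteps; raising `alpha` shifts the training signal toward the timesteps where teacher predictions are presumed most informative.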
Why it matters
Diffusion LLMs remain a marginal but actively growing alternative to autoregressive models. Distilling from both a dense teacher and an MoE teacher into a diffusion student is a rare combination, and the notable jump on code benchmarks at 0.6B parameters makes the idea practically interesting for on-device use.
Importance: 2/5
A narrow research direction, with no clear upvote signal on HF Daily.