#training-dynamics 1 item 9 июн On the Geometry of On-Policy Distillation: A Training Paradigm Distinct from SFT and RLVR Hong Kong University of Science and Technology research