#training
- Mean Mode Screaming: Training Pathology Fix Enables 1000-Layer Diffusion Transformers (research)
- Prime Intellect Releases prime-rl v0.6.0 for Agentic RL on Trillion-Parameter MoE Models (Prime Intellect, research)
- Model Spec Midtraining: How Normative Self-Knowledge Improves Alignment Generalization (Anthropic, research)
- Quantized Reasoning Models Think They Need to Think Longer, but They Do Not (Meta, research)
- TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large (Samsung Research, research)
- FORT-Searcher: Shortcut-Resistant Training Data Framework for Deep Search Agents (research)
- DomainShuttle: Subject-Driven Text-to-Video Across In-Domain and Cross-Domain Scenarios (research)
- QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains (research)
- ZPPO: Teacher-in-Prompts Knowledge Distillation Outperforms Gradient Methods for Small Reasoners (NVIDIA, research)