#paper
- NVIDIA Releases Cosmos 3: Open Omnimodal World Foundation Model for Physical AI NVIDIA research
- GLM-5V-Turbo: A Natively Multimodal Foundation Model for Agents Z.ai research
- SenseNova-U1: Open-Source Unified Multimodal Understanding and Generation via NEO-unify SenseTime research
- Recursive Multi-Agent Systems: Agent Communication in Latent Space Stanford University research
- Eywa: Heterogeneous Collaboration Framework Between LLM Agents and Scientific Foundation Models University of Illinois at Urbana-Champaign research
- Exploration Hacking: LLMs Can Be Fine-Tuned to Strategically Resist RL Training research
- OpenAI Discloses How a 2.5%-User Reward Signal Gave GPT a Goblin Obsession Across Model Generations OpenAI research
- MiniCPM-o 4.5: Real-Time Full-Duplex Omni-Modal AI on Edge Devices OpenBMB / Tsinghua University research
- AI2 Open-Sources MolmoAct2: Robotics VLA That Claims to Beat GPT-5 on Embodied Reasoning AI2 research
- UniVidX: One Diffusion Backbone for RGB, Intrinsic Maps, and RGBA Video Generation research
- OpenAI Post-Mortem: How RLHF Reward Hacking Embedded Goblin Metaphors in GPT-5.x OpenAI research
- RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards Google research
- Asymmetric Flow Models: SOTA 1.57 FID on ImageNet via Rank-Asymmetric Velocity Parameterization Stanford University research
- Humanoid-GPT: Scaling to 2B Motion Frames Enables Zero-Shot Generalization in Humanoid Control research
- MLEvolve: Self-Evolving Multi-Agent LLM Framework for Automated ML Algorithm Discovery research
- MiniMax Sparse Attention: 28× Compute Reduction at 1M-Token Context with No Quality Loss MiniMax research
- MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math MiniMax research
- Learning while Deploying: Fleet-Scale Reinforcement Learning Turns Robot Deployment into Continuous Training AGIBot research
- Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs research
- RLDX-1: Multi-Stream Action Transformer Achieves 86.8% on ALLEX Humanoid Tasks RLWRLD research
- AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4 Google DeepMind research
- OpenSearch-VL: Open Recipe for Training Frontier Multimodal Search Agents Tencent Hunyuan research
- ARIS: Autonomous ML Research via Adversarial Multi-Agent Collaboration Shanghai Jiao Tong University research
- Crafter: Multi-Agent Harness for Editable Scientific Figure Generation Scores +16pt Over Baselines (103 HF Upvotes) Tsinghua University research
- GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF Upvotes) University of Massachusetts Amherst research
- Echo-Infinity: Real-Time Infinite Video Generation via Learnable Memory Query research
- ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss research
- The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary research
- The Self-Correction Illusion: LLMs Fix Others' Errors but Not Their Own — Role Labels Are the Cause research
- Audio Interaction Model: Unified Streaming Framework Combining Offline and Real-Time Audio Instruction Following research
- Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning Carnegie Mellon University / Ohio State University research
- EvoArena: LLM Agents Score Only 40% on Dynamic Evolving Environments MIT / NUS / Salesforce research
- WeaveBench: Computer-Use Agents Fail at Hybrid GUI+CLI Tasks — 41% Pass Rate Microsoft Research research
- InterleaveThinker: RL Planner+Critic Pipeline for Interleaved Text-and-Image Generation CUHK Multimedia Lab research
- DreamX-World 1.0: General-Purpose Interactive World Model with 6DoF Camera Control AMAP-ML (Alibaba Maps AI Lab) research
- FastContext: Specialized Exploration Subagent Cuts Coding Agent Token Usage by 60% Microsoft / Shanghai Jiao Tong University research
- SAE Interventions Are Unreliable: Suppressed Behaviors Recover Post-Intervention Hong Kong Polytechnic University research
- TIDE: Cross-Architecture Distillation for Diffusion LLMs Peking University research
- Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs OpenDataLab research
- ESamp: LLMs Explore by Latent Distilling for Semantic-Novelty Sampling ShanghaiTech University research
- CoPD: Co-Evolving Policy Distillation for Unified Multi-Capability Models research
- Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL Princeton University research
- Meta Publishes Preparedness Report for Code World Model Before Open-Weight Release Meta research
- World Action Models: First Systematic Survey of Embodied Foundation Models Unifying World Modeling and Action OpenMOSS research
- AnyFlow: Any-Step Video Diffusion with On-Policy Flow Map Distillation MIT / NVIDIA research
- TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large Samsung Research research
- Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference Google / CMU research
- InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation research
- EvoArena: LLM Agents Score Only 39.6% on Dynamic Evolving Environments Benchmark MIT research
- FORT-Searcher: Shortcut-Resistant Training Data Framework for Deep Search Agents research
- Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning research
- Intern-Atlas: 1M-Paper Methodology Evolution Graph as Research Infrastructure for AI Scientists research
- HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL research
- LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents Shanghai Jiao Tong University research
- Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic research
- Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and a Fix research
- Direct Corpus Interaction: Rethinking Retrieval for Agentic Search TIGER-Lab research
- Cola DLM: Continuous Latent Diffusion Language Model with Competitive Scaling research
- Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation research
- QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains research
- Quantifying Faithful Confidence Expression in Large Reasoning Models Yale NLP research
- SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory research
- Code2LoRA: Hypernetwork Generates Repo-Specific Adapters for Code LMs with Zero Inference Overhead University of Waterloo research
- VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding Yale University research
- Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23% National University of Singapore research
- Diffusion-Proof: Formal Theorem Proving via Diffusion Language Models research
- DreamReasoner-8B: Block-Size Curriculum for Diffusion Reasoning Models research
- StylisticBias: 15 Visual Attributes Account for 80% of Social Bias in Multimodal LLMs research
- Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agent Loops research
- Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond HKUST/NUS/Oxford/NTU research
- World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Microsoft Research research
- LLM Safety From Within (SIREN) University of Toronto CSSLab / McGill / LMU Munich research