#reasoning
- VibeThinker-3B Reaches Frontier-Level Reasoning Benchmarks via Curriculum RL WeiboAI research
- Recursive Multi-Agent Systems: Agent Communication in Latent Space Stanford University research
- Zyphra Releases ZAYA1-8B: Open Reasoning MoE Model Trained on AMD Hardware Zyphra models-llm
- Google DeepMind's AI Co-Mathematician Reaches 48% on FrontierMath Tier 4 Google DeepMind research
- RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards Google research
- SU-01: Gold-Medal-Level Olympiad Reasoning via Curriculum SFT and Two-Stage RL SU-01 Team research
- SOOHAK: Frontier LLMs Solve Hard Math But Fail to Recognize Unsolvable Problems research
- Code as Agent Harness: Survey Positions Code as the Substrate for Executable Agent Systems (159 HF upvotes) Multi-institution (42 authors) research
- SkillsVote: Lifecycle Governance of Agent Skills — Collection, Recommendation, Evolution (219 HF upvotes) Memtensor Research Group / IAAR-Shanghai research
- Grok 4.3 Now Available on Amazon Bedrock with 1M-Token Context xAI models-llm
- RoPE Provably Fails at Long Contexts: Locality Bias and Token Consistency Both Break research
- DRPO: Rethinking Divergence Regularization in LLM Reinforcement Learning Tencent Hunyuan research
- MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math MiniMax research
- Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs research
- SDAR: Self-Distilled Agentic Reinforcement Learning for Multi-Turn Agents Zhejiang University / Meituan research
- MMSkills: Reusable Multimodal Skills for General Visual Agents (105 HF upvotes) Shanghai Jiao Tong University research
- GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF upvotes) University of Massachusetts Amherst research
- ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss research
- The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary research
- The Self-Correction Illusion: LLMs Fix Others' Errors but Not Their Own — Role Labels Are the Cause research
- GitHub Copilot Gets 1M Token Context Window and Configurable Reasoning Levels GitHub / Microsoft tools
- Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning Carnegie Mellon University / Ohio State University research
- Arbor: Generalist Autonomous ML Research via Hypothesis-Tree Refinement NLPIR Lab research
- DeNovoSWE: Full Repository Generation Jumps from 5.8% to 47.2% with Synthetic Training Data AweAI Team research
- Z-Reward: Score Distributions Instead of Scalar Rewards for Image Generation RLHF Alibaba research
- ESamp: LLMs Explore by Latent Distilling for Semantic-Novelty Sampling ShanghaiTech University research
- Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL Princeton University research
- Soohak: 64 Mathematicians Build Research-Level Benchmark That Stumps Frontier LLMs Seoul National University research
- AutoTTS: LLM Agents Automatically Discover Test-Time Scaling Strategies for $40 research
- TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large Samsung Research research
- Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference Google / CMU research
- InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation research
- Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning research
- HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL research
- LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents Shanghai Jiao Tong University research
- Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic research
- NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized AI Research Automation Shanghai AI Lab research
- TMAS: Scaling Test-Time Compute via Multi-Agent Synergy with Hierarchical Memory research
- Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation research
- BetaPRM: Uncertainty-Aware Process Rewards Cut Reasoning Token Use by 33% research
- NudgeRL: Strategy-Level Context Nudges for Efficient RLVR Exploration KAIST AI research
- QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains research
- Quantifying Faithful Confidence Expression in Large Reasoning Models Yale NLP research
- SubtleMemory: Benchmark Reveals Agents Systematically Fail Fine-Grained Relational Memory research
- VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding Yale University research
- Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight Rutgers University research
- SearchSwarm: Delegation Intelligence for LLM Agents in Long-Horizon Deep Research research
- Memory is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23% National University of Singapore research
- ZPPO: Teacher-in-Prompts Knowledge Distillation Outperforms Gradient Methods for Small Reasoners NVIDIA research
- Diffusion-Proof: Formal Theorem Proving via Diffusion Language Models research
- DreamReasoner-8B: Block-Size Curriculum for Diffusion Reasoning Models research