reasoning — AI Digest

17 Jun · VibeThinker-3B Reaches Frontier-Level Reasoning Benchmarks via Curriculum RL · WeiboAI · research
30 Apr · Recursive Multi-Agent Systems: Agent Communication in Latent Space · Stanford University · research
9 May · Zyphra Releases ZAYA1-8B: Open Reasoning MoE Model Trained on AMD Hardware · Zyphra · models-llm
10 May · Google DeepMind's AI Co-Mathematician Reaches 48% on FrontierMath Tier 4 · Google DeepMind · research
13 May · RubricEM: Meta-RL with Rubric-Guided Policy Decomposition Beyond Verifiable Rewards · Google · research
15 May · SU-01: Gold-Medal-Level Olympiad Reasoning via Curriculum SFT and Two-Stage RL · SU-01 Team · research
18 May · SOOHAK: Frontier LLMs Solve Hard Math but Fail to Recognize Unsolvable Problems · research
20 May · Code as Agent Harness: Survey Positions Code as the Substrate for Executable Agent Systems (159 HF upvotes) · Multi-institution (42 authors) · research
20 May · SkillsVote: Lifecycle Governance of Agent Skills Across Collection, Recommendation, and Evolution (219 HF upvotes) · Memtensor Research Group / IAAR-Shanghai · research
18 Jun · Grok 4.3 Now Available on Amazon Bedrock with 1M-Token Context · xAI · models-llm
18 May · RoPE Provably Fails at Long Contexts: Locality Bias and Token Consistency Both Break · research
10 Jun · DRPO: Rethinking Divergence Regularization in LLM Reinforcement Learning · Tencent Hunyuan · research
14 Jun · MaxProof: MiniMax Model Exceeds IMO and USAMO Gold-Medal Thresholds on Formal Math · MiniMax · research
6 May · Ctx2Skill: Self-Improving Framework for Autonomous Context-Skill Discovery in LLMs · research
8 May · AI Co-Mathematician: Google DeepMind Achieves 48% on FrontierMath Tier 4 · Google DeepMind · research
16 May · SDAR: Self-Distilled Agentic Reinforcement Learning for Multi-Turn Agents · Zhejiang University / Meituan · research
19 May · MMSkills: Reusable Multimodal Skills for General Visual Agents (105 HF upvotes) · Shanghai Jiao Tong University · research
2 Jun · GrepSeek: Training Search Agents for Direct Corpus Interaction via Shell Commands (93 HF upvotes) · University of Massachusetts Amherst · research
4 Jun · ThoughtFold: Introspective Preference Learning Cuts Reasoning Tokens by 56% Without Accuracy Loss · research
6 Jun · The Deterministic Horizon: Information-Theoretic Proof That Extended CoT Fails and Tool Use Is Necessary · research
6 Jun · The Self-Correction Illusion: LLMs Fix Others' Errors but Not Their Own, and Role Labels Are the Cause · research
8 Jun · GitHub Copilot Gets 1M-Token Context Window and Configurable Reasoning Levels · GitHub / Microsoft · tools
8 Jun · Agentic Transformers Provably Learn Depth-First Search via Reinforcement Learning · Carnegie Mellon University / Ohio State University · research
11 Jun · Arbor: Generalist Autonomous ML Research via Hypothesis-Tree Refinement · NLPIR Lab · research
11 Jun · DeNovoSWE: Full Repository Generation Jumps from 5.8% to 47.2% with Synthetic Training Data · AweAI Team · research
11 Jun · Z-Reward: Score Distributions Instead of Scalar Rewards for Image Generation RLHF · Alibaba · research
2 May · ESamp: LLMs Explore by Latent Distilling for Semantic-Novelty Sampling · ShanghaiTech University · research
5 May · Odysseus: Training VLMs for 100+ Turn Interactive Decision-Making via RL · Princeton University · research
11 May · Soohak: 64 Mathematicians Build Research-Level Benchmark That Stumps Frontier LLMs · Seoul National University · research
11 May · AutoTTS: LLM Agents Automatically Discover Test-Time Scaling Strategies for $40 · research
3 Jun · TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large · Samsung Research · research
3 Jun · Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference · Google / CMU · research
12 Jun · InterleaveThinker: RL Framework for Agentic Text-and-Image Interleaved Generation · research
12 Jun · Astra: RL-Trained VLM Queries World Simulator for Spatial Reasoning · research
6 May · HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL · research
7 May · LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents · Shanghai Jiao Tong University · research
7 May · Executable World Models for ARC-AGI-3: Coding-Agent Approach Without Game-Specific Logic · research
12 May · NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized AI Research Automation · Shanghai AI Lab · research
12 May · TMAS: Scaling Test-Time Compute via Multi-Agent Synergy with Hierarchical Memory · research
13 May · Learning, Fast and Slow: Dual-Weight Architecture for Continual LLM Adaptation · research
18 May · BetaPRM: Uncertainty-Aware Process Rewards Cut Reasoning Token Use by 33% · research
19 May · NudgeRL: Strategy-Level Context Nudges for Efficient RLVR Exploration · KAIST AI · research
3 Jun · QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains · research
3 Jun · Quantifying Faithful Confidence Expression in Large Reasoning Models · Yale NLP · research
8 Jun · SubtleMemory: Benchmark Reveals Agents Systematically Fail at Fine-Grained Relational Memory · research
8 Jun · VideoKR: 315K-Example Training Corpus for Knowledge- and Reasoning-Intensive Video Understanding · Yale University · research
9 Jun · Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight · Rutgers University · research
10 Jun · SearchSwarm: Delegation Intelligence for LLM Agents in Long-Horizon Deep Research · research
16 Jun · Memory Is Reconstructed, Not Retrieved: Graph Memory Improves LLM Agent Recall by 23% · National University of Singapore · research
17 Jun · ZPPO: Teacher-in-Prompts Knowledge Distillation Outperforms Gradient Methods for Small Reasoners · NVIDIA · research
18 Jun · Diffusion-Proof: Formal Theorem Proving via Diffusion Language Models · research
18 Jun · DreamReasoner-8B: Block-Size Curriculum for Diffusion Reasoning Models · research