HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL


HeavySkill reframes 'heavy thinking' in LLMs not as an artifact of external orchestration but as a learnable, internalized skill with two stages: parallel reasoning followed by summarization. The authors show that reinforcement learning can deepen and broaden this skill, and they report consistent empirical improvements over Best-of-N strategies.
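The two-stage pattern can be contrasted with Best-of-N in a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: `sample`, `score`, and `summarize` are hypothetical callables standing in for model calls, and the toy summarizer below uses a simple majority vote where the paper's summarization stage would be performed by the model itself.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
#   sample(prompt, seed) -> one reasoning trace as text
#   score(trace)         -> scalar quality estimate for a trace
#   summarize(prompt, traces) -> final answer synthesized from all traces

def best_of_n(prompt: str,
              sample: Callable[[str, int], str],
              score: Callable[[str], float],
              n: int = 4) -> str:
    """Baseline: draw n traces and keep the single highest-scoring one."""
    candidates = [sample(prompt, seed) for seed in range(n)]
    return max(candidates, key=score)

def heavy_think(prompt: str,
                sample: Callable[[str, int], str],
                summarize: Callable[[str, List[str]], str],
                n: int = 4) -> str:
    """Two stages: parallel reasoning, then summarization over all traces."""
    traces = [sample(prompt, seed) for seed in range(n)]  # stage 1
    return summarize(prompt, traces)                      # stage 2

# Toy stand-ins so the sketch runs without a model.
def toy_sample(prompt: str, seed: int) -> str:
    answers = ["42", "41", "42", "40"]
    return f"trace {seed}: answer={answers[seed % len(answers)]}"

def toy_score(trace: str) -> float:
    # Placeholder scorer: rewards traces ending in "42".
    return 1.0 if trace.endswith("42") else 0.0

def toy_summarize(prompt: str, traces: List[str]) -> str:
    # Placeholder summarizer: majority vote over the extracted answers.
    votes = Counter(t.split("answer=")[1] for t in traces)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(best_of_n("q", toy_sample, toy_score))
    print(heavy_think("q", toy_sample, toy_summarize))
```

The structural difference is that Best-of-N discards all but one trace, while the summarization stage sees every trace and can combine them.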

Why it matters

The work suggests that complex reasoning can be trained directly into model weights rather than scaffolded through external prompting frameworks, which has implications for agent harness design.

Importance: 2/5

Research paper reframing heavy thinking as a trainable agentic skill via RL.

Sources