HeavySkill: Internalizing Heavy Thinking as a Trainable Agentic Skill via RL
HeavySkill reframes 'heavy thinking' in LLMs not as an external orchestration artifact but as a learnable, internalized skill consisting of two stages: parallel reasoning followed by summarization. The authors use reinforcement learning to show that this skill can be both deepened and broadened, and they report consistent empirical improvements over Best-of-N strategies.
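To make the two-stage structure concrete, here is a minimal sketch of parallel reasoning followed by summarization, next to a plain Best-of-N baseline. It is an illustration only, not the authors' implementation: `heavy_think`, `best_of_n`, `toy_generate`, the prompt strings, and the `width` parameter are all hypothetical names, and the toy generator stands in for a real model call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical interfaces: a generator maps a prompt to text,
# a scorer maps a candidate answer to a number.
Generate = Callable[[str], str]
Score = Callable[[str], float]


def heavy_think(question: str, generate: Generate, width: int = 4) -> str:
    """Stage 1: sample `width` reasoning traces in parallel.
    Stage 2: ask the same generator to summarize them into one answer."""
    prompts = [f"[trace {i}] {question}" for i in range(width)]
    with ThreadPoolExecutor(max_workers=width) as pool:
        traces: List[str] = list(pool.map(generate, prompts))
    summary_prompt = (
        f"Question: {question}\n"
        "Candidate reasoning:\n" + "\n".join(traces) + "\n"
        "Final answer:"
    )
    return generate(summary_prompt)


def best_of_n(question: str, generate: Generate, score: Score, n: int = 4) -> str:
    """Baseline: sample n candidates and keep the highest-scoring one."""
    candidates = [generate(f"[trace {i}] {question}") for i in range(n)]
    return max(candidates, key=score)


def toy_generate(prompt: str) -> str:
    """Deterministic stand-in for a model (an assumption for this sketch)."""
    if "Final answer:" in prompt:
        # Toy summarizer: majority vote over the candidate lines.
        votes = [line for line in prompt.splitlines() if line.startswith("answer=")]
        return Counter(votes).most_common(1)[0][0]
    # Toy reasoner: the first trace is wrong, the rest agree.
    return "answer=41" if prompt.startswith("[trace 0]") else "answer=42"


if __name__ == "__main__":
    print(heavy_think("What is 6 * 7?", toy_generate, width=4))
```

The structural difference is that Best-of-N selects one existing candidate using a separate scorer, whereas the summarization stage lets the generator produce a new answer conditioned on all traces. In this sketch that second stage is a toy majority vote; how a trained model would actually summarize is not specified here.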
Why it matters
The work suggests that complex reasoning can be trained directly into model weights rather than scaffolded through external prompting frameworks, which has implications for agent harness design.
Importance: 2/5
Research paper reframing heavy thinking as a trainable agentic skill via RL.