DeNovoSWE: Full Repository Generation Jumps from 5.8% to 47.2% with Synthetic Training Data

AweAI Team

DeNovoSWE addresses a gap in AI code agents: most training data covers bug-fixing in existing codebases, not building complete repositories from scratch. The dataset provides 4,818 instances, each of which requires generating a full repository from documentation. A divide-and-conquer critic-repair pipeline with difficulty-aware filtering produces high-quality training trajectories. Fine-tuning Qwen3-30B-A3B on this data raises BeyondSWE-Doc2Repo performance from 5.8% to 47.2%.

Why it matters

The paper had 21 upvotes on HuggingFace on June 11. The roughly 8× benchmark jump (5.8% to 47.2%) suggests that training-data quality for long-horizon coding tasks is a major bottleneck, and that automated, sandboxed construction can help close the gap. The work moves AI coding agents closer to acting as full software architects rather than just patch writers.
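
The size of the jump can be checked directly from the two reported scores. The short sketch below only redoes that arithmetic. The variable names are illustrative and do not come from the paper.

```python
# Scores reported in the summary (percent on BeyondSWE-Doc2Repo).
baseline = 5.8     # before fine-tuning
fine_tuned = 47.2  # after fine-tuning on DeNovoSWE data

absolute_gain = fine_tuned - baseline  # difference in percentage points
relative_gain = fine_tuned / baseline  # multiplicative factor

print(f"absolute gain: {absolute_gain:.1f} points")
print(f"relative gain: {relative_gain:.1f}x")
```

The absolute gain is about 41.4 percentage points and the relative gain is about 8.1×.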

Importance: 3/5

Notable research paper; roughly 8× benchmark improvement on full-repo generation; new training-data paradigm for long-horizon coding agents.

agents code-generation software-engineering reasoning

Sources

official arXiv:2606.10728 — DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch