Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence
A comprehensive survey of code intelligence systems that go beyond natural-language-only inputs. It covers how LLMs process visual artifacts (screenshots, charts, vector drawings, interactive UI states) to generate executable code. The paper maps four domains: graphical user interfaces, scientific visualization, structured graphics, and emerging agent frameworks. It argues that future progress requires multi-signal validation and agent transparency.
Why it matters
Topped HuggingFace Daily Papers for June 25 with 262 upvotes. As AI coding assistants increasingly encounter visual specs and UI mockups, this survey frames the open challenges in visually grounded programming and sets a research agenda for the next generation of coding agents.
Importance: 3/5
Top HF Daily paper (262 upvotes); directly relevant to next-gen coding agents processing visual inputs