#computer-use 1 item Jun 14 WeaveBench: Computer-Use Agents Fail at Hybrid GUI+CLI Tasks — 41% Pass Rate Microsoft Research research