Daily Digest
11 items · ~11 min · Week 2026-W23
Tag warnings (new tags, lenient mode — add to vocabulary): knowledge-workers, image-to-video, gpt-rosalind, biodefense, biosecurity, pandemic-preparedness, life-sciences, humanoid, motion-tracking, zero-shot, reward-modeling, calibration, ssm. Dropped (already in 2026-06-02): Anthropic IPO S-1, Anthropic Project Glasswing expansion, Qwen3.7-Plus, MiniMax M3, OpenAI+Codex on AWS Bedrock, Microsoft Build MAI models. Dropped (single unique source — both Reuters articles via Investing.com share same syndicated origin): DeepSeek $7.4B Series A fundraising.
Must Read (1)
Trump Signs AI Executive Order Requiring 30-Day Voluntary Pre-Release Government Review
President Trump signed an executive order on June 2, 2026 directing AI companies to voluntarily submit frontier models for government security testing up to 30 days before public release. The order instructs federal agencies to develop AI cybersecurity benchmarks, establish an 'AI cybersecurity clearinghouse,' and strengthen government defenses against AI-enabled threats. An earlier draft mandated a 90-day window, cut to 30 days after industry pushback over innovation concerns.
Worth Knowing (4)
OpenAI Launches Rosalind Biodefense Program with GPT-Rosalind for Pandemic Preparedness
OpenAI announced Rosalind Biodefense on June 1, 2026: a gated-access program offering GPT-Rosalind, a specialized life-sciences model, to vetted developers building biosecurity and pandemic preparedness applications. Initial partners include Johns Hopkins Applied Physics Laboratory and CEPI's 100 Days Mission for vaccine development acceleration. The program covers epidemiological modeling, early detection, screening, and non-pharmaceutical interventions; federal agencies with public-health and biodefense missions also receive extended access.
Humanoid-GPT: Scaling to 2B Motion Frames Enables Zero-Shot Generalization in Humanoid Control
Humanoid-GPT (arXiv 2606.03985, CVPR 2026) trains a GPT-style causal Transformer on a 2-billion-frame motion corpus aggregating seven datasets for whole-body humanoid control. Scaling both data and model capacity yields a single generative model that tracks highly dynamic motions while achieving zero-shot generalization to unseen tasks — dissolving the agility-generalization tradeoff inherent to prior MLP-based trackers. Inference latency is under 1.5ms on an RTX 4090. The paper also introduces Harmonic Motion Embedding (HME) to quantify motion diversity.
OpenAI Expands Codex Beyond Developers: Sites, Annotations, and Six Role-Specific Business Plugins
OpenAI announced on June 2, 2026 a major expansion of Codex targeting non-developer knowledge workers. New features include Sites (creates interactive hosted web apps and dashboards from analysis), Annotations (inline collaborative editing without rebuilding projects), and six new role-specific plugins covering sales, data analytics, creative production, product design, public equity investing, and investment banking. Together the plugins aggregate 62 business apps, including Salesforce, Figma, and Snowflake. Non-developers now account for ~20% of Codex's 5 million weekly users (roughly 1 million) and are adopting at 3x the rate of engineers.
MiniMax Launches Hailuo 2.3 Video Model and Expands Video Agent into Media Agent
MiniMax released Hailuo 2.3 on June 3, 2026 with improvements in physical action portrayal, character micro-expressions, stylization, and motion command following. A new Hailuo 2.3 Fast variant reduces batch creation costs by up to 50% at the same price as Hailuo 02. Simultaneously, MiniMax renamed and expanded the Hailuo Video Agent into the Media Agent, a multi-modal creation platform now live globally on the Hailuo AI website, mobile app, and Open Platform API, with VEED as a day-one integration partner.
For Reference (6)
TrOPD: Trust-Region On-Policy Distillation Stabilizes LLM Training When Teacher-Student Gap Is Large
TrOPD (arXiv 2606.01249, submitted May 31, 2026), from Samsung Research, addresses instability in on-policy distillation when teacher and student distributions diverge substantially, a common failure mode when distilling strong reasoning models into smaller students. The method combines trust-region-bounded training restricted to regions of reliable teacher supervision, clipping and masking for outlier handling, and off-policy forward-KL guidance to encourage exploration toward trustworthy areas. It consistently outperforms OPD, EOPD, and REOPOLD baselines on mathematical reasoning, code generation, and general benchmarks.
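To make the "clipping and masking" idea concrete, here is a minimal sketch of masking and clipping applied to a per-token log-ratio between student and teacher. It is not the TrOPD algorithm: the function name, the two thresholds, and the choice to mask on the absolute log-ratio are all assumptions made for illustration, and the forward-KL guidance term is not shown.

```python
def masked_clipped_log_ratio(student_logp, teacher_logp,
                             mask_threshold=5.0, clip=2.0):
    """Average clipped log-ratio over tokens inside an assumed trust region.

    student_logp / teacher_logp: log-probabilities each model assigns to the
    tokens the student sampled (on-policy). Both thresholds are hypothetical.
    Returns (loss, fraction_of_tokens_kept).
    """
    if len(student_logp) != len(teacher_logp):
        raise ValueError("inputs must have the same length")
    kept = []
    for s, t in zip(student_logp, teacher_logp):
        ratio = s - t  # single-sample estimate of reverse KL at this token
        if abs(ratio) > mask_threshold:
            continue  # masked: teacher signal treated as unreliable here
        kept.append(max(-clip, min(clip, ratio)))  # clip remaining outliers
    if not kept:
        return 0.0, 0.0
    return sum(kept) / len(kept), len(kept) / len(student_logp)


if __name__ == "__main__":
    student = [-0.5, -1.0, -0.2]
    teacher = [-0.7, -4.0, -9.0]  # last token: very large disagreement
    loss, kept_fraction = masked_clipped_log_ratio(student, teacher)
    print(loss, kept_fraction)
```

In this toy input the third token is dropped by the mask and the second is clipped, so a single extreme disagreement cannot dominate the averaged signal.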
Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference
This Google/CMU paper (arXiv 2605.26099) proposes a sleep-like memory consolidation mechanism for language models. Periodically, the model converts recent context into persistent fast weights in SSM blocks through N offline recurrent passes, then clears its KV cache. On synthetic tasks (cellular automata, multi-hop graph retrieval) and math reasoning benchmarks, increasing sleep duration N improves performance, with the largest gains on examples requiring deeper multi-step reasoning.
QUBRIC: Co-Designing Queries and Rubrics Extends RLVR to Open-Ended Reasoning Domains
QUBRIC (arXiv 2606.03968) addresses a structural weakness in rubric-based RLVR: open-ended queries produce vague rubrics, but narrowing queries introduces fabricated references. The method jointly refines queries and rubrics — using teacher-derived key points to convert open-ended questions into scenario-specific ones, generating contrastive rubrics based on observed policy gaps, and filtering for informative training pairs. Results show a 5.5-point improvement on ArenaHard over SFT baselines, with 6.3-point average gains on legal, moral, and narrative reasoning.
Quantifying Faithful Confidence Expression in Large Reasoning Models
This Yale NLP paper (arXiv 2606.03969) investigates whether large reasoning models faithfully express their actual uncertainty. The authors compare linguistic confidence signals against three internal uncertainty measures: token probabilities, hidden states, and response sampling consistency. Key findings: (1) reasoning capability does not automatically improve calibration; (2) standard prompting techniques do not transfer to reasoning models; (3) different internal uncertainty measures yield conflicting results, revealing fragility in existing evaluation methodologies.
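As a rough illustration of comparing stated confidence with one internal measure, the sketch below computes sampling consistency as the share of sampled answers that match the most common one, then measures its gap to the confidence the model verbalized. Both function names and the gap metric are assumptions for illustration, not the paper's definitions.

```python
from collections import Counter


def sampling_consistency(answers: list[str]) -> float:
    """Share of sampled answers that agree with the most common answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def confidence_gap(stated: list[float], internal: list[float]) -> float:
    """Mean absolute gap between stated and internal confidence (0 = no gap)."""
    if len(stated) != len(internal) or not stated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(s - i) for s, i in zip(stated, internal)) / len(stated)


if __name__ == "__main__":
    samples = ["42", "42", "41", "42"]      # hypothetical repeated samples
    internal = sampling_consistency(samples)
    stated = 0.95                            # hypothetical verbalized confidence
    print(internal, confidence_gap([stated], [internal]))
```

A large gap here would suggest the verbalized confidence overstates what repeated sampling supports; swapping in a different internal measure could give a different answer, which is the fragility the paper points to.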
Claude Code v2.1.161: OTEL Labels, Parallel Tool Call Resilience, Linux Clipboard Overhaul
Anthropic's Claude Code v2.1.161 (released June 2, 2026) adds OTEL_RESOURCE_ATTRIBUTES values as metric labels for slicing usage by team and repo dimensions, improves the `claude agents` display to show done/total counts during fan-out, and collapses unused MCP claude.ai connectors by default. Key reliability fix: failed Bash commands in a parallel tool batch no longer cancel other in-flight calls. Linux fullscreen clipboard now uses wl-copy/xclip/xsel and supports both clipboard and PRIMARY selection. Additional bug fixes address managed-settings policy interference with third-party providers and background subagent stdout corruption.
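OTEL_RESOURCE_ATTRIBUTES is the standard OpenTelemetry environment variable holding comma-separated key=value pairs. The sketch below shows how such a string could be turned into a label dictionary; it is an illustration of the format, not Claude Code's implementation, and the attribute names `team` and `repo` and their values are hypothetical.

```python
import os
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse an OTEL_RESOURCE_ATTRIBUTES-style string into a label dict."""
    labels: dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue  # skip empty or malformed entries
        # Values may be percent-encoded, so decode them.
        labels[key.strip()] = unquote(value.strip())
    return labels


if __name__ == "__main__":
    # Falls back to a hypothetical example when the variable is unset.
    raw = os.environ.get("OTEL_RESOURCE_ATTRIBUTES",
                         "team=platform,repo=billing-api")
    print(parse_resource_attributes(raw))
```

With labels of this shape attached to each metric, usage can be grouped by team or repository in whatever metrics backend receives the data.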
ChatGPT Adds Live Job Search and Resume Formatting
OpenAI updated ChatGPT on June 1, 2026 to surface live job listings and freelance opportunities from Indeed, Upwork, Appstack, and web search results. Users can upload, create, and download resumes in professional formats tailored to specific job descriptions. Job search is available on Free, Go, Plus, and Pro plans in the US; resume formatting is available on all plans globally in English on web.