quantization — AI Digest

Jun 8 Google DeepMind Releases Gemma 4 QAT Checkpoints: Sub-1 GB On-Device E2B Model Google DeepMind models-llm
May 19 LongLive-2.0: NVFP4 Parallel Infrastructure for Long Video Generation (NVIDIA, 1,220 HF upvotes) NVIDIA research
Jun 12 llama.cpp b9603: Qualcomm Adreno OpenCL Kernels for On-Device Inference ggml-org tools