#quantization
- Google DeepMind Releases Gemma 4 QAT Checkpoints: Sub-1 GB On-Device E2B Model · Google DeepMind · models-llm
- LongLive-2.0: NVFP4 Parallel Infrastructure for Long Video Generation (1,220 HF upvotes) · NVIDIA · research
- llama.cpp b9603: Qualcomm Adreno OpenCL Kernels for On-Device Inference · ggml-org · tools