#architecture 3 items 11 мая Mean Mode Screaming: Training Pathology Fix Enables 1000-Layer Diffusion Transformers research 8 мая Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and a Fix research 9 мая Cola DLM: Continuous Latent Diffusion Language Model with Competitive Scaling research