
Sounds like Multi-Head Latent Attention (MLA) from DeepSeek

Nah, those are completely different beasts. DeepSeek's MLA solves the KV cache issue via low-rank projection: they literally squeeze the matrix through a latent vector at train time. TurboQuant is just post-training quantization, where existing weights and activations are mathematically compressed using polar coordinates.
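To make the contrast concrete, here is a minimal NumPy sketch of the MLA side. The dimensions are hypothetical, the random matrices stand in for learned projections, and per-head structure and RoPE handling are ignored; this is an illustration of the low-rank caching idea, not DeepSeek's exact formulation:

    import numpy as np

    # Hypothetical sizes for illustration only; DeepSeek's actual dims differ.
    d_model, d_latent, n_tokens = 512, 64, 128
    rng = np.random.default_rng(0)

    h = rng.standard_normal((n_tokens, d_model))       # hidden states
    W_down = rng.standard_normal((d_model, d_latent))  # learned down-projection
    W_up_k = rng.standard_normal((d_latent, d_model))  # learned K up-projection
    W_up_v = rng.standard_normal((d_latent, d_model))  # learned V up-projection

    # MLA-style: cache only the small latent c, rebuild K/V at attention time.
    c = h @ W_down   # (n_tokens, d_latent) -- the only thing cached
    k = c @ W_up_k   # keys reconstructed on the fly
    v = c @ W_up_v   # values reconstructed on the fly
    print(c.nbytes / (k.nbytes + v.nbytes))  # ~0.06x the full K/V footprint

The saving comes from training the model to route K/V through the latent in the first place, which is why it can't simply be bolted onto an existing model the way post-training quantization can.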

No, it is about compressing the KV cache; see the "How TurboQuant works" section.
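For what "quantizing the KV cache" means mechanically, here is a generic per-row int8 round-to-nearest sketch. This is not TurboQuant's actual algorithm, just the baseline idea of storing low-bit codes plus per-row scales:

    import numpy as np

    def quantize_int8(x):
        # Per-row symmetric quantization: int8 codes plus one fp32 scale per row.
        scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
        q = np.round(x / scale).astype(np.int8)
        return q, scale.astype(np.float32)

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    k_cache = rng.standard_normal((128, 512)).astype(np.float32)  # toy K cache

    q, s = quantize_int8(k_cache)
    k_hat = dequantize(q, s)
    print(np.abs(k_cache - k_hat).max())  # worst-case reconstruction error
    print(q.nbytes / k_cache.nbytes)      # 1/4 the size, plus tiny scale overhead

Any scheme along these lines trades a small reconstruction error for a roughly 4x smaller cache at 8 bits; lower bit widths need smarter quantizers, which is where methods like TurboQuant come in.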


