
From your second paper:

  > In particular, we can generate fixed random rotation matrices at initialization, and multiply them into the activations any time we read from or write to the residual stream. 
I guess I was mistaken in assuming this was one of the TurboQuant-specific innovations. Still an interesting concept, though.
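
For anyone who wants to poke at the quoted idea, here is a minimal sketch of my reading of it, not the paper's code. It assumes one fixed random orthogonal matrix R is drawn once at initialization, applied on every write to the residual stream, and undone with its transpose on every read. The names random_rotation, write_to_stream and read_from_stream are made up for illustration, and Gram-Schmidt may produce a reflection rather than a strict rotation, which does not matter for this purpose.

```python
import math
import random


def random_rotation(n, seed=0):
    """Return an n x n orthogonal matrix built by Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove components along the rows found so far (two passes for stability).
        for _ in range(2):
            for u in rows:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Assumption: the matrix is generated once and never changes afterwards.
R = random_rotation(8)
R_T = transpose(R)


def write_to_stream(x):
    """Rotate an activation vector before it goes into the residual stream."""
    return matvec(R, x)


def read_from_stream(h):
    """Undo the rotation when reading; R is orthogonal, so its inverse is R^T."""
    return matvec(R_T, h)


x = [1.0, 0.0, -2.0, 0.5, 3.0, 0.0, 0.0, 1.5]
h = write_to_stream(x)
x_back = read_from_stream(h)
```

Because R is orthogonal, the round trip recovers x up to floating-point error and the vector's norm is unchanged, so only the basis the stream is stored in differs.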

