
Some corrections: the vectors are un-rotated in practice for future query vectors. This could be removed with a slightly different LLM arch.

PolarQuant does live on in TurboQuant's quantization codebooks, which borrow from the hyperpolar coordinates.


> added in some gross / highly-pragmatic extra bits for important channels

I'm curious what you meant by that. I understood it to only have the MSE quantization vector, a 1-bit QJL vector, and a scalar magnitude.

> PolarQuant does live on in TurboQuant's codebooks for quantization which borrows from the hyperpolar coords

Isn't the turbo codebook the irregularly spaced centroid grid?


> extra bits per channel

Page 18 of the paper:

> As shown in Table 1, our approach outperforms other methods for both Llama-3.1-8B-Instruct and Ministral-7B-Instruct, achieving significantly higher average scores. We evaluate our method using 2.5-bit and 3.5-bit quantization during text generation. These non-integer bit precisions result from our strategy of splitting channels into outlier and non-outlier sets, and applying two independent instances of TurboQuant to each, allocating higher bit precision to outliers. This outlier treatment strategy is consistent with prior work [63, 51]. For example, in our 2.5-bit setup, 32 outlier channels are quantized at 3 bits, while the remaining 96 channels use 2 bits, leading to an effective bit precision of (32×3 + 96×2)/128 = 2.5. For 3.5-bit quantization, a different ratio of outliers and regular channels leads to a higher effective bit precision. Despite using fewer bits than competing techniques, TurboQuant maintains performance comparable to unquantized models.

So they find channels / indices of the vector that are important and give them more bits (3 bits) than the rest (2 bits).

>Isn't the turbo codebook the irregularly spaced centroid grid?

Yes, I believe so. They mention it's informed by the concentration of measure and the uncorrelated/independent coordinates after the initial conditioning rotation. I feel like it was informed by PolarQuant, but that may just be how I intuit what's going on (because thinking about this in polar coordinates makes more sense in my head). IOW, I think the irregular spacing is maybe informed by PolarQuant.

However they do say, slightly to the contrary: "We find optimal scalar quantizers for random variables with Beta distributions by solving a continuous 1-dimensional k-means problem using the Max-Lloyd algorithm."
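A minimal sketch of what that quoted sentence describes, i.e. Max-Lloyd as 1-D k-means on samples from a Beta distribution. The Beta(2, 2) parameters and k=4 levels here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def lloyd_max_1d(samples, k, iters=100):
    """Max-Lloyd / 1-D k-means: alternate between assigning samples to
    the nearest codeword and moving each codeword to its cell's mean."""
    samples = np.sort(np.asarray(samples))
    # Initialize codewords at evenly spaced quantiles of the data.
    codebook = np.quantile(samples, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Decision boundaries sit midway between adjacent codewords.
        edges = (codebook[:-1] + codebook[1:]) / 2
        cells = np.searchsorted(edges, samples)
        # Centroid update: each codeword becomes the mean of its cell.
        new = np.array([samples[cells == j].mean() if np.any(cells == j)
                        else codebook[j] for j in range(k)])
        if np.allclose(new, codebook):
            break
        codebook = new
    return codebook

rng = np.random.default_rng(0)
x = rng.beta(2.0, 2.0, size=100_000)  # stand-in Beta-distributed coordinates
cb = lloyd_max_1d(x, k=4)             # 4 levels = a 2-bit scalar quantizer
print(cb)
```

The resulting codewords are irregularly spaced: they bunch up where the Beta density is high, which matches the "irregularly spaced centroid grid" reading of the codebook.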



