At some point your operations have to get atomistic. A mult requires O(n^2) transistors for n bits, so having each machine word carry its own mult unit seems silly. I suppose one could have a single bit with an xor(add), and(carry/mult).... But isn't that just an FPGA?
oh! very cool. Thanks! I knew about toom-cook, and based on what I do I figured it could be faster by breaking up into 'bigger power of 2 digits'. Now I know for sure (without having to implement it myself).
Realistically though, you're not gonna implement a FFT mult on a 64-bit integer/float even.
Pedantry is not annoying if it's a learning opportunity (or if it's excellently funny), IMO.