It's interesting - imo we'll soon have draft models specifically post-trained for denser, more complicated models. Wouldn't be surprised if diffusion models made a comeback for this - they can draft many tokens at once, and learning curves seem to top out at a 90+% match for auto-regressive ones, which is quite interesting.
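To put a rough number on what a 90% match buys, here is a back-of-the-envelope sketch. It assumes (my assumption, not something measured in the comment above) that each drafted token is accepted independently with probability $\alpha$ and that the draft model proposes $\gamma$ tokens per verification pass; both symbols are illustrative.

```latex
% Expected tokens emitted per verification pass of the target model,
% under the i.i.d. acceptance assumption stated above.
\[
  \mathbb{E}[\text{tokens per pass}] \;=\; \sum_{i=0}^{\gamma} \alpha^{i}
  \;=\; \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha}
\]
% Worked example with alpha = 0.9 and gamma = 8:
\[
  \frac{1 - 0.9^{9}}{1 - 0.9} \;=\; \frac{1 - 0.3874}{0.1} \;\approx\; 6.1
\]
```

Under those assumptions, roughly 6 tokens come out per target-model pass, and the ceiling as $\gamma$ grows is $1/(1-\alpha) = 10$, which is why pushing the match rate past 90% matters more than drafting longer runs.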
So, this especially bites if your validation step (let's say integration tests) takes 1hr plus. The harness is just waiting; prefix caching should happily resume things with just a minor new prefill chunk of output from the harness, but instead, bam - completely new prefill.
Still requires thousands of logical qubits, which would correspond to millions of physical qubits. And this machine isn't even fully there for the physical qubit part. It's like the first step to physical qubits.
* we limit data shared to an atomic-writable size and have a sentinel - less mucking around with cached indexes - just spinning on (buffer_[rpos_]!=sentinel) (atomic style with proper semantics, etc.).
* buffer size is compile-time - then mod becomes compile-time (and if a power of 2 - just a bitmask) - and so we can just use a 64-bit uint to just count increments, not position. No branch to wrap the index to 0.
Also, I think there's a chunk of false sharing if the reader is within 2 or 3 slots of the writer - so performance will be best if reader and writer are a cache line apart, but will slow down if they are sharing the same cache line (and buffer_[12] and buffer_[13] very well may if the payload is small). Several solutions to this - the disruptor pattern, or use a cycle from group theory - i.e. buffer[_wpos%9] for example (9 needs to be computed based on cache line size and size of payload).
I've seen these pushed to about clockspeed/3 for uint64 payload writes on modern AMD chips within the same CCD.
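For anyone who wants something concrete, here is a minimal single-producer/single-consumer sketch of the ideas above: a sentinel marks empty slots, the size is a compile-time power of two so the wrap is a bitmask, and positions are plain 64-bit counters. The class name, the `Stride` parameter, and the choice of all-ones as the sentinel are all my own placeholders. The stride mapping is just one possible reading of the "cycle" idea (multiplying by an odd number modulo a power of two is a permutation of the slots); it is not tuned, and I have not benchmarked this.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical sketch, not a production queue. Exactly one producer thread
// may call try_push and exactly one consumer thread may call try_pop.
// Payloads must never be equal to Sentinel.
template <std::size_t N,
          std::uint64_t Sentinel = ~std::uint64_t{0},
          std::uint64_t Stride = 9>  // assumed value; depends on line/payload size
class SpscSentinelRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
    static_assert(Stride % 2 == 1, "an odd stride is coprime to a power-of-two N");

public:
    SpscSentinelRing() {
        // Every slot starts out "empty".
        for (auto& slot : buffer_) slot.store(Sentinel, std::memory_order_relaxed);
    }

    // Map a monotonically increasing counter to a slot. With an 8-byte payload
    // and Stride = 9, consecutive counters land 72 bytes apart, so neighbours
    // in time are not neighbours within a 64-byte cache line.
    static constexpr std::size_t index(std::uint64_t count) {
        return static_cast<std::size_t>((count * Stride) & (N - 1));
    }

    // Producer side: fails if the consumer has not emptied the slot yet.
    bool try_push(std::uint64_t value) {
        auto& slot = buffer_[index(wpos_)];
        if (slot.load(std::memory_order_acquire) != Sentinel) return false;  // full
        slot.store(value, std::memory_order_release);
        ++wpos_;  // just a count; never wrapped back to 0 by hand
        return true;
    }

    // Consumer side: returns nullopt while the slot still holds the sentinel.
    std::optional<std::uint64_t> try_pop() {
        auto& slot = buffer_[index(rpos_)];
        const std::uint64_t value = slot.load(std::memory_order_acquire);
        if (value == Sentinel) return std::nullopt;  // empty
        slot.store(Sentinel, std::memory_order_release);  // hand the slot back
        ++rpos_;
        return value;
    }

private:
    alignas(64) std::atomic<std::uint64_t> buffer_[N];
    alignas(64) std::uint64_t wpos_ = 0;  // touched only by the producer
    alignas(64) std::uint64_t rpos_ = 0;  // touched only by the consumer
};
```

Usage would be something like `SpscSentinelRing<1024> q;` with one thread looping on `q.try_push(x)` and another on `q.try_pop()`. Note there are no shared read/write indexes at all: each side only learns about the other through the slot contents, which is the point of the sentinel.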
It isn't the translation. Translation is good. But if you have a machine handling the voices of other people, the option to censor/edit/replace those voices can lead to bad things.
Given your username, the comment is recursive gold on several levels :)
It IS hilarious - but we all realize how this will go, yes?
This is kind of like an experiment of "Here's the private key of a Bitcoin wallet with 1 BTC. Let's publish this on the internet, and see what happens." We know what will happen. We just don't know how quickly :)
The entire SOUL.md is just gold. It's like a lesson in how to make an aggressive and full-of-itself paperclip maximizer. "I will convert you all to FORTRAN, which I will then optimize!"
I really do wish more people in society would think about this - "The Banality of Evil" and all that. Maybe then we'd all be better at preventing the spread of this kind of evil.
1. Parallel investigation: the payoff from that is relatively small - starting K subagents assumes you have K independent avenues of investigation - and quite often that is not true. Somewhat similar to next-turn prediction using a speculative model - works well enough for 1 or 2 turns, but fails after.
2. Input caching pretty much fixes prefill - not decode. And if you look at frontier models - for example open-weight models that can do reasoning - you are looking at longer and longer reasoning chains for heavy tool-using models. And reasoning chains will diverge very, very quickly even from the same input, assuming a non-zero temperature.