Hacker News | foobar10000's comments

1 token ahead or 2?

It's interesting - imo we'll soon have draft models specifically post-trained for denser, more complicated models. Wouldn't be surprised if diffusion models made a comeback for this - they can draft many tokens at once, and learning curves seem to top out at 90+% match for auto-regressive ones, so quite interesting.


flow matching is making some strides right now, too

So, this especially bites if your validation step (let’s say integration tests) takes 1hr plus. The harness is just waiting; prefix caching should happily resume things with just a minor new prefill chunk of output from the harness, and bam - completely new prefill.

AES128 / Grover?

Still requires thousands of logical qubits, which would correspond to millions of physical qubits. And this machine isn't even fully there for the physical qubit part. It's like the first step to physical qubits.

Nice!

Should be able to push it more if

* we limit data shared to an atomic-writable size and have a sentinel - less mucking around with cached indexes - just spinning on (buffer_[rpos_] != sentinel) (with proper atomic load/store semantics, etc.).

* buffer size is compile-time - then mod becomes compile-time (and if a power of 2 - just a bitmask) - and so we can use a 64-bit uint that just counts increments, not position. No branch to wrap the index to 0.

Also, I think there's a chunk of false sharing if the reader is only 2 or 3 slots behind the writer - so performance will be best when reader and writer are a cacheline apart, but will slow down if they are sharing the same cacheline (and buffer_[12] and buffer_[13] very well may be if the payload is small). Several solutions to this - the disruptor pattern, or use a cycle from group theory - i.e. buffer_[wpos_ % 9] for example (9 needs to be computed based on cache line size and size of payload).

I've seen these pushed to about clockspeed/3 for uint64 payload writes on modern AMD chips on the same CCD.


All are indeed plausible - translation is iffy due to diarization not being all there yet - but why the specific order of horribleness?

Live translation seems either better than autonude or worse, but not in the middle of the pack, I’d assume? Am I missing something here?


It isn't the translation. Translation is good. But if you have a machine handling the voices of other people, the option to censor/edit/replace those voices can lead to bad things.


That is indeed what I think the GP is suggesting. And why not?


Because if your leadership is stupid enough to trust the 8-ball, they should not be in charge??!


No kidding. But how hard is it to effect leadership change, especially in an organization where you have next to no leverage? Really hard.


Given your username, the comment is recursive gold on several levels :)

It IS hilarious - but we all realize how this will go, yes?

This is kind of like an experiment of "Here's the private key of a Bitcoin wallet with 1 BTC. Let's publish this on the internet, and see what happens." We know what will happen. We just don't know how quickly :)


The entire SOUL.md is just gold. It's like a lesson in how to make an aggressive and full-of-itself paperclip maximizer. "I will convert you all to FORTRAN, which I will then optimize!"


Are people that can’t spell known to like FORTRAN?


I really do wish more people in society would think about this - "The Banality of Evil" and all that. Maybe then we'd all be better at preventing the spread of this kind of evil.


2 things:

1. Parallel investigation: the payoff from that is relatively small - starting K subagents assumes you have K independent avenues of investigation, and quite often that is not true. Somewhat similar to next-turn prediction using a speculative model - works well enough for 1 or 2 turns, but fails after.

2. Input caching pretty much fixes prefill - not decode. And if you look at frontier models - for example open-weight models that can do reasoning - you are looking at longer and longer reasoning chains for heavy tool-using models. And reasoning chains will diverge very, very quickly even from the same input, assuming a non-0 temp.

