I first saw this implementation in a Harvard paper back when LLMs were still just a novelty[0]. Glad to see they got their demo site back up. Always thought it was a cool idea.
The justice system claims to be anti-axe murderer, yet axes were involved in the construction of nearly every courthouse in the nation! How can this be?
For anyone who wants to try consumer-grade steganography in the browser: I built something here. It's free and loads a static page with a Wasm binary. Once the page loads, everything is handled in the browser.
You provide a carrier file (currently .mp4, .pdf, .jpeg or .png) and impregnate it with an entire encrypted file system, with a full viewer and gallery mode. It also supports streaming, so you can actually encrypt a full Blu-ray movie and run range requests.
Does this actually embed the data using steganography, or does it just append it (encrypted) to an area of the file that the carrier format doesn't care about? The advantage of steganography is that it should be difficult for a middleman to know that there even is any data embedded.
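To make the distinction concrete, here is a rough sketch of how cheaply a middleman could spot the "append" approach for one carrier format. It walks the chunks of a PNG and returns whatever bytes follow the IEND chunk. The helper names (`chunk`, `trailing_after_iend`) are made up for illustration, and this says nothing about how the tool above actually stores its data.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def chunk(kind: bytes, payload: bytes = b"") -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + payload)
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)

def trailing_after_iend(data: bytes) -> bytes:
    """Return any bytes that follow the IEND chunk (empty if none)."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG")
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        kind = data[pos + 4:pos + 8]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
        if kind == b"IEND":
            return data[pos:]
    raise ValueError("no IEND chunk found")

# Skeleton file for demonstration only: it has the right chunk layout
# but is not a decodable image (zeroed IHDR, no IDAT).
carrier = PNG_SIG + chunk(b"IHDR", bytes(13)) + chunk(b"IEND")
stuffed = carrier + b"encrypted blob goes here"
print(len(trailing_after_iend(carrier)), len(trailing_after_iend(stuffed)))
```

Most image viewers ignore the trailing bytes, so the picture still opens, but a one-pass scan like this flags the file. Encryption hides what the payload says, not that it exists.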
I wonder if you can construct a function between the encoder and decoder such that for any given input, both the raw and manipulated embeddings decode to plausible meanings that are guaranteed to be different.
Ha! I've been thinking of this exact thing, and was curious how natural-looking the end result would be / how much you could compress the tokens by choosing less and less likely ones until it became obvious gibberish. I'm kinda surprised that it just sounds like normal slop at that density. Seems viable to use with "just" two bots chattering away at each other, and also occasionally sending meaningful packets.
In principle the output is arbitrarily natural-looking. The arithmetic coding procedure effectively turns your secret message into a stream of bits that is statistically indistinguishable from random, the same as you pull out of your PRNG in normal generation.
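A toy version of that procedure, in case it helps. The secret bits are read as a binary fraction in [0, 1), and at each step the sender emits whichever token's probability interval contains that point, which is just arithmetic decoding run against the model. Everything here is an assumption for illustration: `next_token_probs` is a hypothetical stand-in for a real LLM, exact `Fraction` arithmetic replaces the fixed-precision coder a real system would use, and the receiver is assumed to know the bit count `n` in advance.

```python
from fractions import Fraction

def next_token_probs(context):
    """Hypothetical stand-in for an LLM's next-token distribution."""
    if context and context[-1] in ("the", "a", "my"):
        return [("cat", Fraction(1, 2)), ("dog", Fraction(1, 4)), ("axe", Fraction(1, 4))]
    return [("the", Fraction(1, 2)), ("a", Fraction(1, 4)), ("my", Fraction(1, 4))]

def hide(bits: str) -> list:
    """Turn a non-empty bit string into tokens by arithmetic decoding."""
    n, k = len(bits), int(bits, 2)
    bin_lo, bin_hi = Fraction(k, 2**n), Fraction(k + 1, 2**n)
    # Midpoint of the bin for simplicity; a real scheme would pick a
    # uniformly random point inside the bin instead.
    target = Fraction(2 * k + 1, 2 ** (n + 1))
    low, width, tokens = Fraction(0), Fraction(1), []
    # Emit tokens until their interval pins down the bin unambiguously.
    while not (bin_lo <= low and low + width <= bin_hi):
        cum = low
        for tok, p in next_token_probs(tokens):
            if cum <= target < cum + p * width:
                tokens.append(tok)
                low, width = cum, p * width
                break
            cum += p * width
    return tokens

def reveal(tokens: list, n: int) -> str:
    """Recover the n hidden bits by re-deriving the tokens' interval."""
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        cum = low
        for cand, p in next_token_probs(tokens[:i]):
            if cand == tok:
                low, width = cum, p * width
                break
            cum += p * width
    return format(int(low * 2**n), f"0{n}b")

secret = "1011001110"
cover = hide(secret)
print(" ".join(cover), "->", reveal(cover, len(secret)))
```

Likelier tokens cover wider intervals, so they get picked more often, which is why the cover text follows the model's distribution when the input bits look random.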
Yes, with a few gotchas, especially related to end handling. If the government extracts the hidden bits from possible stego-streams, and half of the ones they encounter give an "unexpected end of input" error, but yours never give that error, they will know that your hidden bit streams likely contain some message.
You can avoid it by using a bijective arithmetic encoder, which by definition never encounters an "unexpected end of stream" error, and every bit string decodes to a different message. That's the cool way.
The boring practical way is to just encrypt your bits.
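A minimal sketch of the boring way, using only the standard library. The message gets a length prefix, then the whole frame is XORed with a keystream derived from a shared key and a fresh nonce via SHAKE-256. This construction and the names `seal` and `open_sealed` are assumptions for illustration, not a vetted cipher; a real setup would use an established authenticated encryption scheme. The point is only that the hidden bits look random to anyone without the key, and that trailing padding bits are harmless.

```python
import hashlib
import secrets

NONCE_LEN = 16

def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream derived from key and nonce."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(key: bytes, message: bytes) -> bytes:
    """Return nonce + encrypted (4-byte length prefix + message)."""
    nonce = secrets.token_bytes(NONCE_LEN)
    framed = len(message).to_bytes(4, "big") + message
    return nonce + _xor_keystream(key, nonce, framed)

def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Decrypt, read the length prefix, and ignore any trailing padding."""
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    framed = _xor_keystream(key, nonce, body)
    length = int.from_bytes(framed[:4], "big")
    return framed[4:4 + length]

key = secrets.token_bytes(32)
blob = seal(key, b"meet at the courthouse")
# Pad with random bytes so the hidden stream can fill the whole cover text.
padded = blob + secrets.token_bytes(7)
print(open_sealed(key, padded))
```

Because the length lives inside the encrypted frame, the sender can keep feeding random padding into the stego encoder until the cover text reaches a natural stopping point, and the receiver just discards the excess.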
Pro-tip from unfrozen caveman lawyer: "Your honor. My client want hide thing from t-rex lang mo-del. He have big brain. So he not put thing on Al Gore device with series of tubes. (Unlike many on modern-day BBS called Haxer News.) T-rex not eat what t-rex not find."
[0] https://github.com/harvardnlp/NeuralSteganography