Hacker News

>At that time many young programmers regarded compression […] as very important.

They still do regard it as important, just in a different sense. Shannon showed that compression is equivalent to statistical inference: the optimal code for a message is derived from a probability model of the source, so an efficient compression algorithm is, implicitly, a good predictive model. Compression is thus loosely congruent to a branch of statistical inference you may have heard a thing or two about called "artificial intelligence" [0], which many young programmers are very much engaged with.

Take the Hutter Prize [1], which awards money for compressing the entire English Wikipedia, or the Large Text Compression Benchmark [2], whose current top performer is a transformer-based neural network [3]. As the LTCB's rationale puts it [0],

>Given a probability distribution P over strings s, the Shannon capacity theorem states that the optimal (shortest average) code for s has length log 1/P(s) bits. If the distribution P is known, then such codes are in fact easy to find and compute. The problem is that for natural language, we do not know how to explicitly compute the model P(s) for a given string s.

>However humans must implicitly know P(s) because our speech and writing, by definition, follows this distribution. Now suppose that a machine can compute P(s) for all s. Then for any question q (or dialog ending in a question) and any answer a, the machine can compute P(a|q) = P(q,a)/P(q), and use this distribution to answer any question q posed by the judge. By definition, this distribution would be identical to the distribution of answers given by the human, and the two would be indistinguishable.
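A toy sketch of the quoted claim: the optimal code for a string s has length log2(1/P(s)) bits, so a model that assigns the string a higher probability encodes it in fewer bits. (The two probability values below are invented purely for illustration.)

```python
import math

def code_length_bits(p: float) -> float:
    """Optimal (Shannon) code length in bits for an event of probability p."""
    return math.log2(1.0 / p)

# Hypothetical probabilities two models assign to the same sentence:
p_good_model = 0.02     # a strong language model finds the sentence likely
p_weak_model = 0.00001  # a weak model is "surprised" by it

print(code_length_bits(p_good_model))  # ~5.64 bits
print(code_length_bits(p_weak_model))  # ~16.61 bits
```

The better the predictive model, the shorter the code, which is exactly why language modelling and text compression are the same problem.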

[0] http://www.mattmahoney.net/dc/rationale.html

[1] http://prize.hutter1.net/

[2] http://www.mattmahoney.net/dc/text.html

[3] https://bellard.org/nncp/



While compression is congruent with some cases of statistical inference, no one is doing statistical inference by means of compression. Even the example you gave from Fabrice Bellard (whom I highly admire) does the converse: it uses neural networks to compress.

Even realizing what you've said requires a bit of theory, basic computer science, and some math. And most people today have a "do" philosophy and are results oriented. They are not learning oriented; they hate CS and math, and they hate low-level programming.

I don't think compression is highly relevant today, but back when I started learning it was a good tool for sharpening one's mind: learn algorithms, learn about memory management, learn about hardware, learn low-level constructs, learn a bit of math.

Whenever a young person who wants to pursue a programming career asks me for advice, I ask what their motivation is. If they just want money, I advise them to learn the tools in fashion today: maybe JavaScript and React, maybe Python, maybe C#. But if they say they want to do it for both enjoyment and money, I put them on the hard path. I ask them to enroll in a BS CS program or, if that is not possible, I give them long lists of books, tutorials, and courses covering CS basics, the math behind those basics, hardware, low-level programming, and the hot topics du jour. In the latter case I also design their learning path around their goals, and I always take the time to answer their questions later if I can; if not, I at least point them to where they should look or to the right person to ask.


> no one is trying to do statistical inference using compression

Well, maybe no one sensible? I still think it's quite cool that you can take any general purpose compression algorithm, and abuse it to do e.g. image classification. (Just append the image you want to classify to each of the classes' sets in turn, and see which compresses best!)
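A minimal sketch of that trick in Python, using gzip as the general-purpose compressor. The idea: appending the sample to a class's corpus costs few extra compressed bytes when they share structure. (The corpora and labels here are made up for illustration.)

```python
import gzip

def clen(data: bytes) -> int:
    """Length of the gzip-compressed data in bytes."""
    return len(gzip.compress(data))

def classify(sample: str, classes: dict) -> str:
    """Pick the class whose corpus 'explains' the sample best.

    Score = C(corpus + sample) - C(corpus): the extra compressed bytes
    needed to encode the sample given the corpus. Lower is a better fit,
    because gzip can back-reference substrings shared with the corpus."""
    best, best_cost = None, float("inf")
    for label, corpus in classes.items():
        cost = clen((corpus + " " + sample).encode()) - clen(corpus.encode())
        if cost < best_cost:
            best, best_cost = label, cost
    return best

classes = {
    "python": "def foo(): return [x for x in range(10) if x % 2 == 0]",
    "english": "the quick brown fox jumps over the lazy dog again and again",
}
print(classify("def bar(): return range(5)", classes))
```

With real corpora per class (and a normalization like NCD), this crude scheme can classify surprisingly well for something with zero training code.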

And actually I do remember a paper that tried to use ideas from PAQ to do online learning. Gated Linear Networks, out of DeepMind, in 2019:

https://arxiv.org/abs/1910.01526

All compression is related to AI, but sample-efficient online learning in particular is basically what data compression is all about.
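To make that concrete, here is a minimal sketch of compression as online learning: an adaptive order-0 model (with Laplace smoothing, and the alphabet known up front for simplicity) that predicts each symbol, pays -log2 p bits for it, and then updates its counts. Prediction and learning are the same loop.

```python
import math
from collections import Counter

def adaptive_code_length(s: str) -> float:
    """Total bits an adaptive order-0 model needs to encode s.

    At each step the model predicts the next symbol from the counts of
    everything it has seen so far, is charged -log2 p(symbol) bits by
    the (implied) arithmetic coder, and then learns from the symbol."""
    alphabet = set(s)  # simplification: alphabet assumed known in advance
    counts = Counter()
    total_bits = 0.0
    for ch in s:
        p = (counts[ch] + 1) / (sum(counts.values()) + len(alphabet))
        total_bits += -math.log2(p)
        counts[ch] += 1
    return total_bits

# A skewed string is more predictable, hence cheaper, than an
# evenly mixed one with the same length and alphabet:
print(adaptive_code_length("aaaaaaaaab"))
print(adaptive_code_length("ababababab"))
```

The better the model's next-symbol predictions, the fewer bits it pays, which is exactly the objective of online learning; schemes like PAQ just push this loop much further with mixtures of context models.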


Well, stochastic processes are a much wider category than the mere algebraic operations over groups that lossless data compression uses.

I'd think that calculus, functional analysis, or category theory can find more bijections toward statistics, or even congruences, than mere arithmetic or algebra ever will. Sure, you can explain or derive any mathematical construct using only algebra, and there have been efforts to do so, but does it make sense?



