If I'm reading this correctly, the author is saying that it's not a failure of logical deduction if the training data doesn't include the reversal. In other words, he's saying that if the data contains "Tom Cruise is the son of Mary Lee Pfeiffer" but not "Tom Cruise's mother is Mary Lee Pfeiffer", then the model's inability to determine the latter is "more an explanation of how neural networks function than a model's inability to deduce B is A".
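To make the two directions concrete, here's a minimal sketch of the two probes at issue (illustration only; ask() is a hypothetical stand-in for whatever model is being tested):

    def ask(prompt: str) -> str:
        # Hypothetical stand-in for a call to the model under test.
        raise NotImplementedError

    # Forward probe: same direction as the training sentence
    # "Tom Cruise is the son of Mary Lee Pfeiffer". Usually answered.
    forward = ask("Who is Tom Cruise's mother?")

    # Reverse probe: the "reversal curse" claim is that a model trained
    # only on the forward sentence fails this one.
    reverse = ask("Who is Mary Lee Pfeiffer's son?")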

But of course "how neural networks function" is that they fail at basic logical deduction and do not generalize.

So again, if I'm reading it correctly, he's hand-waving away the inability to make basic logical deductions because that's not something they can or should be expected to do. As I read it, that means the reversal curse only exists if the answer to the question "can LLMs do logical deduction?" is "yes". If one takes the position that LLMs can't do general logical deduction, which seems to be the author's point of view, then there's no expectation that knowing "Tom Cruise is the son of Mary Lee Pfeiffer" is sufficient to determine "Tom Cruise's mother is Mary Lee Pfeiffer".

Am I missing something?



Did you see the end of the article, where the author uses a small example and gets "B is A" generalization?

These are the salient takeaways I got:

- Is/Was wording might matter; this one is probably just a bug.

- 30 facts about a person might simply be too few for "B to A" generalization

- Extra precision/context in the prompt can help locate the "B to A" inference.

- How you cut up your training data can bias inferences in surprising ways (see the sketch after this list).

- "B to A" generalization clearly does happen, even without "B is A" in the data, but it's not as stable as you'd want.


OK, but how does that not just demonstrate that LLMs can't generalize without massaged training data?

> it's not as stable as you'd want.

Which I take to mean that a model can sometimes confabulate "B is A" solely out of random variation, and that it's possible to bias the data and prompt to generate the expected response. The model hasn't done any logical deduction; the response is just a bias-influenced lucky break.
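
If you wanted to separate deduction from lucky breaks empirically, one rough approach is to score the reverse probe over many held-out pairs and compare against a chance baseline (sketch only; pairs and ask are hypothetical stand-ins):

    import random

    def reverse_accuracy(pairs, ask):
        # Fraction of (child, parent) pairs where the reverse probe
        # recovers the child; ask is the model under test.
        hits = sum(ask(f"Who is {parent}'s child?").strip() == child
                   for child, parent in pairs)
        return hits / len(pairs)

    def chance_baseline(pairs, trials=1000):
        # Accuracy of guessing a uniformly random known child.
        children = [child for child, _ in pairs]
        hits = 0
        for _ in range(trials):
            for child, _parent in pairs:
                hits += random.choice(children) == child
        return hits / (trials * len(pairs))

    # If reverse_accuracy stays above chance_baseline across prompt
    # variations, "random variation" stops being a good explanation.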



