
That’s more a problem of tokenisation. In many cases, an LLM doesn’t see individual letters at all, so it can’t encode them one at a time.
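For illustration, here's a minimal sketch using OpenAI's tiktoken library (the exact token splits and IDs shown in the comments are illustrative; they depend on the vocabulary):

    import tiktoken

    # Load a GPT-4-era tokenizer vocabulary.
    enc = tiktoken.get_encoding("cl100k_base")

    # The model receives subword token IDs, not letters.
    tokens = enc.encode("strawberry")
    print(tokens)                             # token IDs, e.g. [496, 675, 15717]
    print([enc.decode([t]) for t in tokens])  # pieces, e.g. ['str', 'aw', 'berry']

The word arrives as a handful of subword pieces, so any letter-level knowledge has to be inferred rather than read off the input.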


I am quite sure GPT can undo tokenization if needed. It has no problem handling misspellings, plus you can just ask it to "spell things" L-I-K-E S-O.


I'm pretty sure a spellchecker is run before tokenisation.

Also, it often gets it wrong when spelling things out like that, especially with rarer or more unusual tokens.



