Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Same. My husky/pyr mix needs a lot of exercise, so I'm outside a minimum of a few hours a day. As a result I do a lot of dictation on my phone.

I put together a script that takes any audio file (mp3, wav), normalizes it, runs it through ggerganov's whisper, and then cleans it up using a local LLM. This has saved me a tremendous amount of time. Even modestly sized 7b parameter models can handle syntactical/grammatical work relatively easily.

Here's the gist:

https://gist.github.com/scpedicini/455409fe7656d3cca8959c123...

EDIT: I've always talked out loud through problems anyway, throw a BT earbud on and you'll look slightly less deranged.



If it’s helpful here is the prompt I use to clean up voice-memo transcripts: https://gist.github.com/adamsmith/2a22b08d3d4a11fb9fe06531ae...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: