>Ollama is slower I've benchmarked this on an actual Mac Mini M4 with 24 GB of R...

dminik · 2026-04-03T13:51:34 1775224294

It depends on the hardware, backend and options. I've recently tried running some local AIs (Qwen3.5 9B for the numbers here) on an older AMD 8GB VRAM GPU (so vulkan) and found that:

llama.cpp is about 10% faster than LM studio with the same options.

LM studio is 3x faster than ollama with the same options (~13t/s vs ~38t/s), but messes up tool calls.

Ollama ended up slowest on the 9B, Queen3.5 35B and some random other 8B model.

Note that this isn't some rigorous study or performance benchmarking. I just found ollama unnaceptably slow and wanted to try out the other options.