There is virtually no reason to use Ollama over LM Studio or the myriad of other...

alifeinbinary · 2026-04-03T10:58:05 1775213885

I really like LM Studio when I can use it under Windows but for people like me with Intel Macs + AMD gpu ollama is the only option because it can leverage the gpu using MoltenVK aka Vulkan, unofficially. We're still testing it, hoping to get the Vulkan support in the main branch soon. It works perfectly for single GPUs but some edge cases when using multiple GPUs are unsupported until upstream support from MoltenVK comes through. But yeah, I agree, it wasn't cool to repackage Georgi's work like that.

logicallee · 2026-04-03T13:40:54 1775223654

>Ollama is slower

I've benchmarked this on an actual Mac Mini M4 with 24 GB of RAM, and averaged 24.4 t/s on Ollama and 19.45 t/s on LM Studio for the same ~10 GB model (gemma4:e4b), a difference which was repeated across three runs and with both models warmed up beforehand. Unless there is an error in my methodology, which is easy to repeat[1], it means Ollama is a full 25% faster. That's an enormous difference. Try it for yourself before making such claims.

[1] script at: https://pastebin.com/EwcRqLUm but it warms up both and keeps them in memory, so you'll want to close almost all other applications first. Install both ollama and LM Studio and download the models, change the path to where you installed the model. Interestingly I had to go through 3 different AI's to write this script: ChatGPT (on which I'm a Pro subscriber) thought about doing so then returned nothing (shenanigans since I was benchmarking a competitor?), I had run out of my weekly session limit on Pro Max 20x credits on Claude (wonder why I need a local coding agent!) and then Google rose to the challenge and wrote the benchmark for me. I didn't try writing a benchmark like this locally, I'll try that next and report back.

dminik · 2026-04-03T13:51:34 1775224294

It depends on the hardware, backend and options. I've recently tried running some local AIs (Qwen3.5 9B for the numbers here) on an older AMD 8GB VRAM GPU (so vulkan) and found that:

llama.cpp is about 10% faster than LM studio with the same options.

LM studio is 3x faster than ollama with the same options (~13t/s vs ~38t/s), but messes up tool calls.

Ollama ended up slowest on the 9B, Queen3.5 35B and some random other 8B model.

Note that this isn't some rigorous study or performance benchmarking. I just found ollama unnaceptably slow and wanted to try out the other options.

gen6acd60af · 2026-04-03T11:32:54 1775215974

LM Studio is closed source.

And didn't Ollama independently ship a vision pipeline for some multimodal models months before llama.cpp supported it?

zozbot234 · 2026-04-03T14:20:53 1775226053

Yes, they introduced that Golang rewrite precisely to support the visual pipeline and other things that weren't in llama.cpp at the time. But then llama.cpp usually catches up and Ollama is just left stranded with something that's not fully competitive. Right now it seems to have messed up mmap support which stops it from properly streaming model weights from storage when doing inference on CPU with limited RAM, even as faster PCIe 5.0 SSDs are finally making this more practical.

The project is just a bit underwhelming overall, it would be way better if they just focused on polishing good UX and fine-tuning, starting from a reasonably up-to-date version of what llama.cpp provides already.

iLoveOncall · 2026-04-03T10:35:56 1775212556

> There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.

Hmm, the fact that Ollama is open-source, can run in Docker, etc.?

DiabloD3 · 2026-04-03T13:02:27 1775221347

Ollama is quasi-open source.

In some places in the source code they claim sole ownership of the code, when it is highly derivative of that in llama.cpp (having started its life as a llama.cpp frontend). They keep it the same license, however, MIT.

There is no reason to use Ollama as an alternative to llama.cpp, just use the real thing instead.

simondotau · 2026-04-03T13:32:22 1775223142

If it’s MIT code derived from MIT code, in what way is its openness ”quasi”? Issues of attribution and crediting diminish the karma of the derived project, but I don’t see how it diminishes the level of openness.

DiabloD3 · 2026-04-04T00:43:34 1775263414

FOSS licensing can only exist in terms of Copyright. Without Copyright, you cannot license FOSS. If something has an incorrect Copyright attribution, then the license can be viewed as invalid until this deficiency has been corrected (obv. depending on local laws, etc).

On top of this, it would not be unreasonable for the numerous authors of llama.cpp to issue DMCA takedown requests if Ollama is unwilling to correct it.

jrm4 · 2026-04-03T14:57:39 1775228259

Do y'all mean backend or the Ollama frontend or both? I find it trivially easy to sub in my local Ollama api thing in virtually all of the interesting frontend things. I'm quite curious about the "why not Ollama" here.

faitswulff · 2026-04-03T11:56:40 1775217400

Does LM Studio have an equivalent to the ollama launch command? i.e. `ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4`

DiabloD3 · 2026-04-03T12:49:45 1775220585

I don't think it does, but llama.cpp does, and can load models off HuggingFace directly (so, not limited to ollama's unofficial model mirror like ollama is).

There is no reason to ever use ollama.

ffsm8 · 2026-04-03T13:04:00 1775221440

> I don't think it does, but llama.cpp does

I just checked their docs and can't see anything like it.

Did you mistake the command to just download and load the model?

DiabloD3 · 2026-04-04T00:38:23 1775263103

As a sibling comment answered you, it is `-hf`.

And yes, it downloads the model, caches it, and then serves future loads of that model out of the cache if the file hasn't changed in the hf repo.

ffsm8 · 2026-04-04T00:55:55 1775264155

So I'm summary: no, it does not have an equivalent command either.

u8080 · 2026-04-03T13:47:07 1775224027

-hf ModelName:Q4_K_M

ffsm8 · 2026-04-03T14:05:58 1775225158

Did you mistake the command to just download and load the model too?

Actually that shouldn't be a question, you clearly did.

Hint: it also opens Claude code configured to use that model

beanjuiceII · 2026-04-03T13:13:09 1775221989

sure there's a reason...it works fine thats the reason

meltyness · 2026-04-03T11:30:49 1775215849

I feel like the READMEs for these 3 large popular packages already illustrate tradeoffs better than hacker news argument

lousken · 2026-04-03T11:13:11 1775214791

lm studio is not opensource and you can't use it on the server and connect clients to it?

jedisct1 · 2026-04-03T11:32:34 1775215954

LM Studio can absolutely run as as server.

walthamstow · 2026-04-03T12:06:08 1775217968

IIRC it does so as default too. I have loads of stuff pointing at LM Studio on localhost