I'm tired of LLM bullshitting. So I fixed it.

SuspciousCarrot78@lemmy.world · edit-2 20 hours ago

I'm tired of LLM bullshitting. So I fixed it.

rollin@piefed.social · 3 days ago

At first blush, this looks great to me. Are there limitations with what models it will work with? In particular, can you use this on a lightweight model that will run in 16 Gb RAM to prevent it hallucinating? I’ve experimented a little with running ollama as an NPC AI for Skyrim - I’d love to be able to ask random passers-by if they know where the nearest blacksmith is for instance. It was just far too unreliable, and worse it was always confidently unreliable.

This sounds like it could really help these kinds of uses. Sadly I’m away from home for a while so I don’t know when I’ll get a chance to get back on my home rig.

SuspciousCarrot78@lemmy.world · 3 days ago

My brother in virtual silicon: I run this shit on a $200 p.o.s with 4gb of VRAM.

If you can run an LLM at all, this will run. BONUS: because of the way “Vodka” operates, you can run with a smaller context window without eating shit of OOM errors. So…that means… if you could only run a 4B model (because the GGUF itself is 3GBs without the over-heads…then you add in the drag from the KV cache accumulation)… maybe you can now run next sized up model…or enjoy no slow down chats with the model size you have.

rollin@piefed.social · 3 days ago

I never knew LLMs can run on such low-spec machines now! That’s amazing. You said elsewhere you’re using Qwen3-4B (abliterated), and I found a page saying that there are Qwen3 models that will run on “Virtually any modern PC or Mac; integrated graphics are sufficient. Mobile phones”

Is there still a big advantage to using Nvidia GPUs? Is your card Nvidia?

My home machine that I’ve installed ollama on (and which I can’t access in the immediate future) has an AMD card, but I’m now toying with putting it on my laptop, which is very midrange and has Intel Arc graphics (which performs a whole lot better than I was expecting in games)

SuspciousCarrot78@lemmy.world · 3 days ago

Yep, LLMs can and do run on edge devices (weak hardware).

One of the driving forces for this project was in fact trying to make my $50 raspberry pi more capable of running llm. It sits powered on all the time, so why not?

No special magic with NVIDIA per se, other than ubiquity.

Yes, my card is NVIDIA, but you don’t need a card to run this.

I'm tired of LLM bullshitting. So I fixed it.

I'm tired of LLM bullshitting. So I fixed it.

llama-conductor

The thing: llama-conductor

1) KB mechanics that don’t suck (1990s engineering: markdown, JSON, checksums)

2) Mentats: proof-or-refusal mode (Vault-only)

3) Vodka: deterministic memory on a potato budget