AI & local LLMs

Margine ships a single image and adds AI on top via the `ujust margine-ai` recipe. It installs Alpaca — a graphical app for running large language models entirely on your own machine, with its own backend bundled inside the Flatpak. Nothing leaves your computer, nothing runs as a system service, and the whole layer removes cleanly. This page covers what gets installed, how to pick a model for your hardware, the honest truth about GPU acceleration, and the CLI option for power users.

Why a local LLM

A local LLM is a chatbot — like the cloud assistants you've used — that runs on your machine instead of someone's server. The trade is straightforward: you give up the very largest frontier models, and in exchange everything you type and everything it answers stays on the device. No account, no API key, no network round-trip, works on a plane. For drafting, summarising, brainstorming, rewriting, and coding help, a good 7B–8B model running locally is genuinely useful.

Margine keeps this opt-in and sandboxed. The base image stays lean; you add AI only if you want it, and it lives entirely in a Flatpak with nothing layered onto the host.

Install the AI layer

One command from a terminal (Ptyxis in the dock):

ujust margine-ai

The recipe prints what it's going to do before doing anything — you can read it and cancel. It installs Alpaca (com.jeffser.Alpaca) as a Flatpak, visible in Bazaar afterwards. That's the whole install: no reboot, no daemon to start, no system service to manage.

What this does not touch: the host. The ujust margine-ai recipe adds no rpm-ostree packages and no akmods. It's all Flatpak, fully sandboxed and removable, and your base image stays exactly as lean as it was. (GPU acceleration is wired up on the Flatpak side, through a plugin or Vulkan, not by layering anything on the host; see GPU vs CPU below. The base already carries the AMD/Intel compute drivers anyway.)

Alpaca bundles its own Ollama backend inside the Flatpak. You don't install Ollama, you don't run a daemon, you don't open a port. Launch the app and it's ready.
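To confirm from a terminal that the Flatpak landed, a minimal sketch follows. It relies only on the app ID given above and the standard `flatpak info` command; the function names `alpaca_installed` and `alpaca_status` are made up for illustration and are not part of the recipe.

```shell
# Hypothetical helpers: report whether the Alpaca Flatpak is installed.
# APP_ID comes from this page; the function names are illustrative only.
APP_ID="com.jeffser.Alpaca"

alpaca_installed() {
  # `flatpak info` exits non-zero when the app is not installed.
  flatpak info "$APP_ID" >/dev/null 2>&1
}

alpaca_status() {
  if alpaca_installed; then
    echo "installed: $APP_ID"
  else
    echo "missing: $APP_ID (run: ujust margine-ai)"
  fi
}
```

Paste the functions into a shell and call `alpaca_status` to print a one-line result.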

First run — pick a model

Open Alpaca from Activities (press Super, type "alpaca"). On first launch there's no model yet — the app does nothing until you download one.

1. Open Alpaca's model manager (the model picker / "Manage Models" area).
2. Choose a model from the list. Start with one of the recommendations below.
3. Let it download. Models are multi-gigabyte, so the first pull takes a few minutes.
4. Select it and start chatting.

Models are cached after the first download and reused on every later launch. You can keep several installed and switch between them per conversation.

Choosing a model — by use-case and hardware

Model names look like name:size, where the size (e.g. 8b = 8 billion parameters) is the rough lever for quality-versus-resources. Bigger is smarter but needs more memory and runs slower. These four are good starting points:

Model	Best for	Size on disk	Fits
`llama3.1:8b`	General-purpose chat, the all-rounder	~5 GB	~8 GB GPU
`qwen2.5-coder:7b`	Coding — completion, explaining, refactoring	~4 GB	~8 GB GPU
`mistral:7b`	Fast general chat	~4 GB	~8 GB GPU
`phi3.5:3.8b`	Small and light — CPU-only or low-RAM machines	~2 GB	runs almost anywhere
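If you handle these tags in a script, the name:size convention splits cleanly with plain parameter expansion. This is only a sketch; the helper names `model_name` and `model_size` are hypothetical.

```shell
# Split a model tag of the form name:size into its two parts.
# POSIX parameter expansion only; helper names are illustrative.
model_name() { printf '%s\n' "${1%%:*}"; }   # text before the first colon
model_size() { printf '%s\n' "${1##*:}"; }   # text after the last colon

# Print the four tags from the table as two aligned columns.
for tag in llama3.1:8b qwen2.5-coder:7b mistral:7b phi3.5:3.8b; do
  printf '%-16s %s\n' "$(model_name "$tag")" "$(model_size "$tag")"
done
```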

The practical sizing rules:

~8 GB of VRAM is a comfortable floor for a 7B–8B model. With that on a GPU, the models above run responsively.
No discrete GPU, or less memory? Start with phi3.5:3.8b — it's designed to be small and stays usable on CPU. On a machine with ~16 GB of system RAM it's a sensible first pick.
More headroom? Once a 7B model feels comfortable, you can try larger variants from Alpaca's list and judge the speed/quality trade yourself.

Don't overthink the first choice — download llama3.1:8b if you have a capable GPU, phi3.5:3.8b if you don't, and adjust once you've felt the speed on your own machine.
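The rule of thumb above can be written down as a tiny helper. Treat it as a sketch: the function name `suggest_model` is hypothetical, and turning "~8 GB" into a hard cut-off at exactly 8 is a simplification made here, not something Margine enforces.

```shell
# Hypothetical helper encoding this page's starting-point advice.
#   $1: VRAM in whole GB (0 or omitted when there is no capable GPU)
#   $2: "coding" to prefer the coding model, anything else for general chat
# The hard threshold of 8 is an assumption standing in for "~8 GB".
suggest_model() {
  vram_gb=${1:-0}
  use=${2:-chat}
  if [ "$vram_gb" -ge 8 ]; then
    if [ "$use" = "coding" ]; then
      echo "qwen2.5-coder:7b"
    else
      echo "llama3.1:8b"
    fi
  else
    echo "phi3.5:3.8b"
  fi
}
```

For example, `suggest_model 8` suggests the all-rounder, while `suggest_model 0` falls back to the small model.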

GPU vs CPU — the honest reality

This is the part worth reading before you judge performance.

Two facts people conflate — both matter:

The base image already ships GPU compute. Inherited from Bluefin DX, Margine carries AMD ROCm + the AMD OpenCL runtime and Mesa Vulkan (RADV for AMD, ANV for Intel). So you do not need to install a compute stack on the host — for the native/CLI path (RamaLama, below) the GPU is usable straight away.
But Alpaca is a Flatpak — it's sandboxed, so it does not reach into that host ROCm. GPU acceleration inside Alpaca comes from a sandbox-side source instead:

AMD (ROCm) — the com.jeffser.Alpaca.Plugins.AMD Flatpak extension puts ROCm inside the sandbox. ujust margine-ai offers to install it when it detects an AMD GPU; or do it yourself: flatpak install flathub com.jeffser.Alpaca.Plugins.AMD.
AMD + Intel (Vulkan, no plugin) — recent Ollama has a Vulkan backend that uses Margine's Mesa drivers directly. It's often the best path for integrated GPUs / APUs (e.g. AMD "Radeon 7xxM"), where ROCm support is patchy.
NVIDIA — Flatpak auto-pulls the matching GL driver and Ollama uses CUDA through it. (CUDA is proprietary and is not in the base image.)
No usable GPU — it runs on the CPU. Works fine, just slower.
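To see which of the cases above applies to your machine, you can match the vendor string in a GPU description line. The sketch below is a heuristic written for this page: the function name `suggest_backend` is hypothetical, and it is not necessarily how the ujust margine-ai recipe detects hardware.

```shell
# Map a GPU description (for example a line of `lspci` output) to the
# backend this page suggests. Vendor-string matching is a heuristic and
# an assumption of this sketch, not the recipe's actual detection logic.
suggest_backend() {
  case "$1" in
    *NVIDIA*|*nvidia*) echo "cuda (via the Flatpak NVIDIA GL driver)" ;;
    *AMD*|*Radeon*)    echo "rocm plugin or vulkan" ;;
    *Intel*)           echo "vulkan" ;;
    *)                 echo "cpu" ;;
  esac
}
```

A typical call would be `suggest_backend "$(lspci | grep -Ei 'vga|3d|display')"`. On a machine with both an integrated and a discrete GPU, the NVIDIA branch wins because it is checked first.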

Integrated AMD GPUs (APUs): ROCm may not recognise an iGPU like a gfx110x part until you set, in Alpaca → Preferences → environment, HSA_OVERRIDE_GFX_VERSION=11.0.0 — or just use the Vulkan backend, which usually works on APUs with no override. Check Alpaca's hardware list to see which one the GPU shows up under.
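If you would rather set the override from a terminal than from Alpaca's preferences, Flatpak's own per-app overrides can carry the variable. `flatpak override --env=` and `--show` are standard Flatpak options; whether Alpaca's bundled backend picks up a variable set this way is an assumption of this sketch, and the function names are hypothetical.

```shell
# Set or inspect the ROCm override for the Alpaca sandbox.
# Assumption: the bundled backend inherits environment variables that
# were set through `flatpak override`. Function names are illustrative.
APP_ID="com.jeffser.Alpaca"

set_gfx_override() {
  # $1: GFX version to report to ROCm, e.g. 11.0.0 for a gfx110x part
  flatpak override --user --env="HSA_OVERRIDE_GFX_VERSION=$1" "$APP_ID"
}

show_overrides() {
  # Prints the per-user overrides currently stored for the app.
  flatpak override --user --show "$APP_ID"
}
```

Run `set_gfx_override 11.0.0`, restart Alpaca, then check its hardware list again.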

Quick gut-check on whether the GPU is being used: a 7B model that answers within a second or two of "thinking" is almost certainly on the GPU; one that types an answer out slowly, word by word, is on the CPU. If it's slower than you'd like, drop to a smaller model, install the AMD plugin, or try the Vulkan backend.
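For a less subjective check on AMD hardware, the amdgpu kernel driver exposes a load figure in sysfs that you can read while a model is answering. This is a sketch under that assumption: it only covers amdgpu, the function name `gpu_busy` is hypothetical, and other drivers need their own tools.

```shell
# Print the busy percentage of every GPU that exposes amdgpu's
# gpu_busy_percent file. Assumes the amdgpu driver; prints nothing
# on systems where no such file exists.
#   $1: sysfs DRM root (defaults to /sys/class/drm)
gpu_busy() {
  drm_root=${1:-/sys/class/drm}
  for f in "$drm_root"/card*/device/gpu_busy_percent; do
    [ -r "$f" ] || continue          # unmatched glob or unreadable file
    card=${f#"$drm_root"/}           # drop the root prefix
    card=${card%%/*}                 # keep only "cardN"
    printf '%s %s%%\n' "$card" "$(cat "$f")"
  done
}
```

Call `gpu_busy` a few times during a long answer. A figure that stays near zero suggests the model is running on the CPU.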

The CLI path — RamaLama

Prefer a terminal workflow, want to script generation, or want models packaged as OCI/container images? RamaLama is the container-based runner for exactly that. It's the scriptable counterpart to Alpaca's GUI.

Margine doesn't install RamaLama for you. In keeping with the sandboxed, opt-in approach, the ujust margine-ai recipe prints instructions to set it up inside a container rather than layering it on the host. The path:

# Create and enter a Fedora distrobox (one-time)
distrobox create --name fedora --image registry.fedoraproject.org/fedora
distrobox enter fedora

# Inside the box: install RamaLama, then run a model
sudo dnf install ramalama
ramalama run llama3.1:8b

RamaLama pulls and runs models in containers, so it keeps the runner isolated the same way Alpaca does — just from the command line. It's the right tool if you want a repeatable, scriptable, terminal-first local-LLM workflow; Alpaca is the right tool if you want to click an icon and chat.
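As an illustration of the scriptable side, the sketch below summarises a batch of text files with one non-interactive call per file. It assumes `ramalama run` accepts the prompt as a trailing argument, as described in RamaLama's documentation, and that you run it inside the distrobox created above. The function names and the prompt wording are made up for this example.

```shell
# Sketch: summarise each readable file with one `ramalama run` call.
# Assumption: a trailing argument to `ramalama run` is used as the prompt.
# MODEL can be overridden from the environment.
MODEL="${MODEL:-llama3.1:8b}"

summarise() {
  # $1: path to a plain-text file
  ramalama run "$MODEL" "Summarise the following text in three bullet points: $(cat "$1")"
}

summarise_all() {
  for file in "$@"; do
    [ -r "$file" ] || { echo "skip: $file" >&2; continue; }
    echo "== $file"
    summarise "$file"
  done
}
```

Usage would look like `summarise_all notes/*.txt > summaries.txt`. Very long files can exceed the model's context window, so this suits short documents.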

Remove the AI layer

Sandboxed means cleanly removable. One command:

ujust margine-ai-remove

That uninstalls Alpaca. To also reclaim disk space, delete the downloaded models from inside Alpaca before removing it, or clear its Flatpak data afterwards. Because nothing was layered onto the host, there's nothing else to undo — your base image is untouched.
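Clearing the Flatpak data by hand can be sketched as follows. Flatpak keeps per-app data under ~/.var/app/<app-id>, so the helpers below look there. The function names are hypothetical, and the purge is irreversible: it removes everything Alpaca stored in that directory, not just models.

```shell
# Inspect or delete Alpaca's per-app Flatpak data directory.
#   $1: home directory to look under (defaults to $HOME)
# Function names are illustrative; the purge cannot be undone.
alpaca_data_dir() {
  printf '%s\n' "${1:-$HOME}/.var/app/com.jeffser.Alpaca"
}

alpaca_data_size() {
  dir=$(alpaca_data_dir "$1")
  if [ -d "$dir" ]; then
    du -sh "$dir"
  else
    echo "no data at $dir"
  fi
}

alpaca_data_purge() {
  dir=$(alpaca_data_dir "$1")
  if [ -d "$dir" ]; then
    rm -rf -- "$dir"
  fi
}
```

Run `alpaca_data_size` first to see how much space is at stake, then `alpaca_data_purge` once Alpaca is uninstalled.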

Troubleshooting & FAQ

"It's really slow." You're probably running on the CPU. Two fixes: drop to a smaller model (try phi3.5:3.8b), or enable GPU acceleration. On AMD, install the com.jeffser.Alpaca.Plugins.AMD plugin (ujust margine-ai offers it) or use the Vulkan backend; on NVIDIA, make sure the proprietary driver is active. Then relaunch. See GPU vs CPU above.
"My GPU isn't being used." Alpaca is a sandboxed Flatpak, so it does not use the host ROCm even though the base ships it. A working desktop or working games do not imply the sandbox has compute. Give the sandbox a backend: the AMD ROCm plugin (add HSA_OVERRIDE_GFX_VERSION=11.0.0 in Alpaca's environment if ROCm ignores an APU), Vulkan (best for integrated GPUs/APUs, usually with no override needed), or the NVIDIA GL driver. Restart Alpaca after.
"Do I need to run Ollama or start a service?" No. Alpaca bundles its own Ollama backend inside the Flatpak — there's no host install, no daemon, and no system service to enable.
"Where do downloaded models go / how big are they?" They're cached inside Alpaca's Flatpak data and reused across launches. Each model is a few gigabytes (see the table). Manage and delete them from inside Alpaca.
"Is anything sent to the internet?" Only the initial model download. Once a model is on disk, chatting is fully local and offline — that's the entire point of this layer.
"Which model should I start with?" llama3.1:8b if you have ~8 GB of VRAM, phi3.5:3.8b if you're on CPU or low on memory, qwen2.5-coder:7b if your main use is coding.
"Can I use the terminal instead?" Yes — see The CLI path — RamaLama. The recipe prints the setup steps.