Handbook · Chapter 2 of 12

Anatomy of the image repo

A bootc distro is, at its core, one git repo that produces one OCI image. For Margine that repo is margine-image. This chapter walks its layout, the Containerfile, the staged build scripts, and the build-time write rules that bootc/ostree impose.

2.1 Lineage: ublue-os/image-template

Margine descends from the Universal Blue image-template pattern (github.com/ublue-os/image-template), the same skeleton behind Bluefin, Bazzite and Aurora customizations. The contract is minimal:

  • a Containerfile whose FROM is an existing bootc base image;
  • a build_files/ directory holding everything needed during the build but not wanted inside the final image;
  • a single RUN invocation (or a few) that bind-mounts build_files/ and runs a build.sh;
  • a final lint that proves the result is still a valid bootc container;
  • CI that builds, signs and pushes to a registry on every commit.

Margine credits this explicitly (/var/home/daniel/dev/margine-image/README.md):

- Bluefin — base image and source of most of what Margine ships.
- Universal Blue — image-template, CI patterns, `uupd`.
- Origami Linux — reference for the MOK-signing kernel script.
- hhd-dev/rechunk — ostree rechunking action.

Repo top level:

margine-image/
├── Containerfile            # the whole OS definition
├── build_files/             # build-time scripts + system_files overlay
├── installer/               # Anaconda installer-image context (Flatpak BAKE)
├── disk_config/             # bootc-image-builder TOML (qcow2, anaconda-iso)
├── live-env/                # Titanoboa live-ISO layer
├── docs/                    # repo-local postmortems and plans
└── .github/workflows/       # build, disk, smoke-boot, ISO publish

installer/, disk_config/ and live-env/ are consumed by later chapters; everything that defines the booted OS lives in Containerfile + build_files/.

2.2 The Containerfile, stage by stage

The ctx scratch stage: build inputs that never ship

# /var/home/daniel/dev/margine-image/Containerfile
# ----- Build context: scripts that should NOT end up in the final image -----
FROM scratch AS ctx
COPY build_files /
# Make installer/flatpaks-base reachable from build.sh at
# /ctx/installer-flatpaks-base. Single source of truth for the BAKE
# Flatpak list (audit §3.5: drop the duplicate here-doc in build.sh).
COPY installer/flatpaks-base    /installer-flatpaks-base

Practical effect: scripts live in a throwaway scratch stage and reach the real build only through an ephemeral --mount=type=bind. Nothing in build_files/ can leak into a shipped layer, and editing a script does not invalidate the base layer cache. The extra COPY installer/flatpaks-base makes one file the single source of truth for both the OCI image and the Anaconda installer (chapter on ISOs).

FROM bluefin-dx and pinning

# /var/home/daniel/dev/margine-image/Containerfile
# ----- Base: Bluefin DX (Fedora 44 track, "stable" tag) -----
FROM ghcr.io/ublue-os/bluefin-dx:stable

Margine tracks the floating :stable tag rather than pinning a digest. Trade-off: every weekly rebuild silently absorbs whatever Bluefin shipped (good: free maintenance of GNOME, drivers, dev tooling; bad: an upstream regression lands without a diff to review). The mitigations are downstream: a CI asset validator and a QEMU smoke-boot gate must pass before anything is promoted to Margine's own :stable (chapter on CI). The stricter alternative, digest pinning with Renovate/Dependabot bump PRs, is what several uBlue community images do; it buys reviewability at the cost of merge churn.
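For comparison, the digest-pinned form of the same FROM line can be derived mechanically. This is a minimal sketch, not something the Margine repo ships: pin_ref is a hypothetical helper, and the digest is a placeholder that would in practice come from the registry (for example via skopeo inspect).

```shell
#!/usr/bin/env bash
set -euo pipefail

# pin_ref <image-ref> <digest>   (hypothetical helper)
# Turn "registry/name:tag" into "registry/name@sha256:..." so a FROM line
# names one exact upstream build. The digest is passed in so the function
# stays offline and testable.
pin_ref() {
  local ref="$1" digest="$2"
  local name="${ref%@*}"            # drop an existing @digest, if any
  # Drop a trailing :tag but leave a registry port (host:5000/name) alone:
  # a tag sits in the last path component, a port never does.
  if [[ "${name##*/}" == *:* ]]; then
    name="${name%:*}"
  fi
  printf '%s@%s\n' "$name" "$digest"
}

# Placeholder digest for illustration only, not a real Bluefin build.
pin_ref ghcr.io/ublue-os/bluefin-dx:stable \
  "sha256:0000000000000000000000000000000000000000000000000000000000000000"
```

A bump bot then only has to replace the digest in one line, which is what makes the upstream change reviewable as a PR.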

RUN --mount anatomy

The script-running stages share the same bind, cache and tmpfs mounts; the kernel stage, shown here, additionally mounts the MOK secrets:

# /var/home/daniel/dev/margine-image/Containerfile
RUN --mount=type=bind,from=ctx,source=/,target=/ctx \
    --mount=type=cache,dst=/var/cache \
    --mount=type=cache,dst=/var/log \
    --mount=type=tmpfs,dst=/tmp \
    --mount=type=secret,id=mok-key,target=/tmp/certs/MOK.key \
    --mount=type=secret,id=mok-cert,target=/tmp/certs/MOK.pem \
    /ctx/custom-kernel/install.sh

  • type=bind,from=ctx: scripts are visible at /ctx and gone after the RUN.
  • type=cache on /var/cache and /var/log: dnf metadata and logs persist across builds but never enter a layer. This doubles as a guard: anything written there cannot ship, which is exactly what ostree wants for /var (see §2.5).
  • type=tmpfs on /tmp: scratch space, guaranteed empty in the image.
  • type=secret: the MOK private key and certificate exist only for the duration of this one RUN. No COPY of key material, no credentials in layer history.
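The secret mounts only resolve if the build is invoked with matching ids. A minimal sketch of the calling side, assuming podman as the builder: build_argv, the key paths and the tag are hypothetical, while mok-key and mok-cert are the ids from the Containerfile excerpt. The function prints the command instead of running it, so it works as a dry run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_argv <key-path> <cert-path> <tag>   (hypothetical helper)
# Print, one argument per line, a podman build invocation that supplies the
# two secret ids the kernel stage mounts under /tmp/certs/.
build_argv() {
  local key="$1" cert="$2" tag="$3"
  local argv=(
    podman build
    --secret "id=mok-key,src=${key}"
    --secret "id=mok-cert,src=${cert}"
    --tag "$tag"
    .
  )
  printf '%s\n' "${argv[@]}"
}

# Hypothetical paths and tag.
build_argv ./secrets/MOK.key ./secrets/MOK.pem localhost/margine:dev
```

Because the files are handed over by id rather than copied into the context, they never appear in the ctx stage or in any layer.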

The four RUN stages

  1. /ctx/custom-kernel/install.sh — swap the Fedora kernel for CachyOS from COPR, sign vmlinuz + every module with the MOK secrets, rebuild the initramfs (chapter 3).
  2. /ctx/build.sh — the orchestrator over all numbered NN-*/install.sh stages (§2.3).
  3. /ctx/build-margine-extensions.sh — bake GNOME Shell extensions system-wide into /usr/share/gnome-shell/extensions/. The Containerfile comment records why this is a separate stage: it replaces a racy per-user first-login installer, copying the Bluefin/Bazzite practice of build-time system-wide extensions.
  4. bootc container lint — final validation (§2.6).

Stage granularity matters for iteration speed: a change to a GNOME default re-runs stages 2-4 but reuses the cached (expensive, COPR-fetching, module-signing) kernel layer.

2.3 The build orchestrator and numbered stages

build.sh is deliberately boring — a 1416-line monolith was decomposed into per-area scripts (documented in /var/home/daniel/dev/margine-image/docs/build-sh-decomposition.md):

# /var/home/daniel/dev/margine-image/build_files/build.sh
set -euo pipefail
. /ctx/00-common.sh

log "==== Margine build orchestrator: starting ===="

# Run every sub-script in lexicographic order. Globs expand
# deterministically because we name dirs <NN>-<area>.
for d in /ctx/[1-9][0-9]-*/install.sh; do
  log "==> running $d"
  bash "$d"
done

Practical effect: adding a build concern = adding a directory. Ordering is encoded in the name, the glob is deterministic, and set -euo pipefail plus bash "$d" (not source) means one failing stage kills the build without leaking state into the next.
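The ordering and isolation claims can be checked outside a container build. The sketch below reproduces the loop shape against a throwaway directory; run_stages and the three stage names are hypothetical and exist only for this demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_stages <ctx-dir>   (hypothetical wrapper around the build.sh loop)
# Each NN-*/install.sh runs in its own bash process, in glob order; the
# first failure stops the loop.
run_stages() {
  local ctx="$1" d
  for d in "$ctx"/[1-9][0-9]-*/install.sh; do
    echo "==> running ${d#"$ctx"/}"
    bash "$d" || return 1
  done
}

# Throwaway tree; 20-broken fails on purpose.
demo="$(mktemp -d)"
mkdir -p "$demo/10-first" "$demo/20-broken" "$demo/30-never"
echo 'LEAK=1; echo "first ok"'            > "$demo/10-first/install.sh"
echo 'echo "LEAK=${LEAK:-unset}"; exit 1' > "$demo/20-broken/install.sh"
echo 'echo "should not run"'              > "$demo/30-never/install.sh"

if run_stages "$demo"; then status=0; else status=$?; fi
echo "exit status: $status"
rm -rf "$demo"
```

The second stage reports LEAK as unset because the first ran in a separate process, and the third stage never starts.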

Shared state lives in one sourced file:

# /var/home/daniel/dev/margine-image/build_files/00-common.sh
log() { printf '[margine-build] %s\n' "$*"; }
# retry_curl <url> <out>        — 5 attempts, 30-150s backoff (COPR/raw.githubusercontent brownouts)
# retry_curl_strict <url> <out> — same, but aborts the build on missing/empty asset

export FEDORA_VER="${FEDORA_VER:-$(rpm -E %fedora 2>/dev/null || echo 44)}"
export BUILD_DATE="${BUILD_DATE:-$(date -u +%Y%m%d)}"
export MARGINE_REPO="${MARGINE_REPO:-https://raw.githubusercontent.com/daniel-g-carrasco/margine-fedora-atomic}"
export MARGINE_REF="${MARGINE_REF:-main}"

retry_curl_strict exists because a silently-failed asset download shipped user-visible regressions twice (missing welcome logo, missing About-panel logo); for assets the image is broken without, fail-loud beats a quiet placeholder.
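The excerpt shows only the helpers' signatures, so the following is a sketch of the fail-loud idea rather than the repo's implementation. fetch_strict, RETRY_MAX and RETRY_DELAY are hypothetical names, and the linear backoff is an assumption that does not reproduce the exact 30-150s schedule.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fetch_strict <out> <cmd...>   (hypothetical helper)
# Run <cmd...>, which is expected to write <out>, up to RETRY_MAX times.
# After failed attempt n it sleeps n * RETRY_DELAY seconds. It succeeds only
# if the command exits 0 AND <out> is non-empty; otherwise it returns 1 so
# that `set -e` in the caller aborts the build.
fetch_strict() {
  local out="$1"; shift
  local max="${RETRY_MAX:-5}" step="${RETRY_DELAY:-30}" n
  for (( n = 1; n <= max; n++ )); do
    if "$@" && [[ -s "$out" ]]; then
      return 0
    fi
    echo "attempt $n/$max failed for $out" >&2
    if (( n < max )); then
      sleep "$(( n * step ))"
    fi
  done
  return 1
}

# Offline demo: a "download" that succeeds but produces an empty file.
asset="$(mktemp)"
if RETRY_MAX=2 RETRY_DELAY=0 fetch_strict "$asset" cp /dev/null "$asset"; then
  echo "unexpected: empty asset accepted"
else
  echo "empty asset rejected"
fi
rm -f "$asset"
```

In a real stage the wrapped command would be a curl call that writes the output file; the non-empty check is what catches a download that exits cleanly but delivers nothing.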

The stages:

Dir                  Concern
10-os-identity/      os-release rewrite, /etc/passwd+/etc/group factory seed, system_files/ overlay copy
20-flatpaks/         BAKE list → /usr/share/margine/, DEFER list → /usr/share/flatpak/preinstall.d/
30-gnome-defaults/   zz1-margine.gschema.override (10 enabled extensions, favorites, accent), dconf keyfiles in /etc/dconf/db/distro.d/
40-spec-scripts/     fetch configure-*/validate-* helpers + declarations.yaml from the spec repo into /usr/bin
45-wsf/              build wayland-scroll-factor, install LD_PRELOAD drop-in for org.gnome.Shell@.service
50-branding/         logo, wallpaper, Plymouth theme, offline docs, GDM background, strip Bluefin branding
60-ujust-services/   60-custom.just recipes, mask systemd-remount-fs, skel defaults

The boot-time passwd re-seed unit, staleness/upgrade notifiers, and first-boot autostarts no longer have a build stage of their own: their payloads ship as tracked files under build_files/system_files/ (libexec scripts + systemd units), and stage 10-os-identity copies them wholesale into the rootfs as part of the system_files overlay (§2.4).

One detail in 60-ujust-services generalizes to any Bluefin derivative: the recipe file must be named 60-custom.just.

# /var/home/daniel/dev/margine-image/build_files/60-ujust-services/install.sh
# Bluefin's /usr/share/ublue-os/just/00-entry.just hardcodes the list
# of imported recipe files. The ONLY one declared as optional is
# 60-custom.just (via `import?`) — that's the documented extension
# point for downstream distros. Files dropped under any other name
# (e.g. 99-margine.just) are simply ignored by `ujust --list`.
install -Dm0644 /ctx/60-custom.just /usr/share/ublue-os/just/60-custom.just
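Since the extension point depends on an upstream file, a derivative may want to fail the build if Bluefin ever changes it. The guard below is a sketch under assumptions: has_optional_import is a hypothetical helper, and it assumes the entry file declares the optional import on a single line beginning with import?.

```shell
#!/usr/bin/env bash
set -euo pipefail

# has_optional_import <entry-justfile> <recipe-file-name>   (hypothetical)
# Succeed if some line starts with `import?` and mentions the given file
# name; a plain `import` line does not count.
has_optional_import() {
  local entry="$1" name="$2"
  awk -v n="$name" '
    /^[ \t]*import\?/ && index($0, n) { found = 1 }
    END { exit !found }
  ' "$entry"
}

# Only meaningful inside a Bluefin-derived rootfs; skipped elsewhere.
entry="/usr/share/ublue-os/just/00-entry.just"
if [[ -f "$entry" ]]; then
  has_optional_import "$entry" 60-custom.just \
    || { echo "60-custom.just is no longer an optional import" >&2; exit 1; }
fi
```

Run right after the install line, this turns a silent "recipes not listed" regression into a build failure.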

2.4 The system_files/ overlay

Static files (units, libexec scripts, tuned profiles, icons, autostart entries) do not get heredoc'd in scripts — they live under build_files/system_files/ in a tree that mirrors their final path, and stage 10 overlays the whole thing onto /:

# /var/home/daniel/dev/margine-image/build_files/10-os-identity/install.sh
# The whole tree gets copied into the rootfs at "/" (cp -a, below) so file
# paths in the repo mirror their final installed location. Same pattern as
# Bluefin's system_files/shared/.
if [[ -d /ctx/system_files ]]; then
  log "Copying /ctx/system_files/ → / (overlaying base rootfs)"
  cp -a /ctx/system_files/. /
  # Set executable bit on libexec scripts (cp -a preserves mode but
  # git may have flagged them differently across platforms).
  find /usr/libexec /usr/bin -type f \( \
      -path '*/margine-*' -o \
      -path '/usr/libexec/margine/*' \
    \) -exec chmod 0755 {} \;
fi

Practical effect: git log build_files/system_files/usr/lib/systemd/system/margine-docs-refresh.service is the change history of that exact file on disk. The current tree ships almost exclusively into /usr (units in /usr/lib/systemd/system/, scripts in /usr/libexec/margine/, tuned profiles in /usr/lib/tuned/profiles/), plus one /etc/xdg/autostart entry — consistent with the write rules below.

Stage 10 also rewrites OS identity. The non-obvious part is which fields a derivative may change:

# /var/home/daniel/dev/margine-image/build_files/10-os-identity/install.sh
NAME="Margine"
ID=fedora            # bootc-image-builder fails "could not find def file for
ID_LIKE=bluefin      # distro margine-44" if ID=margine; BIB does NOT fall
VARIANT_ID=margine   # back to ID_LIKE. Discriminate on VARIANT_ID instead.
...
printf '%s\n' "$OS_RELEASE_CONTENT" > /usr/lib/os-release
ln -sf ../usr/lib/os-release /etc/os-release   # canonical Fedora layout

NAME/PRETTY_NAME/VARIANT* are the branding surface; ID is an ecosystem contract (tooling does exact ID-VERSION_ID lookups). Fedora's own spins (Silverblue, Kinoite) follow the identical ID=fedora + distinct VARIANT_ID pattern.
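On the consuming side, "discriminate on VARIANT_ID" might look like the sketch below. os_release_field and is_margine are hypothetical helpers; the field values are the ones quoted in the excerpt above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# os_release_field <file> <KEY>   (hypothetical helper)
# Print one field of an os-release file. The format is shell-compatible
# assignments, so the file is sourced in a subshell to keep its variables
# out of the caller's environment.
os_release_field() {
  local file="$1" key="$2"
  [[ "$file" == */* ]] || file="./$file"   # stop `.` from searching PATH
  ( . "$file"; printf '%s\n' "${!key:-}" )
}

# is_margine <file>   (hypothetical helper)
# True only for the ID=fedora + VARIANT_ID=margine combination.
is_margine() {
  [[ "$(os_release_field "$1" ID)" == "fedora" ]] &&
    [[ "$(os_release_field "$1" VARIANT_ID)" == "margine" ]]
}

# Demo against a throwaway file holding the fields from the excerpt.
sample="$(mktemp)"
printf '%s\n' 'NAME="Margine"' 'ID=fedora' 'ID_LIKE=bluefin' 'VARIANT_ID=margine' > "$sample"
if is_margine "$sample"; then
  echo "variant: $(os_release_field "$sample" VARIANT_ID)"
fi
rm -f "$sample"
```

Tools that key on ID keep treating the system as Fedora, while Margine-specific scripts get a stable field of their own to test.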

Lesson — os-release symlink vs switch-root. Symptom: first VM boots failed with Failed to switch root: ... os-release file is missing, despite the file existing in the deployment. Root cause: two stacked issues. The initramfs lacked the ostree dracut module (so /sysroot was never pivoted to the deployment view), and the image pushed by plain buildah was not ostree-canonical, so composefs was not mounted over /usr when systemd's switch-root check did openat(fd, "etc/os-release", O_NOFOLLOW) — the /etc/os-release → ../usr/lib/os-release symlink dangled. Fix: short-term, ship os-release as a regular file in both places ("Fix A"); proper fix ("Fix B"), add dracut --add ostree in the kernel stage and wire hhd-dev/rechunk into CI so the published image is re-committed in ostree-canonical form — after which the canonical symlink was restored (the ln -sf above). Full writeups: margine-fedora-atomic/docs/lessons-learned/2026-05-28-initramfs-and-bootc-labels.md and .../2026-06-03-rechunk-and-fixb.md.

2.5 What may write where at build time

The rule set every script in this repo obeys:

  • /usr — yes. The immutable payload. Binaries, units, schemas, extensions, kernels (/usr/lib/modules/<kver>/vmlinuz + initramfs.img), even the passwd factory (/usr/lib/passwd).
  • /etc — yes, but it becomes the factory. At commit/rechunk time /etc content is captured as /usr/etc; on each deployment ostree 3-way-merges it with the machine's live /etc. Writes here are defaults, not state.
  • /var — no. /var is machine-local state that persists across deployments; image content there is at most copied in at first install and never updated afterwards, so anything baked into it is dead weight at best and a lint error at worst. The Containerfile makes this structural: /var/cache and /var/log are cache mounts, so dnf can do its job without the result ever entering a layer.
  • /tmp — tmpfs mount, guaranteed not to ship.
  • /opt, /usr/local — symlinks into /var on Fedora/ostree; same prohibition applies.
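The rules above can also be applied to the overlay tree before any build runs. This is a sketch of such a pre-build check, not part of the repo as described: overlay_violations is a hypothetical helper that flags anything in system_files/ outside usr/ and etc/, plus anything under usr/local/.

```shell
#!/usr/bin/env bash
set -euo pipefail

# overlay_violations <system_files-dir>   (hypothetical helper)
# Print, sorted, every file or symlink in the overlay that would land
# outside /usr and /etc, or under /usr/local.
overlay_violations() {
  local root="$1"
  ( cd "$root" && find . \( -type f -o -type l \) \
      \( ! -path './usr/*' ! -path './etc/*' -o -path './usr/local/*' \) \
      -print | LC_ALL=C sort )
}

# Run against the repo layout when present; skipped elsewhere.
if [[ -d build_files/system_files ]]; then
  bad="$(overlay_violations build_files/system_files)"
  if [[ -n "$bad" ]]; then
    printf 'overlay writes outside /usr and /etc:\n%s\n' "$bad" >&2
    exit 1
  fi
fi
```

This only covers static files; writes made by the install scripts themselves still depend on the cache and tmpfs mounts and on the final lint.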

Some tooling assumes a writable, persistent /var and has to be tricked. akmods is the canonical offender:

# /var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh
# akmodsbuild on bootc images skips signing if /var isn't writable; patch
# it out so akmods proceeds inside the container build.
disable_akmodsbuild() {
  _ak="/usr/sbin/akmodsbuild"
  cp -p "$_ak" "$_ak.backup"
  sed '/if \[\[ -w \/var \]\] ; then/,/fi/d' "$_ak" > "$_ak.tmp"
  mv "$_ak.tmp" "$_ak"
  chmod +x "$_ak"
}

The patch is reverted (restore_akmodsbuild) before the layer is committed — temporary mutations of /usr must be cleaned up by the same script that made them.
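One way to make the "same script cleans up" rule hard to forget is to wrap patch, command and restore in a single function. The sketch below is a generic variant, not the repo's disable/restore pair: with_patched_file is a hypothetical helper, and the demo file stands in for akmodsbuild.

```shell
#!/usr/bin/env bash
set -euo pipefail

# with_patched_file <file> <sed-script> <cmd...>   (hypothetical helper)
# Apply <sed-script> to <file>, run <cmd...>, then restore the original
# whether the command succeeded or not. Returns the command's exit status.
with_patched_file() {
  local file="$1" script="$2"; shift 2
  local backup="${file}.backup" rc=0
  cp -p "$file" "$backup"
  # Writing through the existing file keeps its mode, so no chmod is needed.
  if ! sed "$script" "$backup" > "$file"; then
    mv -f "$backup" "$file"
    return 1
  fi
  "$@" || rc=$?
  mv -f "$backup" "$file"     # original content, mode and mtime come back
  return "$rc"
}

# Demo on a throwaway file: the "drop" line is hidden only while cat runs.
demo="$(mktemp)"
printf 'keep\ndrop\n' > "$demo"
with_patched_file "$demo" '/drop/d' cat "$demo"
cat "$demo"
rm -f "$demo"
```

The restore happens on the failure path too, so a failed module build cannot leave a patched binary behind in /usr.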

A second class of "build-time write" bug: transient dnf installs. The extensions stage refuses them entirely after an autoremove/Requires:-cascade incident:

# /var/home/daniel/dev/margine-image/build_files/build-margine-extensions.sh
# NO transient dnf installs. Lesson learned the hard way 2026-06-04:
#   dnf5 -y remove jq    # STILL broke things: scx-tools-git declares
#                        # Requires: jq → removal cascades through
#                        # scx-tools-git → scx-scheds → 16 packages.
# Robust fix: don't add or remove dnf packages here at all. Use
# Python stdlib (always present) for JSON parsing + zip extraction.

Lesson — rechunk strips the /etc factory. Symptom: after rebasing a Bluefin machine to Margine, boot spews dozens of Failed to resolve group 'audio'/'kvm'/'tty'; TPM unlock and audio break. Root cause: Bluefin ships a near-empty /etc/passwd (sysusers populates it at boot). The build-time seed (stage 10) fills it, and CI confirmed 65 entries post-build, but rechunk's re-commit stripped /etc/passwd and /etc/group from the /usr/etc factory, so ostree's 3-way merge on the rebased machine kept only root plus the human user. Fix: belt and suspenders. Keep the build-time seed, and also ship an idempotent boot-time oneshot that re-merges from /usr/lib/{passwd,group} whenever /etc/passwd drops below 20 entries:

# build_files/system_files/usr/lib/systemd/system/margine-seed-etc-passwd.service
[Unit]
DefaultDependencies=no
# DO NOT add After=local-fs.target: it creates an ordering cycle through
# systemd-tmpfiles-setup-dev.service → /dev/disk/by-uuid never populated
# → boot times out into emergency mode (incident 2026-06-01).
After=local-fs-pre.target
Before=systemd-sysusers.service systemd-tmpfiles-setup.service sysinit.target

The comment is its own sub-lesson: the first version of this unit ordered itself After=local-fs.target, and systemd broke the resulting ordering cycle by dropping the systemd-tmpfiles-setup-dev job, which pushed every boot into emergency.target (.../lessons-learned/2026-06-01-systemd-ordering-cycle-and-rechunk-storage.md).
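The script behind that unit is not shown in this chapter. As a sketch of what an idempotent re-merge could look like, assuming entries are matched by name (first colon-separated field): reseed_db is a hypothetical function, and the threshold of 20 is the one stated in the Lesson above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# reseed_db <live-file> <factory-file> [min-entries]   (hypothetical)
# If <live-file> has fewer than min-entries lines, append every factory
# entry whose name (field 1) is not already present. Existing live entries
# are never modified, so running it again changes nothing.
reseed_db() {
  local live="$1" factory="$2" min="${3:-20}"
  local count tmp
  count="$(grep -c '' "$live" || true)"     # number of lines, 0 if empty
  if (( count >= min )); then
    return 0
  fi
  tmp="$(mktemp)"
  # FILENAME == ARGV[1] marks the live file even when it is empty, which
  # the usual NR == FNR idiom would get wrong.
  awk -F: 'FILENAME == ARGV[1] { seen[$1] = 1; next } !($1 in seen)' \
    "$live" "$factory" > "$tmp"
  cat "$tmp" >> "$live"
  rm -f "$tmp"
}

# On a real system the calls would be along the lines of:
#   reseed_db /etc/passwd /usr/lib/passwd
#   reseed_db /etc/group  /usr/lib/group
```

Matching by name means the machine's own root and human-user entries win over the factory copies, which fits the merge behaviour the Lesson describes.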

2.6 Commit and lint

The image must end as something bootc can deploy. Margine's Containerfile finishes with:

# /var/home/daniel/dev/margine-image/Containerfile
# ----- Lint: verify final image is a valid bootc container -----
RUN bootc container lint

bootc container lint checks the invariants this chapter described: no content baked into /var, valid kernel/initramfs layout under /usr/lib/modules/, sane /etc and composefs-compatible structure. It fails the build, so a violating commit never reaches the registry.

Two related mechanisms in the same family:

  • ostree container commit — the older uBlue/image-template idiom, appended to each RUN to clean /var and verify the layer (RUN /ctx/build.sh && ostree container commit). bootc-era templates replace it with the final bootc container lint; Margine never carried the old form.
  • rechunk (hhd-dev/rechunk, in CI, post-build) — re-commits the OCI image as an ostree-canonical tree with size-balanced layers. For Margine it is not just a bandwidth optimization: it is what made composefs come up early enough for the os-release symlink (Lesson above). The trade-off — it rewrites /usr/etc aggressively — produced the passwd-stripping Lesson.

Alternatives & other distros

Repo/build skeleton

  • ublue-os/image-template (Bluefin/Bazzite/Aurora customs, Margine): Containerfile + build_files/ + GitHub Actions; lowest-friction entry.
  • BlueBuild: declarative recipe.yml compiled to a Containerfile; less bash, less control over stage ordering.
  • Fedora rpm-ostree treefiles (Silverblue/Kinoite proper): YAML/JSON compose on Fedora infra; not container-native, no RUN step.
  • NixOS: full system from a Nix expression; maximal reproducibility, entirely different ecosystem, no OCI base reuse.
  • Vanilla OS (Vib + ABRoot): modular YAML recipe → OCI image, A/B partition deployment instead of ostree.
  • openSUSE MicroOS/Aeon: built with KIWI on OBS; btrfs-snapshot atomicity (transactional-update), not image-based delivery.

Base pinning

  • Floating tag (bluefin-dx:stable — Margine, most uBlue customs): zero maintenance, regressions absorbed silently; compensate with CI gates.
  • Digest pin + Renovate bumps: reviewable upstream diffs, constant PR churn.
  • Build-from-source base (Bazzite, Bluefin themselves build from ublue-os/main/Fedora base): full control, full maintenance burden.

Script staging

  • Numbered NN-*/install.sh dirs (Margine) ≈ Bluefin's build_files/shared/*.sh: deterministic, diff-friendly.
  • Single build.sh (stock image-template): fine until ~300 lines.
  • One RUN per concern in the Containerfile (Bazzite, dozens of layers): better layer caching per concern, registry layer-count bloat — exactly why rechunk exists.

Config overlay

  • system_files/ mirror-tree copied to / (Margine, Bluefin, Bazzite): file paths == repo paths.
  • Heredocs in scripts (Margine uses these for generated files only): content next to logic, but unreviewable past a screenful.
  • Nix modules / Vib modules: typed config instead of file trees; ecosystem lock-in.

Validation

  • bootc container lint (Margine, current uBlue): in-build, blocking.
  • ostree container commit (legacy uBlue): per-layer cleanup + check.
  • External smoke boot in QEMU before tag promotion (Margine's smoke-boot.yml, Bazzite's CI): catches what static lint cannot — the passwd and switch-root Lessons above were both runtime-only failures.