Handbook · Chapter 3 of 12

Replacing the kernel in an atomic image

The kernel is just files in the image: /usr/lib/modules/<kver>/vmlinuz, the module tree next to it, and an initramfs.img in the same directory. In a bootc image build you can remove the stock kernel and install another one with plain dnf inside the Containerfile, with no bootloader scripting and no per-machine kernel-install dance. What makes it hard is everything around the files: the kernel-install hooks that assume a running system, out-of-tree modules that must be built against the new headers, an initramfs that must be regenerated for hardware the build container cannot see, and a handful of ostree-specific invariants (output path, dracut ostree module, the ostree.linux OCI label) whose violations only surface at first boot.

Margine replaces Bluefin DX's stock kernel with kernel-cachyos from the bieszczaders/kernel-cachyos COPR. The whole swap lives in one script, invoked as the first RUN stage so every later stage (Plymouth, extensions) already sees the final kernel:

RUN --mount=type=bind,from=ctx,source=/,target=/ctx \
    --mount=type=cache,dst=/var/cache \
    --mount=type=cache,dst=/var/log \
    --mount=type=tmpfs,dst=/tmp \
    --mount=type=secret,id=mok-key,target=/tmp/certs/MOK.key \
    --mount=type=secret,id=mok-cert,target=/tmp/certs/MOK.pem \
    /ctx/custom-kernel/install.sh

/var/home/daniel/dev/margine-image/Containerfile (lines 39-46)

The BuildKit type=secret mounts stage the MOK signing material as ephemeral files — they exist during this RUN only and never land in a layer. Signing itself (sbsign on vmlinuz, sign-file on every .ko, the first-boot mok-enroll.service) is the Secure Boot chapter's subject; this chapter covers the swap, the module builds, and the initramfs.

3.1 Why a custom kernel at all

The decision is written down in ADR 0006 (/home/daniel/dev/margine-fedora-atomic/docs/adr/0006-kernel-cachyos-decision.md). Three options were on the table for a Fedora-Atomic-derived desktop in 2026:

                              A: kernel-cachyos (chosen)      B: OGC kernel (Bazzite et al.)           C: Bluefin's stock kernel
BORE scheduler                built in (CONFIG_SCHED_BORE=y)  opt-in upstream, not in default config   no
ThinLTO build                 yes (~3-5% win)                 no                                       no
CONFIG_HZ                     1000                            300                                      300
Handheld HID / NTSYNC / gyro  not the focus                   yes, in tree                             no
Maintainer surface            single Fedora packager          8-distro shared CI                       Bluefin/uBlue team
Build pipeline cost           ~420 LOC + MOK secrets in CI    adopt akmods OCI pull                    zero (inherit from base)

Margine is creator-first (real-time audio: Reaper, EasyEffects on PipeWire), so BORE + HZ=1000 + ThinLTO win over OGC's handheld patch set; for a gaming/handheld distro the matrix flips. Option C is the correct answer if you don't have a measured reason to deviate — it deletes this entire chapter from your build. The accepted risk is the single-maintainer COPR, mitigated with a re-review trigger ("no new COPR build for >30 days while kernel releases are in flight") watched by scripts/check-upstreams.sh.
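The re-review trigger boils down to date arithmetic. A minimal sketch of the age comparison, assuming you already have the epoch time of the newest COPR build (how scripts/check-upstreams.sh actually obtains it is not shown here); copr_build_is_stale is a hypothetical helper name:

```bash
# Hypothetical helper (not taken from scripts/check-upstreams.sh): does the
# newest COPR build fall outside the allowed window?
#   $1 = epoch seconds of the newest COPR build (obtained elsewhere)
#   $2 = current epoch seconds (pass "$(date +%s)" in real use)
#   $3 = window in whole days (ADR 0006's trigger uses 30)
# Exit status 0 means the re-review trigger fires.
copr_build_is_stale() {
  local last_build=$1 now=$2 max_days=${3:-30} age_days
  # Integer division: age in completed days.
  age_days=$(( (now - last_build) / 86400 ))
  [ "$age_days" -gt "$max_days" ]
}
```

For example, copr_build_is_stale "$last_build_epoch" "$(date +%s)" && echo "re-review ADR 0006". The sketch covers only the age half of the trigger; the "while kernel releases are in flight" condition would need a second input, such as the date of the latest upstream kernel release.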

The choice is also pinned in the spec, including the fallback story:

kernel:
  shipped:
    provider: cachyos-copr
    repo: bieszczaders/kernel-cachyos
    package: kernel-cachyos
    signed_with: margine-mok
    secure_boot_compliant: true
    installed_at: image-build-time
  fallback:
    provider: fedora
    available_via: rpm-ostree rollback (previous deployment)

/home/daniel/dev/margine-fedora-atomic/declarations/margine-atomic.yaml (lines 197-207)

Rolling back to the previous deployment is the kernel safety net: under the atomic model, a bad kernel never strands the machine (chapter 1).

3.2 The swap, step by step

3.2.1 Neutralize kernel-install hooks

Installing a kernel RPM triggers /usr/lib/kernel/install.d/ hooks. Two of them misbehave in a container: the rpm-ostree hook assumes a live ostree deployment, and the dracut hook would generate a host-only initramfs for the build container (wrong by construction, §3.4). Margine stubs them out for the duration of the swap:

disable_kernel_install_hooks() {
  for _f in \
      /usr/lib/kernel/install.d/05-rpmostree.install \
      /usr/lib/kernel/install.d/50-dracut.install
  do
    [[ -f "$_f" ]] || continue
    mv "$_f" "$_f.bak"
    printf '#!/bin/sh\nexit 0\n' >"$_f"
    chmod +x "$_f"
  done
}

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 63-73)

Each hook becomes exit 0; restore_kernel_install_hooks puts the originals back after the install so the shipped image is unmodified. Practical effect: the kernel RPM lays down files and nothing else — initramfs generation is done explicitly, once, at the end.
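The chapter names restore_kernel_install_hooks but does not show it. A minimal sketch of the pair, assuming the restore simply moves each .bak back over its stub; the hook directory is a parameter here (the original hardcodes /usr/lib/kernel/install.d/), and the _in function names are hypothetical:

```bash
# Sketch: stub out / restore kernel-install hooks in a given directory.
# HOOKS lists the two hooks the chapter neutralizes.
HOOKS="05-rpmostree.install 50-dracut.install"

disable_kernel_install_hooks_in() {
  local dir=$1 name f
  for name in $HOOKS; do
    f="$dir/$name"
    [ -f "$f" ] || continue            # hook not shipped in this base image
    mv "$f" "$f.bak"                   # keep the original for the restore
    printf '#!/bin/sh\nexit 0\n' >"$f" # no-op stub
    chmod +x "$f"
  done
}

restore_kernel_install_hooks_in() {
  local dir=$1 name f
  for name in $HOOKS; do
    f="$dir/$name"
    [ -f "$f.bak" ] || continue        # nothing was stubbed
    mv -f "$f.bak" "$f"                # original (with its mode) replaces the stub
  done
}
```

In the build, disable_kernel_install_hooks_in /usr/lib/kernel/install.d would run before the dnf install and restore_kernel_install_hooks_in /usr/lib/kernel/install.d right after it.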

3.2.2 Remove the stock kernel

dnf -y remove \
    kernel kernel-core kernel-modules kernel-modules-core kernel-modules-extra \
    kernel-devel kernel-devel-matched || true
rm -rf /usr/lib/modules/* || true

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 174-177)

The rm -rf /usr/lib/modules/* matters: bootc and several build scripts assume exactly one kernel under /usr/lib/modules/ (Margine's live-ISO build asserts this twice). Leftover module directories from a half-removed stock kernel would produce two boot entries, make the initramfs loop in §3.4 build a second image, and leave the ostree.linux label ambiguous.
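The "exactly one kernel" assumption is cheap to assert right after the install, before anything downstream relies on it. A minimal sketch; assert_single_kernel is a hypothetical name, and the modules root is a parameter so it can run against a test directory:

```bash
# Sketch: fail unless exactly one kernel directory exists under a modules
# root. Prints the kernel version on success.
assert_single_kernel() {
  local root=${1:-/usr/lib/modules} count=0 kver="" d
  for d in "$root"/*/; do
    [ -d "$d" ] || continue            # glob matched nothing
    count=$((count + 1))
    kver=$(basename "$d")              # basename strips the trailing slash
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 kernel under $root, found $count" >&2
    return 1
  fi
  printf '%s\n' "$kver"
}
```

Typical use: kver=$(assert_single_kernel) || exit 1.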

3.2.3 Install from COPR — with a retry loop

dnf -y copr enable "$COPR_REPO"
...
attempt=1
max_attempts=5
while :; do
  if dnf -y install --refresh $KERNEL_PACKAGES akmods; then
    log "CachyOS kernel install OK on attempt $attempt"
    break
  fi
  if (( attempt >= max_attempts )); then
    log "CachyOS kernel install FAILED after $max_attempts attempts (COPR likely down)"
    exit 1
  fi
  backoff=$(( attempt * 30 ))
  sleep $backoff
  dnf -y clean metadata || true
  attempt=$(( attempt + 1 ))
done

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 180, 198-214, trimmed)

$KERNEL_PACKAGES is kernel-cachyos kernel-cachyos-core kernel-cachyos-modules kernel-cachyos-devel-matched. Two non-obvious choices:

  • -devel-matched, not -devel. The devel-matched virtual guarantees headers for exactly the installed kernel version. Plain -devel can resolve to a newer headers build if the COPR has published one between mirror syncs — and then every out-of-tree module compiles against headers the running kernel doesn't have.
  • The outer retry loop exists because COPR is a free service that browns out. A real build (run #26838562527, 2026-06-02) died with Curl error (28): Timeout was reached after librepo's five internal retries were already exhausted. Linear backoff (30/60/90/120s) plus dnf clean metadata per attempt rides out multi-minute COPR 5xx windows instead of sinking a ~28-minute image build.
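The same loop generalizes to any flaky network step. A minimal sketch as a reusable function; retry_linear and its argument order are hypothetical, and the per-attempt cleanup (dnf clean metadata in the original) is left to the caller:

```bash
# Sketch: run a command up to MAX times, sleeping attempt*STEP seconds
# after each failure (linear backoff: STEP, 2*STEP, 3*STEP, ...).
# Usage: retry_linear MAX STEP command [args...]
retry_linear() {
  local max=$1 step=$2 attempt=1
  shift 2
  while :; do
    if "$@"; then
      echo "succeeded on attempt $attempt" >&2
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "failed after $max attempts" >&2
      return 1
    fi
    sleep $((attempt * step))
    attempt=$((attempt + 1))
  done
}
```

retry_linear 5 30 reproduces the 30/60/90/120 s schedule from the text. To keep the metadata cleanup between attempts, the retried command would be a small wrapper function that runs dnf -y clean metadata before the install.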

Lesson: persistent build caches poison dnf
Symptom: two consecutive builds on the self-hosted runner failed identically with package kernel-cachyos-modules-7.0.8... does not verify: Payload SHA256 ALT digest: BAD. The expected and actual hashes were the same on retry, so this was not a flaky download.
Root cause: the Containerfile mounts --mount=type=cache,dst=/var/cache. On GitHub-hosted runners that cache starts empty for every job; on the self-hosted runner it persists across builds, so a single partial RPM in /var/cache/libdnf5/ is re-used by every subsequent dnf install, indefinitely.
Fix (before the kernel install):

dnf -y clean packages metadata
...
dnf -y install --refresh $KERNEL_PACKAGES akmods

install.sh (lines 188, 201)

Belt and suspenders: clean packages metadata drops the cached RPMs and metadata before the install, and --refresh marks metadata as expired so the install itself re-downloads it.

3.2.4 Capture the version, scrub the repo

KERNEL_VERSION="$(rpm -q "$KERNEL_PKG" --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}')"
...
rm -f /etc/yum.repos.d/*copr*

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 217, 224)

KERNEL_VERSION (e.g. 7.0.8-cachyos1.fc44.x86_64) drives everything downstream: signing paths, akmods --kernels, the dracut loop. The COPR .repo file is deleted from the final image — deployed machines must never pull kernel updates from the COPR directly; kernel updates arrive only as new images through the CI pipeline. This "enable repo, install, scrub repo" pattern repeats for every third-party repo in the script (kernel-cachyos-addons, RPM Fusion).
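The enable, install, scrub sequence can be made failure-proof by scrubbing whether or not the install succeeded. This matters most for best-effort blocks such as the RPM Fusion one in §3.3, where a failed install must not leave the repo behind. A minimal sketch; with_repo_scrub is a hypothetical helper, and the glob is whatever the repo in question drops into /etc/yum.repos.d/:

```bash
# Sketch: run an install step, then delete the repo files it needed even if
# the step failed, so no third-party .repo file can leak into the image.
# Usage: with_repo_scrub 'GLOB' command [args...]
with_repo_scrub() {
  local pattern=$1 rc=0
  shift
  "$@" || rc=$?                        # remember the step's exit status
  # $pattern is deliberately unquoted so the shell expands the glob.
  rm -f $pattern
  return "$rc"
}
```

For example, with_repo_scrub '/etc/yum.repos.d/*copr*' install_cachyos_kernel, where install_cachyos_kernel stands for a hypothetical function holding the copr enable and the retrying install.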

3.3 Out-of-tree modules: the akmods pattern in a container

akmods is Fedora's mechanism for rebuilding out-of-tree kernel modules (akmod-* source packages → kmod-* binary RPMs) whenever a new kernel lands. On a normal system it runs as a boot-time service. In an image build you run it once, by hand, against the kernel you just installed — and you fight two container-specific problems.

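Problem 1: akmodsbuild's /var guard trips in the build. The script contains an if [[ -w /var ]] ; then ... fi block. That condition is true whenever /var is writable, which it always is for the root user running the image build (on bootc builds /var is a cache mount), and when the guard fires akmodsbuild silently skips its work. Margine patches the guard out of the script for the duration of the build: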

disable_akmodsbuild() {
  _ak="/usr/sbin/akmodsbuild"
  [[ -f "$_ak" ]] || return 1
  cp -p "$_ak" "$_ak.backup"
  sed '/if \[\[ -w \/var \]\] ; then/,/fi/d' "$_ak" > "$_ak.tmp"
  mv "$_ak.tmp" "$_ak"
  chmod +x "$_ak"
}

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 85-92)

sed deletes the whole if [[ -w /var ]] block from /usr/sbin/akmodsbuild; the backup is restored afterwards. Ugly, effective, and bounded to the build.
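sed's addr1,addr2 range deletes from the first line matching the if test through the next line matching the end address. The end address /fi/ matches any line that merely contains the letters fi, so a word like "file" or "config" inside the guard's body would end the range early and leave a dangling fi behind. The patch works as long as the guard's body never contains those letters; a tighter end address is cheap insurance. A sketch against a made-up stand-in for the guard (not the real akmodsbuild text); strip_var_guard is a hypothetical name:

```bash
# Sketch: print a script with its `if [[ -w /var ]] ; then ... fi` block
# removed. The end address only matches a line consisting of "fi" (plus
# whitespace), unlike /fi/, which also matches "file" or "config".
strip_var_guard() {
  sed '/if \[\[ -w \/var ]] ; then/,/^[[:space:]]*fi[[:space:]]*$/d' "$1"
}
```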

Problem 2: akmods always exits 0. Success must be detected by the absence of a failure log, not by exit code:

if dnf -y install \
      "https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm" \
   && dnf install -y --setopt=install_weak_deps=False --setopt=tsflags=noscripts \
      akmod-v4l2loopback; then
  if akmods --force --verbose --kernels "$KERNEL_VERSION" --kmod v4l2loopback; then
    # akmods always returns 0; check for *.failed.log explicitly
    V4L2_FAILED=0
    for _f in /var/cache/akmods/v4l2loopback/*-for-"$KERNEL_VERSION".failed.log; do
      [[ -f "$_f" ]] && V4L2_FAILED=1 && break
    done

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 240-249)

Notes on the flags: tsflags=noscripts skips the akmod RPM's %post scriptlet (which would try to kick off a build via the boot-time service path); --kernels "$KERNEL_VERSION" builds against the CachyOS headers from -devel-matched, not whatever uname -r says inside the container (the runner's kernel — always wrong).
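Problem 2's detection relies on a shell subtlety: a glob that matches nothing expands to the pattern itself, so the for loop still runs once with a literal *-for-<kver>.failed.log, and only the -f test distinguishes "no log" from "log exists". The same check factored into a function, as a sketch (akmods_build_failed is a hypothetical name):

```bash
# Sketch: exit 0 if akmods left a failure log for this kmod/kernel pair.
#   $1 = akmods cache dir for the kmod, $2 = kernel version string
akmods_build_failed() {
  local dir=$1 kver=$2 f
  for f in "$dir"/*-for-"$kver".failed.log; do
    # An unmatched glob is left as the literal pattern: test for a real file.
    [ -f "$f" ] && return 0
  done
  return 1
}
```

In the build this would read: akmods_build_failed /var/cache/akmods/v4l2loopback "$KERNEL_VERSION" && V4L2_FAILED=1.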

If the build succeeded, the produced binary kmod-* RPM is installed from the akmods cache:

_kmod_rpm="$(find /var/cache/akmods/v4l2loopback/ -name "kmod-v4l2loopback-*$KERNEL_VERSION*.rpm" -print -quit 2>/dev/null || true)"
if [[ -n "${_kmod_rpm:-}" && -f "$_kmod_rpm" ]]; then
  dnf -y install "$_kmod_rpm"
  TRANSIENT="$TRANSIENT kmod-v4l2loopback"
fi

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 263-267)

The .ko lands under /usr/lib/modules/$KERNEL_VERSION/ and gets MOK-signed later in the same script along with every other module — out-of-tree modules need the same signature as in-tree ones under Secure Boot.

The whole v4l2loopback block is deliberately best-effort: a failed virtual-camera module is logged and skipped, never a failed image (v4l2loopback is the documented exception to the project's "no unjustified || true" rule). RPM Fusion is enabled only for this block and scrubbed immediately after (dnf -y remove rpmfusion-free-release; rm -f /etc/yum.repos.d/rpmfusion-free*.repo).

Cleanup of build-only packages is explicit, never autoremove:

log "Removing transient build-only packages: $TRANSIENT"
dnf -y remove $TRANSIENT || true

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 396-398)

TRANSIENT = akmods sbsigntools kernel-cachyos-devel-matched (+ the akmod/kmod pair when built — the kmod's files survive; only the RPM bookkeeping is dropped to avoid a dangling package whose repo no longer exists). The comment at lines 374-377 records why dnf autoremove is banned here: with the COPR already disabled, autoremove decided the freshly installed kernel-cachyos chain itself was removable (margine-image PR #26).

Same-COPR userland: scx-scheds

The CachyOS kernel ships CONFIG_SCHED_CLASS_EXT=y, so Margine also installs the sched_ext BPF schedulers (scx_lavd, scx_bpfland, scx_rusty, …) from the sibling COPR bieszczaders/kernel-cachyos-addons — same maintainer, kernel and schedulers released as a pair, no version drift. Same retry loop, same repo scrub, and the daemon is opt-in:

log "Disabling scx_loader.service by default (opt-in via margine-scheduler)"
systemctl disable scx_loader.service 2>/dev/null || true

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 316-317)

Bazzite pattern: ship the capability in the image, leave the service off (it cost battery with no obvious default win); users enable it via ujust margine-scheduler or a GUI picker.

3.4 Regenerating the initramfs in-container

This is where every naive kernel swap dies. Dracut's defaults are calibrated for "regenerate on the machine that will boot this", and the build container is not that machine. Margine hit five failures here in production (lessons-learned, 2026-05-28 VM smoke test), each of which independently produced an unbootable image: a flag conflict hidden by || true, two dracut defaults that are wrong in a container (host-only mode and the /boot output path), and two ostree-specific requirements (the dracut ostree module and the ostree.linux label).

Lesson: --kver and --regenerate-all are mutually exclusive, and || true ate the proof
Symptom: kernel panic at first boot, VFS: Cannot open root device "UUID=..."; the available partitions were listed as raw vda{1,2,3}, with no LUKS mapper and no btrfs.
Root cause: the build called dracut --force --kver "$KVER" --regenerate-all || true. dracut printed --regenerate-all cannot be called with a kernel version and exited 1, and || true swallowed the failure. Our code never generated an initramfs; boot used a stale base-layer fallback built for Bluefin's kernel.
Fix: drop the conflicting flag combination, and drop the || true. If dracut fails, the image is unbootable, so fail loudly.
Meta-rule adopted project-wide: every || true in an image build needs a written justification.
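The meta-rule is easy to enforce mechanically if the justification lives on the same line. A minimal sketch, assuming a hypothetical convention (not necessarily the project's) that every || true carries a trailing # comment; check_unjustified_or_true lists offending lines and fails if there are any:

```bash
# Sketch: list lines that end in a bare "|| true" (no trailing # comment).
# Exit status: 0 if the file is clean, 1 if any unjustified "|| true" exists.
check_unjustified_or_true() {
  # In ERE, \| is a literal pipe character.
  if grep -nE '\|\|[[:space:]]*true[[:space:]]*$' "$1"; then
    return 1
  fi
  return 0
}
```

Run against build_files/custom-kernel/install.sh in CI, it would flag the original dracut ... || true line before it ever shipped.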

Lesson: a host-only initramfs for the wrong "host"
Symptom: the same panic, after the first fix.
Root cause: dracut defaults to host-only mode, and "the host" is the CI build container: no LUKS device, no btrfs root, no virtio_blk. Behaving exactly as designed, dracut omitted the very modules every real install needs.
Fix: force generic mode on the command line and persist the policy, so any later regeneration (a user-triggered rpm-ostree initramfs, the Plymouth stage's regen) inherits it:

mkdir -p /etc/dracut.conf.d
cat > /etc/dracut.conf.d/01-margine-no-hostonly.conf <<'CONF'
# Required for bootc / OCI image builds: the build environment is not
# the deployment environment, so initramfs must be generic.
hostonly="no"
hostonly_cmdline="no"
CONF

install.sh lines 409-415.

Lesson: dracut writes to /boot/; ostree reads /usr/lib/modules/<kver>/
Symptom: the same panic. The published image did contain a correct 303 MB generic initramfs, but at /boot/initramfs-7.0.8-cachyos1.fc44.x86_64.img.
Root cause: at deploy time, bootc/ostree takes the initramfs from /usr/lib/modules/<kver>/initramfs.img and ignores /boot/ (dracut's traditional default output location). With nothing at the canonical path, ostree falls back to deploy-time auto-generation, which is host-only again.
Fix: pass the output path as dracut's positional argument (see the final loop below). Verified against the Bluefin DX base image, which keeps its initramfs at exactly that path.
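Because this failure stays silent until first boot, the placement is worth asserting at build time. A minimal sketch; check_initramfs_placement is a hypothetical name, and the filesystem root is a parameter (empty means /):

```bash
# Sketch: for each kernel under ROOT/usr/lib/modules, require the initramfs
# at the path ostree reads, and point out one stranded at dracut's /boot default.
check_initramfs_placement() {
  local root=${1:-} rc=0 d kver
  for d in "$root"/usr/lib/modules/*/; do
    [ -d "$d" ] || continue
    kver=$(basename "$d")
    if [ ! -f "${d}initramfs.img" ]; then
      echo "MISSING: ${d}initramfs.img" >&2
      rc=1
      if [ -f "$root/boot/initramfs-$kver.img" ]; then
        echo "  found at dracut default instead: $root/boot/initramfs-$kver.img" >&2
      fi
    fi
  done
  return "$rc"
}
```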

Lesson: the ostree dracut module is never auto-included
Symptom: with all three fixes in, boot got past the initramfs and dropped to a dracut emergency shell: Failed to switch root: os-release file is missing. /sysroot contained only home/ root/ var/, the raw btrfs subvolumes rather than a deployment.
Root cause: dracut does not include the ostree module just because the build host is ostree-based. Without it, the initramfs lacks ostree-prepare-root, the tool that pivots /sysroot from the raw disk root to the deployment checkout before systemd's switch-root. Diagnosed with lsinitrd <initramfs> | grep ostree → zero lines on the published image.
Fix: --add "ostree" on every dracut invocation; --no-hostonly alone is not sufficient.

The final, correct invocation — all four lessons folded in:

for kver_dir in /usr/lib/modules/*/; do
  kver=$(basename "$kver_dir")
  dracut --force --no-hostonly --no-hostonly-cmdline \
      --add "ostree" \
      --kver "$kver" \
      "${kver_dir}initramfs.img"
  log "Wrote ${kver_dir}initramfs.img ($(du -h "${kver_dir}initramfs.img" | cut -f1))"
done

/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 460-467)

One initramfs per kernel directory (there is exactly one, §3.2.2), written to the bootc-canonical path, generic, with ostree support. Two peripheral details from the surrounding script. First, dracut runs after module signing, so the modules copied into the initramfs are the signed ones. Second, mkdir -p /root && chmod 700 /root runs beforehand to silence a spurious dracut-install: ERROR: installing '/root' from the ssh-client module, which probes for /root/.ssh/ in a sysroot where /root doesn't exist. The error is cosmetic; the alternative, omit_dracutmodules+=" ssh-client ", would also drop dropbear-based remote LUKS unlock support.

Lesson: the inherited ostree.linux OCI label points at the old kernel
Symptom: with the initramfs fully fixed, boot failed at initrd-switch-root.service; the bootloader entries referenced deployment hashes that did not exist on disk.
Root cause: Bluefin DX labels its image ostree.linux=<bluefin-kernel-version>. The kernel swap replaced the files but inherited the label, and bootc/rpm-ostree consult ostree.linux at deploy time to pick the kernel version for the bootloader entry and to find /usr/lib/modules/<label>/. With the label pointing at a nonexistent kernel, the deployment-directory hash and the bootloader-entry hash diverge.
Fix: rewrite the label after the build from the image's actual content (buildah config --label ostree.linux=<kver>, reading <kver> from /usr/lib/modules/ inside the built image). In the current pipeline this is subsumed by the rechunk step (hhd-dev/rechunk@v1.2.4 in /var/home/daniel/dev/margine-image/.github/workflows/build.yml lines 448-464), which re-commits the image in ostree-canonical form; a CI invariant check still asserts label == installed kernel on every build.
General rule: FROM inherits all of the base's OCI labels. Any label describing content you changed (kernel version, rechunk manifest) must be overwritten.
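A minimal sketch of the label invariant, assuming the label value has already been read from the built image (for example from the Labels field of skopeo inspect's JSON output) and the image's /usr/lib/modules is reachable as a directory; label_matches_kernel is a hypothetical name and both plumbing steps are left out:

```bash
# Sketch: the ostree.linux label must name the one kernel actually installed.
#   $1 = value of the ostree.linux label
#   $2 = modules root (default /usr/lib/modules)
label_matches_kernel() {
  local label=$1 root=${2:-/usr/lib/modules} found="" d
  for d in "$root"/*/; do
    [ -d "$d" ] || continue            # glob matched nothing
    if [ -n "$found" ]; then
      echo "more than one kernel under $root" >&2
      return 1
    fi
    found=$(basename "$d")
  done
  if [ "$label" != "$found" ]; then
    echo "ostree.linux=$label but installed kernel is '${found:-none}'" >&2
    return 1
  fi
  return 0
}
```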

3.5 Validating the swap

Three layers, because "image builds green" ≠ "image boots":

  • Build-time (CI, blocks push): /usr/lib/modules/*/initramfs.img exists; initramfs size sanity (>50 MB — a host-only one is <30); lsinitrd | grep ostree-prepare-root non-empty; dm-crypt/dm-mod/btrfs/virtio_blk present as modules or builtins; ostree.linux label matches the installed kernel. Every one of these maps 1:1 to a bug above.
  • Boot-time (CI, gates :stable): the QEMU smoke-boot workflow boots the candidate qcow2 under OVMF+swtpm and requires Reached target Multi-User System before skopeo copy --preserve-digests promotes candidate → :stable (CI chapter).
  • On-machine (user-run): margine-validate-cachyos-kernel confirms the running kernel is actually the shipped one:
section "Kernel"
kernel=$(uname -a)
if printf '%s\n' "$kernel" | grep -Eiq 'cachy|cachyos'; then
  ok "running kernel appears to be CachyOS"
else
  fail "running kernel does not appear to be CachyOS"
fi

/home/daniel/dev/margine-fedora-atomic/scripts/validate-cachyos-kernel (lines 30-38)

It also warns if stock Fedora kernel-* RPMs are visible in the deployment, and flags common out-of-tree module packages (nvidia, zfs, vbox) as out-of-policy.
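Two of the build-time gates from the list above, sketched as functions. The 50 MB threshold comes from the text (interpreted here as MiB); reading the listing from stdin so that lsinitrd output can be piped in is an assumption of this sketch, and both function names are hypothetical:

```bash
# Gate 1: a generic initramfs is large; a host-only one is small.
#   $1 = initramfs path, $2 = minimum size in MiB (default 50)
initramfs_size_ok() {
  local img=$1 min_mb=${2:-50} bytes
  [ -f "$img" ] || return 1
  # $(( )) strips any padding some wc implementations add.
  bytes=$(( $(wc -c <"$img") ))
  [ "$bytes" -gt $(( min_mb * 1024 * 1024 )) ]
}

# Gate 2: the initramfs listing (e.g. lsinitrd "$img" output, piped in)
# must mention ostree-prepare-root, i.e. dracut's ostree module was added.
listing_has_ostree() {
  grep -q 'ostree-prepare-root'
}
```

In CI these would run as initramfs_size_ok "$img" && lsinitrd "$img" | listing_has_ostree for each /usr/lib/modules/*/initramfs.img.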

Alternatives & other distros

Approaches to "which kernel ships in the image", roughly by increasing maintenance cost:

  • Stock Fedora kernel, untouched — Bluefin, Aurora, Silverblue/Kinoite, Fedora CoreOS. Signed by Fedora, boots under Secure Boot with zero ceremony, zero pipeline cost. The default; deviate only with a measured reason (ADR 0006 option C).
  • Stock kernel + prebuilt akmods from ghcr.io/ublue-os/akmods — Bluefin DX, Aurora, uBlue NVIDIA variants. Universal Blue builds/signs kmods (nvidia, xone, v4l2loopback, …) in dedicated OCI images; consumers COPY --from=ghcr.io/ublue-os/akmods:main-<fedora> the RPMs in. No compiler in your build, modules signed with the uBlue key (whose MOK users enroll once). The cleanest pattern if the kernels/kmods you need are already published.
  • OGC kernel (ghcr.io/ublue-os/akmods:ogc-… flavor) — Bazzite (migrated off its own kernel-bazzite fork, archived 2026-05-01), Nobara, ChimeraOS, Playtron, PikaOS. Shared 8-distro CI, upstream-first charter, handheld HID + NTSYNC + gyro in tree. The gaming-consensus kernel; no BORE/ThinLTO/HZ=1000 by default.
  • Surface kernels — Bluefin's -surface images swap in the linux-surface kernel the same remove-and-replace way, for Microsoft Surface hardware support. This demonstrates that the pattern generalizes to any hardware-enablement tree.
  • COPR kernel installed in your own build — Margine (this chapter), Origami Linux (whose custom-kernel.sh Margine's script descends from). Maximum flexibility; you own signing, retries, initramfs, and the single-maintainer-COPR risk.
  • Runtime layering on the deployed machine: rpm-ostree override remove kernel{,-core,-modules,...} --install kernel-cachyos. This was Margine's pre-image lab path (/home/daniel/dev/margine-fedora-atomic/docs/03-cachyos-kernel.md). It works per machine and is rollback-safe, but the kernel is unsigned (Secure Boot off only), machines drift apart, and every deployment rebuilds the override. Kept as documentation, superseded by image-baking.
  • NixOS: boot.kernelPackages = pkgs.linuxPackages_cachyos; set declaratively, with module packages rebuilt by Nix against the chosen kernel. Same outcome, entirely different toolchain.
  • openSUSE MicroOS/Aeon — stock SUSE kernel via transactional-update/snapper snapshots; custom kernels are plain zypper packages in a transaction. Rollback via btrfs snapshot instead of image swap.
  • Vanilla OS (ABRoot) — Debian-based A/B partitions; kernel changes go through ABRoot transactions on the inactive root.
  • UKI / sealed images (systemd-boot + unified kernel image + composefs fs-verity) — tracked by Margine in ADR 0007, not yet shipping anywhere mainstream on the Fedora desktop track. Would replace the vmlinuz+initramfs pair (and most of §3.4) with a single signed PE binary; the long-term direction for measured boot.