Replacing the kernel in an atomic image
The kernel is just files in the image: /usr/lib/modules/<kver>/vmlinuz, the module tree next to it, and an initramfs.img in the same directory. In a bootc image build you can remove the stock kernel and install another one with plain dnf inside the Containerfile — no bootloader scripting, no per-machine kernel-install dance. What makes it hard is everything around the files: the kernel-install hooks that assume a running system, out-of-tree modules that must be built against the new headers, an initramfs that must be regenerated for hardware the build container cannot see, and a handful of ostree-specific invariants (output path, dracut ostree module, the ostree.linux OCI label) that fail only at first boot.
Margine replaces Bluefin DX's stock kernel with kernel-cachyos from the bieszczaders/kernel-cachyos COPR. The whole swap lives in one script, invoked as the first RUN stage so every later stage (Plymouth, extensions) already sees the final kernel:
RUN --mount=type=bind,from=ctx,source=/,target=/ctx \
    --mount=type=cache,dst=/var/cache \
    --mount=type=cache,dst=/var/log \
    --mount=type=tmpfs,dst=/tmp \
    --mount=type=secret,id=mok-key,target=/tmp/certs/MOK.key \
    --mount=type=secret,id=mok-cert,target=/tmp/certs/MOK.pem \
    /ctx/custom-kernel/install.sh
/var/home/daniel/dev/margine-image/Containerfile (lines 39-46)
The BuildKit type=secret mounts stage the MOK signing material as ephemeral files — they exist during this RUN only and never land in a layer. Signing itself (sbsign on vmlinuz, sign-file on every .ko, the first-boot mok-enroll.service) is the Secure Boot chapter's subject; this chapter covers the swap, the module builds, and the initramfs.
3.1 Why a custom kernel at all
The decision is written down in ADR 0006 (/home/daniel/dev/margine-fedora-atomic/docs/adr/0006-kernel-cachyos-decision.md). Three options were on the table for a Fedora-Atomic-derived desktop in 2026:
| | A: kernel-cachyos (chosen) | B: OGC kernel (Bazzite et al.) | C: Bluefin's stock kernel |
|---|---|---|---|
| BORE scheduler | builtin (CONFIG_SCHED_BORE=y) | opt-in upstream, not in default config | no |
| ThinLTO build | yes (~3-5% win) | no | no |
| CONFIG_HZ | 1000 | 300 | 300 |
| Handheld HID / NTSYNC / gyro | not the focus | yes, in tree | no |
| Maintainer surface | single Fedora packager | 8-distro shared CI | Bluefin/uBlue team |
| Build pipeline cost | ~420 LOC + MOK secrets in CI | adopt akmods OCI pull | zero (inherited from base) |
Margine is creator-first (real-time audio: Reaper, EasyEffects on PipeWire), so BORE + HZ=1000 + ThinLTO win over OGC's handheld patch set; for a gaming/handheld distro the matrix flips. Option C is the correct answer if you don't have a measured reason to deviate — it deletes this entire chapter from your build. The accepted risk is the single-maintainer COPR, mitigated with a re-review trigger ("no new COPR build for >30 days while kernel releases are in flight") watched by scripts/check-upstreams.sh.
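The age half of that re-review trigger reduces to a date comparison. The following is a minimal sketch, not the contents of scripts/check-upstreams.sh: the function name copr_is_stale and its arguments are hypothetical, and it assumes the timestamp of the newest COPR build has already been obtained as Unix epoch seconds. It does not model the "while kernel releases are in flight" condition.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the newest COPR build is older
# than the allowed window. Timestamps are Unix epoch seconds.
# The body runs in a subshell so its variables do not leak into the caller.
copr_is_stale() (
    last_build_epoch=$1
    now_epoch=$2
    max_age_days=${3:-30}          # 30 days is the threshold quoted above
    age_days=$(( (now_epoch - last_build_epoch) / 86400 ))
    [ "$age_days" -gt "$max_age_days" ]
)
```

A caller might use it as copr_is_stale "$last_build" "$(date +%s)" && echo "re-review ADR 0006", where $last_build is whatever the upstream check managed to fetch.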
The choice is also pinned in the spec, including the fallback story:
kernel:
  shipped:
    provider: cachyos-copr
    repo: bieszczaders/kernel-cachyos
    package: kernel-cachyos
    signed_with: margine-mok
    secure_boot_compliant: true
    installed_at: image-build-time
  fallback:
    provider: fedora
    available_via: rpm-ostree rollback (previous deployment)
/home/daniel/dev/margine-fedora-atomic/declarations/margine-atomic.yaml (lines 197-207)
Rollback to the previous deployment is the kernel safety net: under the atomic model, a bad kernel never strands the machine (chapter 1).
3.2 The swap, step by step
3.2.1 Neutralize kernel-install hooks
Installing a kernel RPM triggers /usr/lib/kernel/install.d/ hooks. Two of them misbehave in a container: the rpm-ostree hook assumes a live ostree deployment, and the dracut hook would generate a host-only initramfs for the build container (wrong by construction, §3.4). Margine stubs them out for the duration of the swap:
disable_kernel_install_hooks() {
    for _f in \
        /usr/lib/kernel/install.d/05-rpmostree.install \
        /usr/lib/kernel/install.d/50-dracut.install
    do
        [[ -f "$_f" ]] || continue
        mv "$_f" "$_f.bak"
        printf '#!/bin/sh\nexit 0\n' >"$_f"
        chmod +x "$_f"
    done
}
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 63-73)
Each hook becomes exit 0; restore_kernel_install_hooks puts the originals back after the install so the shipped image is unmodified. Practical effect: the kernel RPM lays down files and nothing else — initramfs generation is done explicitly, once, at the end.
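The restore half is not shown in the excerpt. The pair can be sketched as follows, parameterised on the hook directory so it can be tried on a scratch directory first. The names stub_hooks and restore_hooks are illustrative; this is an assumption about how the round trip could look, not the actual restore_kernel_install_hooks from install.sh.

```shell
#!/bin/sh
# Sketch only. Both bodies run in a subshell so loop variables stay local.

# Replace each named hook in hook_dir with a no-op, keeping a .bak copy.
stub_hooks() (
    hook_dir=$1
    shift
    for name in "$@"; do
        f="$hook_dir/$name"
        [ -f "$f" ] || continue        # hook not present in this base image
        mv "$f" "$f.bak"
        printf '#!/bin/sh\nexit 0\n' > "$f"
        chmod +x "$f"
    done
)

# Move every .bak file back over its stub.
restore_hooks() (
    hook_dir=$1
    for bak in "$hook_dir"/*.bak; do
        [ -f "$bak" ] || continue      # glob matched nothing
        mv -f "$bak" "${bak%.bak}"     # original replaces the stub
    done
)
```

Because mv preserves the original file's mode, the restored hook is byte-for-byte and permission-for-permission what the base image shipped.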
3.2.2 Remove the stock kernel
dnf -y remove \
    kernel kernel-core kernel-modules kernel-modules-core kernel-modules-extra \
    kernel-devel kernel-devel-matched || true
rm -rf /usr/lib/modules/* || true
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 174-177)
The rm -rf /usr/lib/modules/* matters: bootc and several build scripts assume exactly one kernel under /usr/lib/modules/ (Margine's live-ISO build asserts this twice). Leftover module dirs from a half-removed stock kernel would produce two boot entries, a doubled initramfs loop, and an ambiguous ostree.linux label.
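The "exactly one kernel" assumption is cheap to assert explicitly. A minimal sketch follows; single_kernel_version is a hypothetical helper name, not something taken from Margine's scripts, and the modules directory is a parameter so the check can be exercised outside an image build.

```shell
#!/bin/sh
# Hypothetical helper: prints the single kernel version found under the
# given modules directory, or fails if there are zero or several.
# The body runs in a subshell, so "exit" only leaves the function.
single_kernel_version() (
    modules_dir=${1:-/usr/lib/modules}
    count=0
    kver=
    for d in "$modules_dir"/*/; do
        [ -d "$d" ] || continue        # glob matched nothing
        count=$((count + 1))
        kver=$(basename "$d")
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly 1 kernel in $modules_dir, found $count" >&2
        exit 1
    fi
    printf '%s\n' "$kver"
)
```

Run after the install step, it doubles as a cross-check on the version reported by rpm.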
3.2.3 Install from COPR — with a retry loop
dnf -y copr enable "$COPR_REPO"
...
attempt=1
max_attempts=5
while :; do
    if dnf -y install --refresh $KERNEL_PACKAGES akmods; then
        log "CachyOS kernel install OK on attempt $attempt"
        break
    fi
    if (( attempt >= max_attempts )); then
        log "CachyOS kernel install FAILED after $max_attempts attempts (COPR likely down)"
        exit 1
    fi
    backoff=$(( attempt * 30 ))
    sleep $backoff
    dnf -y clean metadata || true
    attempt=$(( attempt + 1 ))
done
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 180, 198-214, trimmed)
$KERNEL_PACKAGES is kernel-cachyos kernel-cachyos-core kernel-cachyos-modules kernel-cachyos-devel-matched. Two non-obvious choices:
- -devel-matched, not -devel. The devel-matched virtual guarantees headers for exactly the installed kernel version. Plain -devel can resolve to a newer headers build if the COPR has published one between mirror syncs, and then every out-of-tree module compiles against headers the running kernel doesn't have.
- The outer retry loop exists because COPR is a free service that browns out. A real build (run #26838562527, 2026-06-02) died with Curl error (28): Timeout was reached after librepo's five internal retries were already exhausted. Linear backoff (30/60/90/120s) plus dnf clean metadata per attempt rides out multi-minute COPR 5xx windows instead of sinking a ~28-minute image build.
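Since the same loop is reused for other repos, it can be factored into a generic helper. The following is a sketch under assumptions: retry_linear is an illustrative name and does not exist in install.sh, and the per-attempt dnf clean metadata step is left out (it could be folded into the command being retried).

```shell
#!/bin/sh
# Illustrative helper: run a command up to max_attempts times, sleeping
# attempt * step_seconds between failures (30/60/90/120 for step 30).
# The body runs in a subshell, so the retried command cannot change the
# caller's shell variables; that is fine for external commands like dnf.
retry_linear() (
    max_attempts=$1
    step_seconds=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            exit 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $max_attempts attempts: $*" >&2
            exit 1
        fi
        sleep $(( attempt * step_seconds ))
        attempt=$(( attempt + 1 ))
    done
)
```

With this shape the kernel install would read retry_linear 5 30 dnf -y install --refresh $KERNEL_PACKAGES akmods, and a failure after the fifth attempt propagates as a non-zero status for the caller to handle.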
Lesson: persistent build caches poison dnf
Symptom: two consecutive builds on the self-hosted runner failed identically: package kernel-cachyos-modules-7.0.8... does not verify: Payload SHA256 ALT digest: BAD. The expected/actual hash pair was identical on retry, so this was not a flaky download.
Root cause: the Containerfile mounts --mount=type=cache,dst=/var/cache. On GitHub-hosted runners that cache is born fresh per job; on the self-hosted runner it persists across builds, so one partial RPM in /var/cache/libdnf5/ gets re-used by every subsequent dnf install, forever.
Fix (before the kernel install):
dnf -y clean packages metadata
...
dnf -y install --refresh $KERNEL_PACKAGES akmods
install.sh lines 188, 201. Belt and suspenders: clean packages drops cached RPMs, --refresh drops cached metadata.
3.2.4 Capture the version, scrub the repo
KERNEL_VERSION="$(rpm -q "$KERNEL_PKG" --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}')"
...
rm -f /etc/yum.repos.d/*copr*
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 217, 224)
KERNEL_VERSION (e.g. 7.0.8-cachyos1.fc44.x86_64) drives everything downstream: signing paths, akmods --kernels, the dracut loop. The COPR .repo file is deleted from the final image — deployed machines must never pull kernel updates from the COPR directly; kernel updates arrive only as new images through the CI pipeline. This "enable repo, install, scrub repo" pattern repeats for every third-party repo in the script (kernel-cachyos-addons, RPM Fusion).
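One way to make the "enable repo, install, scrub repo" pattern hard to get wrong is a wrapper that removes the repo file even when the install fails. This is a sketch, not how install.sh is structured: with_scrubbed_repo is a hypothetical name, and it assumes the repo has already been enabled and that its .repo file path is known.

```shell
#!/bin/sh
# Illustrative wrapper: run a command while a repo file is present, then
# delete the repo file whether or not the command succeeded, and hand the
# command's exit status back to the caller.
with_scrubbed_repo() (
    repo_file=$1
    shift
    status=0
    "$@" || status=$?      # capture the status without tripping "set -e"
    rm -f "$repo_file"
    exit "$status"
)
```

Usage would look like with_scrubbed_repo /etc/yum.repos.d/example.repo dnf -y install some-package, where both the file name and the package are placeholders.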
3.3 Out-of-tree modules: the akmods pattern in a container
akmods is Fedora's mechanism for rebuilding out-of-tree kernel modules (akmod-* source packages → kmod-* binary RPMs) whenever a new kernel lands. On a normal system it runs as a boot-time service. In an image build you run it once, by hand, against the kernel you just installed — and you fight two container-specific problems.
Problem 1: akmodsbuild wants a writable /var. On bootc builds /var is a cache mount; akmodsbuild has a guard that silently skips work when /var isn't writable the way it expects. Margine patches the guard out of the script for the duration of the build:
disable_akmodsbuild() {
    _ak="/usr/sbin/akmodsbuild"
    [[ -f "$_ak" ]] || return 1
    cp -p "$_ak" "$_ak.backup"
    sed '/if \[\[ -w \/var \]\] ; then/,/fi/d' "$_ak" > "$_ak.tmp"
    mv "$_ak.tmp" "$_ak"
    chmod +x "$_ak"
}
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 85-92)
sed deletes the whole if [[ -w /var ]] block from /usr/sbin/akmodsbuild; the backup is restored afterwards. Ugly, effective, and bounded to the build.
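To see what the range expression removes, it helps to run it on a small sample before trusting it against the real script. The sketch below wraps the same sed expression in a function; strip_var_guard is an illustrative name, and any sample input is made up rather than copied from akmodsbuild.

```shell
#!/bin/sh
# Prints the given file with everything from the line matching
# "if [[ -w /var ]] ; then" through the next line containing "fi" removed.
# Same sed range expression as in disable_akmodsbuild; the input file is
# left untouched.
strip_var_guard() {
    sed '/if \[\[ -w \/var \]\] ; then/,/fi/d' "$1"
}
```

One caveat of this form: the range ends at the first line that contains the characters "fi" anywhere, so a guard body mentioning, say, a "file" would end the deletion early and leave a dangling fi behind. Anchoring the end pattern (for example on the indentation of the closing fi) would be stricter, at the cost of depending on the upstream script's exact layout.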
Problem 2: akmods always exits 0. Success must be detected by the absence of a failure log, not by exit code:
if dnf -y install \
        "https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm" \
    && dnf install -y --setopt=install_weak_deps=False --setopt=tsflags=noscripts \
        akmod-v4l2loopback; then
    if akmods --force --verbose --kernels "$KERNEL_VERSION" --kmod v4l2loopback; then
        # akmods always returns 0; check for *.failed.log explicitly
        V4L2_FAILED=0
        for _f in /var/cache/akmods/v4l2loopback/*-for-"$KERNEL_VERSION".failed.log; do
            [[ -f "$_f" ]] && V4L2_FAILED=1 && break
        done
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 240-249)
Notes on the flags: tsflags=noscripts skips the akmod RPM's %post scriptlet (which would try to kick off a build via the boot-time service path); --kernels "$KERNEL_VERSION" builds against the CachyOS headers from -devel-matched, not whatever uname -r says inside the container (the runner's kernel — always wrong).
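The failure-log check generalizes to any kmod. A minimal sketch, assuming the log naming follows the glob used above (*-for-<kver>.failed.log under a per-kmod directory); akmods_build_failed is an illustrative name, and the cache root is a parameter so the check can be tried outside a real build.

```shell
#!/bin/sh
# Succeeds (exit 0) when a failure log exists for this kmod and kernel,
# i.e. when the build FAILED. The body runs in a subshell, so "exit"
# only leaves the function.
akmods_build_failed() (
    kmod=$1
    kver=$2
    cache_root=${3:-/var/cache/akmods}
    for f in "$cache_root/$kmod"/*-for-"$kver".failed.log; do
        [ -f "$f" ] && exit 0          # unmatched glob stays literal: not a file
    done
    exit 1
)
```

Typical use inverts the result: if akmods_build_failed v4l2loopback "$KERNEL_VERSION"; then log "skipping v4l2loopback"; fi.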
If the build succeeded, the produced binary kmod-* RPM is installed from the akmods cache:
_kmod_rpm="$(find /var/cache/akmods/v4l2loopback/ -name "kmod-v4l2loopback-*$KERNEL_VERSION*.rpm" -print -quit 2>/dev/null || true)"
if [[ -n "${_kmod_rpm:-}" && -f "$_kmod_rpm" ]]; then
    dnf -y install "$_kmod_rpm"
    TRANSIENT="$TRANSIENT kmod-v4l2loopback"
fi
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 263-267)
The .ko lands under /usr/lib/modules/$KERNEL_VERSION/ and gets MOK-signed later in the same script along with every other module — out-of-tree modules need the same signature as in-tree ones under Secure Boot.
The whole v4l2loopback block is deliberately best-effort: a failed virtual-camera module is logged and skipped, never a failed image (v4l2loopback is the documented exception to the project's "no unjustified || true" rule). RPM Fusion is enabled only for this block and scrubbed immediately after (dnf -y remove rpmfusion-free-release; rm -f /etc/yum.repos.d/rpmfusion-free*.repo).
Cleanup of build-only packages is explicit, never autoremove:
log "Removing transient build-only packages: $TRANSIENT"
dnf -y remove $TRANSIENT || true
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 396-398)
TRANSIENT = akmods sbsigntools kernel-cachyos-devel-matched (+ the akmod/kmod pair when built — the kmod's files survive; only the RPM bookkeeping is dropped to avoid a dangling package whose repo no longer exists). The comment at lines 374-377 records why dnf autoremove is banned here: with the COPR already disabled, autoremove decided the freshly installed kernel-cachyos chain itself was removable (margine-image PR #26).
Same-COPR userland: scx-scheds
The CachyOS kernel ships CONFIG_SCHED_CLASS_EXT=y, so Margine also installs the sched_ext BPF schedulers (scx_lavd, scx_bpfland, scx_rusty, …) from the sibling COPR bieszczaders/kernel-cachyos-addons — same maintainer, kernel and schedulers released as a pair, no version drift. Same retry loop, same repo scrub, and the daemon is opt-in:
log "Disabling scx_loader.service by default (opt-in via margine-scheduler)"
systemctl disable scx_loader.service 2>/dev/null || true
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 316-317)
Bazzite pattern: ship the capability in the image, leave the service off (it cost battery with no obvious default win); users enable it via ujust margine-scheduler or a GUI picker.
3.4 Regenerating the initramfs in-container
This is where every naive kernel swap dies. Dracut's defaults are calibrated for "regenerate on the machine that will boot this" — and the build container is not that machine. Three defaults are wrong, and each one independently produces an unbootable image. Margine hit all three in production (lessons-learned, 2026-05-28 VM smoke test), plus two ostree-specific failures on top.
Lesson: --kver and --regenerate-all are mutually exclusive, and || true ate the proof
Symptom: kernel panic at first boot: VFS: Cannot open root device "UUID=..."; available partitions listed as raw vda{1,2,3}, with no LUKS mapper and no btrfs.
Root cause: the build called dracut --force --kver "$KVER" --regenerate-all || true. dracut printed --regenerate-all cannot be called with a kernel version and exited 1, and || true swallowed it. No initramfs was ever generated by our code; boot used a stale base-layer fallback built for Bluefin's kernel.
Fix: drop the conflicting flag combination, and drop the || true. If dracut fails the image is unbootable, so fail loud. Meta-rule adopted project-wide: every || true in an image build needs a written justification.
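The meta-rule lends itself to a mechanical check. The sketch below is one possible shape, not something the project is known to ship: find_unjustified_or_true is a hypothetical name, and the "# justified:" marker is a made-up convention for attaching the written justification on the same line.

```shell
#!/bin/sh
# Prints "LINE:TEXT" for every line of the given script that contains
# "|| true" without a "# justified:" comment on the same line.
# Exit status is grep's: 0 if any such line was found, 1 if none.
find_unjustified_or_true() {
    grep -nF -- '|| true' "$1" | grep -vF -- '# justified:'
}
```

Wired into CI as if find_unjustified_or_true install.sh; then exit 1; fi, it turns the rule into a failing build instead of a review comment.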
Lesson: host-only initramfs of the wrong "host"
Symptom: same panic, after the first fix.
Root cause: dracut defaults to host-only mode, and "the host" is the CI build container: no LUKS device, no btrfs root, no virtio_blk. dracut correctly omitted exactly the modules every real install needs.
Fix: force generic mode on the command line and persist the policy so any later regeneration (user-triggered rpm-ostree initramfs, the Plymouth stage's regen) inherits it:
mkdir -p /etc/dracut.conf.d
cat > /etc/dracut.conf.d/01-margine-no-hostonly.conf <<'CONF'
# Required for bootc / OCI image builds: the build environment is not
# the deployment environment, so initramfs must be generic.
hostonly="no"
hostonly_cmdline="no"
CONF
install.sh lines 409-415.
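Whether the persisted policy actually made it into the image can be checked with a simple file inspection. This is a sketch with a hypothetical helper name (dracut_dropin_is_generic); it only looks for the setting in drop-in files and does not ask dracut for its effective configuration, so a later drop-in that overrides the value would not be noticed.

```shell
#!/bin/sh
# Succeeds when some *.conf drop-in under the given directory contains a
# line setting hostonly to "no" (quotes optional). The body runs in a
# subshell, so "exit" only leaves the function.
dracut_dropin_is_generic() (
    conf_dir=${1:-/etc/dracut.conf.d}
    for f in "$conf_dir"/*.conf; do
        [ -f "$f" ] || continue        # glob matched nothing
        grep -Eq '^[[:space:]]*hostonly="?no"?[[:space:]]*$' "$f" && exit 0
    done
    exit 1
)
```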
Lesson: dracut writes to /boot/; ostree reads /usr/lib/modules/<kver>/
Symptom: same panic. The published image did contain a correct 303 MB generic initramfs, but at /boot/initramfs-7.0.8-cachyos1.fc44.x86_64.img.
Root cause: bootc/ostree picks the initramfs from /usr/lib/modules/<kver>/initramfs.img at deploy time and ignores /boot/ (dracut's traditional default output). With nothing at the canonical path, ostree falls back to deploy-time auto-generation, which is host-only again.
Fix: pass the output path as dracut's positional argument (see the final loop below). Verified against the Bluefin DX base image, which keeps its initramfs at exactly that path.
Lesson: the ostree dracut module is never auto-included
Symptom: with all three fixes in, boot got past the initramfs and dropped to a dracut emergency shell: Failed to switch root: os-release file is missing. /sysroot contained only home/ root/ var/ (raw btrfs subvolumes, not a deployment).
Root cause: dracut does not include the ostree module just because the build host is ostree-based. Without it the initramfs lacks ostree-prepare-root, the tool that pivots /sysroot from the raw disk root to the deployment checkout before systemd's switch-root. Diagnosed with lsinitrd <initramfs> | grep ostree → zero lines on the published image.
Fix: --add "ostree" on every dracut invocation. --no-hostonly alone is not sufficient.
The final, correct invocation — all four lessons folded in:
for kver_dir in /usr/lib/modules/*/; do
    kver=$(basename "$kver_dir")
    dracut --force --no-hostonly --no-hostonly-cmdline \
        --add "ostree" \
        --kver "$kver" \
        "${kver_dir}initramfs.img"
    log "Wrote ${kver_dir}initramfs.img ($(du -h ${kver_dir}initramfs.img | cut -f1))"
done
/var/home/daniel/dev/margine-image/build_files/custom-kernel/install.sh (lines 460-467)
One initramfs per kernel directory (there is exactly one — §3.2.2), written to the bootc-canonical path, generic, with ostree support. Two peripheral details from the surrounding script: dracut runs after module signing so the modules copied into the initramfs are the signed ones, and mkdir -p /root && chmod 700 /root beforehand silences a spurious dracut-install: ERROR: installing '/root' from the ssh-client module probing for /root/.ssh/ in a sysroot where /root doesn't exist (cosmetic; the alternative omit_dracutmodules+=" ssh-client " would also drop dropbear-based remote LUKS unlock support).
Lesson: the inherited ostree.linux OCI label points at the old kernel
Symptom: initramfs fully fixed, boot fails at initrd-switch-root.service; bootloader entries reference deployment hashes that don't exist on disk.
Root cause: Bluefin DX labels its image ostree.linux=<bluefin-kernel-version>. The kernel swap replaced the files but inherited the label, and bootc/rpm-ostree consult ostree.linux at deploy time to pick the kernel version for the bootloader entry and to find /usr/lib/modules/<label>/. Pointed at a nonexistent kernel, deployment-dir hash and bootloader-entry hash diverge.
Fix: rewrite the label after build from the image's actual content (buildah config --label ostree.linux=<kver>, reading <kver> from /usr/lib/modules/ inside the built image). In the current pipeline this is subsumed by the rechunk step (hhd-dev/rechunk@v1.2.4 in /var/home/daniel/dev/margine-image/.github/workflows/build.yml lines 448-464), which re-commits the image in ostree-canonical form; a CI invariant check still asserts label == installed kernel on every build.
General rule: FROM inherits all of the base's OCI labels. Any label describing content you changed (kernel version, rechunk manifest) must be overwritten.
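The label invariant can be sketched as a comparison between the label value and the image's module tree. This is not the project's CI check: label_matches_kernel is an illustrative name, and reading the label from the image (via whatever inspection tool the pipeline uses) is deliberately left to the caller, which passes the value in as a plain string.

```shell
#!/bin/sh
# Succeeds when the ostree.linux label value names a kernel that actually
# exists under the modules directory, judged by the presence of vmlinuz.
# The body runs in a subshell, so "exit" only leaves the function.
label_matches_kernel() (
    label=$1
    modules_dir=${2:-/usr/lib/modules}
    [ -n "$label" ] || exit 1                       # missing label is a failure
    [ -f "$modules_dir/$label/vmlinuz" ]
)
```

An inherited Bluefin label fails this check immediately after the swap, because /usr/lib/modules/<bluefin-kernel-version>/ no longer exists.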
3.5 Validating the swap
Three layers, because "image builds green" ≠ "image boots":
- Build-time (CI, blocks push): /usr/lib/modules/*/initramfs.img exists; initramfs size sanity (>50 MB; a host-only one is <30 MB); lsinitrd | grep ostree-prepare-root non-empty; dm-crypt/dm-mod/btrfs/virtio_blk present as modules or builtins; ostree.linux label matches the installed kernel. Every one of these maps 1:1 to a bug above.
- Boot-time (CI, gates :stable): the QEMU smoke-boot workflow boots the candidate qcow2 under OVMF+swtpm and requires Reached target Multi-User System before skopeo copy --preserve-digests promotes candidate → :stable (CI chapter).
- On-machine (user-run): margine-validate-cachyos-kernel confirms the running kernel is actually the shipped one:
section "Kernel"
kernel=$(uname -a)
if printf '%s\n' "$kernel" | grep -Eiq 'cachy|cachyos'; then
    ok "running kernel appears to be CachyOS"
else
    fail "running kernel does not appear to be CachyOS"
fi
/home/daniel/dev/margine-fedora-atomic/scripts/validate-cachyos-kernel (lines 30-38)
It also warns if stock Fedora kernel-* RPMs are visible in the deployment, and flags common out-of-tree module packages (nvidia, zfs, vbox) as out-of-policy.
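Two of the build-time checks from the list above are simple enough to sketch as standalone helpers. The names initramfs_big_enough and listing_has_ostree are illustrative and not taken from Margine's CI; the size threshold defaults to 50 MiB as an approximation of the ">50 MB" rule, and the second helper expects an lsinitrd listing on standard input rather than calling lsinitrd itself.

```shell
#!/bin/sh
# Succeeds when the file exists and is at least min_bytes long.
# The body runs in a subshell, so "exit" only leaves the function.
initramfs_big_enough() (
    img=$1
    min_bytes=${2:-52428800}              # 50 * 1024 * 1024
    [ -f "$img" ] || exit 1
    size=$(( $(wc -c < "$img") ))         # arithmetic strips any padding
    [ "$size" -ge "$min_bytes" ]
)

# Reads a file listing on stdin; succeeds if ostree-prepare-root is in it.
listing_has_ostree() {
    grep -q 'ostree-prepare-root'
}
```

In a build they would be combined along the lines of initramfs_big_enough "$img" && lsinitrd "$img" | listing_has_ostree, with $img pointing at /usr/lib/modules/<kver>/initramfs.img.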
Alternatives & other distros
Approaches to "which kernel ships in the image", roughly by increasing maintenance cost:
- Stock Fedora kernel, untouched: Bluefin, Aurora, Silverblue/Kinoite, Fedora CoreOS. Signed by Fedora, boots under Secure Boot with zero ceremony, zero pipeline cost. The default; deviate only with a measured reason (ADR 0006 option C).
- Stock kernel + prebuilt akmods from ghcr.io/ublue-os/akmods: Bluefin DX, Aurora, uBlue NVIDIA variants. Universal Blue builds/signs kmods (nvidia, xone, v4l2loopback, …) in dedicated OCI images; consumers COPY --from=ghcr.io/ublue-os/akmods:main-<fedora> the RPMs in. No compiler in your build, modules signed with the uBlue key (whose MOK users enroll once). The cleanest pattern if the kernels/kmods you need are already published.
- OGC kernel (ghcr.io/ublue-os/akmods:ogc-… flavor): Bazzite (migrated off its own kernel-bazzite fork, archived 2026-05-01), Nobara, ChimeraOS, Playtron, PikaOS. Shared 8-distro CI, upstream-first charter, handheld HID + NTSYNC + gyro in tree. The gaming-consensus kernel; no BORE/ThinLTO/HZ=1000 by default.
- Surface kernels: Bluefin's -surface images swap in the linux-surface kernel the same remove-and-replace way, for Microsoft Surface hardware support. Demonstrates that the pattern generalizes to any hardware-enablement tree.
- COPR kernel installed in your own build: Margine (this chapter), Origami Linux (whose custom-kernel.sh Margine's script descends from). Maximum flexibility; you own signing, retries, initramfs, and the single-maintainer-COPR risk.
- Runtime layering on the deployed machine: rpm-ostree override remove kernel{,-core,-modules,...} --install kernel-cachyos. Margine's pre-image lab path (/home/daniel/dev/margine-fedora-atomic/docs/03-cachyos-kernel.md); works per-machine, rollback-safe, but unsigned (Secure Boot off only), per-machine drift, and every deployment rebuilds the override. Kept as documentation, superseded by image-baking.
- NixOS: boot.kernelPackages = pkgs.linuxPackages_cachyos; declaratively; module packages rebuilt by Nix against the chosen kernel. Same outcome, entirely different toolchain.
- openSUSE MicroOS/Aeon: stock SUSE kernel via transactional-update/snapper snapshots; custom kernels are plain zypper packages in a transaction. Rollback via btrfs snapshot instead of image swap.
- Vanilla OS (ABRoot): Debian-based A/B partitions; kernel changes go through ABRoot transactions on the inactive root.
- UKI / sealed images (systemd-boot + unified kernel image + composefs fs-verity): tracked by Margine in ADR 0007, not yet shipping anywhere mainstream on the Fedora desktop track. Would replace the vmlinuz+initramfs pair (and most of §3.4) with a single signed PE binary; the long-term direction for measured boot.