lkl: per-thread irqs_enabled fixes IRQ-leak hangs under real drivers by josephnef · Pull Request #638 · lkl/linux

josephnef · 2026-05-27T12:49:12Z

Summary

arch/lkl/kernel/irq.c keeps irqs_enabled as a single static bool
global. That is fine for the in-tree test suite, which doesn't
exercise the failure paths it exposes, but it has two correctness
consequences for code that runs on LKL outside those tests:

1. No per-thread isolation of IRQ-enable state across context
   switches. Any kernel path that does spin_lock_irqsave and
   schedules before the matching restore leaks the DISABLED value to
   whichever thread runs next. (Sleeping with IRQs disabled is not a
   supported Linux pattern, and it isn't the load-bearing motivation
   for this series; patch 1 is just the refactor that enables patch 2.)
2. Host-thread callers of lkl_trigger_irq honour stale state.
   lkl_trigger_irq is documented as callable "from arbitrary host
   threads," but it reads irqs_enabled even when the caller is a host
   pthread that doesn't own the current thread_info. Host pthreads
   (libusb completion callbacks, glibc SIGEV_THREAD timer callbacks,
   anything created by a backend library that ends up posting an IRQ
   into the kernel from outside any kernel context) acquire the LKL
   CPU via lkl_cpu_get but never go through __switch_to, so
   _current_thread_info still points at whichever kernel task last
   ran, often the idle task. Honouring that task's stale irqs_enabled
   silently pends every host-injected IRQ. This is the load-bearing
   bug, and patch 2 fixes it.
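
To make the first consequence concrete, here is a minimal user-space
model in C. It is not LKL code: a task is reduced to a hypothetical
`struct sw_ti` holding an irqs_enabled flag, `cur_ti` stands in for
_current_thread_info, and a context switch is just a pointer swap. It
replays the unsupported save, disable, schedule sequence once against
a single global flag and once against per-thread flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not LKL source: a task is reduced to a
 * thread_info-like struct and a context switch to a pointer swap. */
struct sw_ti {
	bool irqs_enabled;
};

static bool global_irqs_enabled = true; /* master: one flag for every task */
static struct sw_ti *cur_ti;            /* stands in for _current_thread_info */

/* Task A saves and disables IRQs, then "schedules" to task B before its
 * restore. Returns the IRQ-enable state that task B observes. */
bool b_observes_with_global_flag(void)
{
	bool saved, seen_by_b;

	global_irqs_enabled = true;
	saved = global_irqs_enabled;     /* A: save ... */
	global_irqs_enabled = false;     /* ... and disable */
	/* switch A -> B: nothing touches the global, so B inherits A's state */
	seen_by_b = global_irqs_enabled;
	global_irqs_enabled = saved;     /* switch back, A: restore */
	return seen_by_b;
}

bool b_observes_with_per_thread_flag(void)
{
	struct sw_ti a = { .irqs_enabled = true };
	struct sw_ti b = { .irqs_enabled = true };
	bool saved, seen_by_b;

	cur_ti = &a;
	saved = cur_ti->irqs_enabled;    /* A: save ... */
	cur_ti->irqs_enabled = false;    /* ... and disable */
	cur_ti = &b;                     /* switch A -> B: the pointer swap is all */
	seen_by_b = cur_ti->irqs_enabled;
	cur_ti = &a;                     /* switch back, A: restore */
	cur_ti->irqs_enabled = saved;
	cur_ti = NULL;                   /* a and b go out of scope */
	return seen_by_b;
}
```

With the global flag, task B observes IRQs disabled even though it
never disabled them; with per-thread flags, B keeps its own state and
A gets its saved value back on restore.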

Context where this was hit

A research proof-of-concept that runs the mainline Linux rtw88 USB
Wi-Fi driver entirely in userspace via LKL, on a real USB Wi-Fi
adapter. The host program links liblkl.a, registers a virtual USB
host controller into the in-process kernel (a ~470-line shim in the
style of drivers/usb/host/ that translates struct urb to a flat view
struct and back), and forwards URBs to libusb running in the host
process. libusb's event thread is a true POSIX pthread; URB
completions are posted back into the LKL kernel from that thread via
lkl_trigger_irq, which is exactly the path bug 2 affects.

Behaviour without and with this series:

- On master, the rtw88 driver hangs the in-process LKL kernel within
  ~50 USB control transfers, reliably. jiffies stops advancing;
  msleep-based kthreads freeze. gdb confirms the global irqs_enabled
  is stuck at 0.
- With this series, the same driver runs ~8000 control transfers,
  finishes firmware download, brings wlan0 up, and the host can
  open an AF_PACKET socket on wlan0 and capture 802.11 frames
  with radiotap headers. Validated on three RTL chips (RTL8812AU
  0bda:8812, RTL8821AU 2357:0120, RTL8814AU 0bda:8813). TX
  (radiotap injection via AF_PACKET sendto on wlan0 in monitor
  mode) also works.

What the series does

Three commits, in order:

1. lkl: make irqs_enabled per-thread via current_thread_info()
   A pure refactor. Move irqs_enabled from a static bool global
   in irq.c into a field on struct thread_info, and access it via
   current_thread_info() from arch_local_save_flags,
   arch_local_irq_restore, and lkl_trigger_irq. There is no explicit
   save/restore code: the existing
   _current_thread_info = task_thread_info(next); line in
   __switch_to is the entire mechanism. IRQ-enable state moves with
   its thread, the same way a real CPU's register file does. No
   behavioural change for the existing test suite.
2. lkl: deliver IRQs from host-thread callers regardless of irqs_enabled
   The semantic fix. In lkl_trigger_irq, detect host-thread callers
   via lkl_ops->thread_equal(ti->tid, lkl_ops->thread_self()) and
   deliver unconditionally; kernel-thread callers (including the
   recursive lkl_trigger_irq -> IRQ -> softirq -> lkl_trigger_irq
   path called out in the original comment) continue to honour their
   own per-thread irqs_enabled. Depends on patch 1 to give the
   owner-vs-self tid comparison something precise to compare.
3. lkl: add KUnit test for host-pthread caller of lkl_trigger_irq
   Locks in the contract. New CONFIG_LKL_IRQ_KUNIT_TEST and
   arch/lkl/kernel/irq_test.c. The test spawns a true host pthread
   via lkl_ops->thread_create, has it call lkl_trigger_irq on a
   kernel-registered IRQ from outside any kernel context, and asserts
   that the handler runs synchronously. Wired into the existing
   kunit=yes CI lane in tools/lkl/Makefile.autoconf alongside
   LKL_PCI_KUNIT_TEST, and into tools/lkl/tests/boot.c as
   kunit_irq (mirroring kunit_pci / kunit_mmu).
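
The decision patch 2 adds to lkl_trigger_irq can be sketched as a
small user-space model, assuming pthread_equal/pthread_self stand in
for lkl_ops->thread_equal/lkl_ops->thread_self and that a hypothetical
`struct trig_ti` carries the owner tid plus the per-thread flag from
patch 1. It illustrates the rule as described above rather than
reproducing the patch: deliver when the caller does not own the
current thread_info, or when the owner has IRQs enabled.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the patch-2 rule; not the LKL source. */
struct trig_ti {
	pthread_t tid;      /* host thread that owns this thread_info */
	bool irqs_enabled;  /* per-thread flag introduced by patch 1 */
};

static struct trig_ti *trig_current; /* stands in for _current_thread_info */

/* Returns true if the IRQ is delivered now, false if it is left pending. */
bool model_trigger_irq(void)
{
	struct trig_ti *ti = trig_current;
	/* pthread_equal plays lkl_ops->thread_equal(ti->tid, thread_self()) */
	bool host_caller = !pthread_equal(ti->tid, pthread_self());

	if (host_caller || ti->irqs_enabled)
		return true;  /* deliver now */
	return false;     /* kernel caller with IRQs off: honour its flag */
}

static void *host_thread_fn(void *arg)
{
	*(bool *)arg = model_trigger_irq();
	return NULL;
}

/* The idle "task" is owned by the calling thread and parked with IRQs
 * disabled; a separate host pthread then raises an IRQ. */
bool host_caller_gets_delivery(void)
{
	struct trig_ti idle = { .tid = pthread_self(), .irqs_enabled = false };
	pthread_t host;
	bool delivered = false;

	trig_current = &idle;
	if (pthread_create(&host, NULL, host_thread_fn, &delivered) != 0)
		return false;
	pthread_join(host, NULL);
	trig_current = NULL;
	return delivered;
}

/* The caller owns the current thread_info, as a kernel thread would. */
bool kernel_caller_gets_delivery(bool irqs_enabled)
{
	struct trig_ti self = { .tid = pthread_self(), .irqs_enabled = irqs_enabled };
	bool delivered;

	trig_current = &self;
	delivered = model_trigger_irq();
	trig_current = NULL;
	return delivered;
}
```

The host-caller case mirrors the rtw88 failure: the idle task's
thread_info is current with IRQs disabled, yet the IRQ raised from a
separate pthread is still delivered. A kernel caller with IRQs
disabled keeps the old pending behaviour, as the series intends for
the recursive lkl_trigger_irq -> IRQ -> softirq path.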

Test plan

- CI kunit lane: the new kunit_irq test passes, as shown by the
  standard "ok N lkl_irq" line in the boot log (visible in the
  github-actions Test Results comment on this PR).
- CI linux / windows-2022 / clang-build / mmu_kasan lanes:
  unchanged behaviour. The current_thread_info()->irqs_enabled
  field is initialised in INIT_THREAD_INFO and init_ti, so
  pre-existing tests aren't affected.
- checkpatch: clean.
- End-to-end driver bring-up under the libusb-backed HCD shim
  described above: rtw88-family driver validated locally on the
  three RTL chips. All come up with this series applied; all hang
  on the unpatched kernel.

Note on the previous version of this PR

An earlier revision of this series included a KUnit test that
provoked the cross-thread leak by doing spin_lock_irqsave +
schedule_timeout in one kthread while a sibling observed the IRQ
state — a pattern that is invalid in Linux generally. That framing
was misleading: the actual real-world bug is the host-pthread caller
path (bug 2 above), and the test now exercises that path directly,
via lkl_ops->thread_create and a registered IRQ handler. The
schedule-while-disabled framing is gone from both the test and the
commit messages.

github-actions · 2026-05-28T01:25:27Z

Test Results

106 files ±0 106 suites ±0 7m 24s ⏱️ -20s
206 tests +1 195 ✅ +1 11 💤 ±0 0 ❌ ±0
823 runs +1 767 ✅ +1 56 💤 ±0 0 ❌ ±0

Results for commit 45352f4. ± Comparison against base commit d25752c.

tavip

Sleeping with interrupts disabled is not permitted in Linux:

https://share.google/aimode/5O1kJJ2idJt2FDdyE

Could you share the problem you are facing in your project? There are ways to avoid this issue with a different design pattern.

josephnef · 2026-05-28T06:43:10Z

@tavip — thanks for the quick read.

Could you share the problem you are facing in your project? There are ways to avoid this issue with a different design pattern.

Yes, happy to. Some background, then the specific failure mode, then a proposal for the test.

The project (R&D, not production)

A proof-of-concept that runs the mainline Linux rtw88 USB Wi-Fi driver entirely in userspace via LKL, driving a real USB Wi-Fi adapter. The kernel-side driver (rtw88_8812au, rtw88_8821au, or aircrack-ng's 88XXau) is compiled into liblkl.a essentially unmodified; the host C program then:

- Registers a virtual USB host controller into the in-process kernel: a ~470-line drivers/usb/host/lkl-hcd.c shim implementing the struct hc_driver interface. URBs from the in-kernel driver are marshalled into a flat view struct (so the host-side backend doesn't need <linux/usb.h>) and handed to a registered backend.
- Uses a backend that is a thin wrapper around libusb: libusb_open_device_with_vid_pid, libusb_fill_*_transfer, libusb_submit_transfer, and a completion callback. libusb runs its own event thread for completions.
- Once the in-process kernel's rtw88 driver brings wlan0 up, calls lkl_if_up(wlan0), uses nl80211 to set monitor mode, and opens an AF_PACKET socket on wlan0 to read raw 802.11 frames with radiotap headers.

End-to-end RX validated on three RTL chips (RTL8812AU 0bda:8812, RTL8821AU 2357:0120, RTL8814AU 0bda:8813). TX (radiotap injection via AF_PACKET sendto on wlan0 in monitor mode) also works. No host kernel module, no CAP_NET_ADMIN — the whole point of the exercise is reusing the kernel driver as-is from userspace.

The actual failure

You're right that what the patch-1 KUnit test demonstrates — spin_lock_irqsave + schedule_timeout in a kernel thread — is not a supported Linux pattern, and rtw88 doesn't do it. That framing in the test is weaker than the real motivation. The actual failure mode we hit is the host-thread leg of the same irqs_enabled pathology, and it's what patch 3 (lkl: deliver IRQs from host-thread callers regardless of irqs_enabled) addresses.

libusb's event thread is a true POSIX pthread. When it posts URB completions into the LKL kernel via lkl_trigger_irq:

1. it acquires the LKL CPU via lkl_cpu_get;
2. the existing code in lkl_trigger_irq then reads irqs_enabled (on master) or current_thread_info()->irqs_enabled (after patch 2);
3. but the host pthread never went through __switch_to, so _current_thread_info still points at whichever kernel task last ran, often the idle task, and that task's irqs_enabled is routinely DISABLED because it is sitting in the middle of its local_irq_disable/halt sequence.

The host thread's lkl_trigger_irq therefore observes DISABLED, marks the IRQ pending, and returns without delivering it. Nothing in normal kernel context subsequently runs the pending IRQ: no kernel task is in the middle of a local_irq_save/restore pair whose restore would drain it, and the host thread isn't a kernel task at all. Net effect: IRQs silently stay pending, the timer IRQ eventually gets pended too, jiffies stalls, and every msleep-based kthread freezes. The rtw88 probe hangs within ~50 USB control transfers, reliably.

With patch 3 applied, lkl_trigger_irq detects the host-pthread caller via lkl_ops->thread_equal(ti->tid, lkl_ops->thread_self()) and delivers unconditionally, since the host pthread doesn't own any kernel thread's irqs_enabled field and so shouldn't honor it. The probe then runs ~8000 control transfers and wlan0 comes up cleanly.

Patch 2 (make irqs_enabled per-thread) is the underlying refactor that makes patch 3 implementable cleanly: with a per-thread field plus the owner-vs-self tid check in patch 3, the host-caller path can be precisely distinguished from the kernel-caller path.

The test

Given the above — I'll replace the KUnit suite in patch 1 with one that actually exercises the host-thread path. Shape:

1. Use lkl_host_ops->thread_create to spawn a real host pthread.
2. The host pthread calls lkl_cpu_get + lkl_trigger_irq(test_irq) from outside any kernel context.
3. A kernel-side handler for test_irq sets a flag.
4. The main test thread (kernel context) asserts that the flag became set synchronously or within a short bounded time. On master, the handler is silently pended and the flag never goes up (or only goes up much later, when something incidental drains pending IRQs); with patches 2 + 3, the handler runs immediately.

That mirrors what libusb-backend-style hosts actually do, and avoids the spin_lock_irqsave + schedule pattern entirely.
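
The shape above hinges on waiting for the handler with a bounded timeout rather than sleeping indefinitely. A rough user-space analogue follows, assuming a pthread mutex/condition-variable pair in place of the kernel's struct completion and wait_for_completion_timeout. `fake_completion`, `test_irq_handler` and `run_host_trigger_test` are hypothetical names, and the host pthread calls the handler directly to model the fixed lkl_trigger_irq delivering on the caller's thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical user-space analogue of the KUnit test shape. */
struct fake_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static struct fake_completion irq_done = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};
static int handler_calls;

/* Stands in for the kernel-side handler registered for test_irq. */
static void test_irq_handler(void)
{
	pthread_mutex_lock(&irq_done.lock);
	handler_calls++;
	irq_done.done = true;                 /* like complete() */
	pthread_cond_signal(&irq_done.cond);
	pthread_mutex_unlock(&irq_done.lock);
}

/* Stands in for the host pthread calling lkl_trigger_irq(test_irq);
 * with the fix, delivery runs the handler on this thread. */
static void *host_pthread(void *unused)
{
	(void)unused;
	test_irq_handler();
	return NULL;
}

/* Returns how many handler runs were seen within timeout_ms, or -1 if
 * the host thread could not be created. */
int run_host_trigger_test(long timeout_ms)
{
	pthread_t host;
	struct timespec deadline;
	int rc = 0;
	int calls;

	/* Re-arm the completion so the helper can be called repeatedly. */
	pthread_mutex_lock(&irq_done.lock);
	irq_done.done = false;
	handler_calls = 0;
	pthread_mutex_unlock(&irq_done.lock);

	/* Absolute deadline on CLOCK_REALTIME, the default condvar clock. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	if (pthread_create(&host, NULL, host_pthread, NULL) != 0)
		return -1;

	/* Like wait_for_completion_timeout: stop on completion or timeout. */
	pthread_mutex_lock(&irq_done.lock);
	while (!irq_done.done && rc == 0)
		rc = pthread_cond_timedwait(&irq_done.cond, &irq_done.lock, &deadline);
	calls = handler_calls;
	pthread_mutex_unlock(&irq_done.lock);

	pthread_join(host, NULL);
	return calls;
}
```

run_host_trigger_test(500) returns 1 when the handler runs once within 500 ms, the same bound the respun test gives the handler. In a real harness, a return of 0 would correspond to the silently-pended case on master.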

I'll respin with that test + the checkpatch warning fix (trailing */ in threads.c) over the next day or so.

irqs_enabled in arch/lkl/kernel/irq.c is a single `static bool` global, with no save/restore in __switch_to. That is fine for the existing test suite, which doesn't exercise the failure paths it exposes, but it has two correctness consequences for code outside the in-tree tests:

- Any kernel path that does spin_lock_irqsave and schedules before the matching restore (an unsupported pattern in Linux; lockdep would catch it on a normal kernel, but not on LKL) leaks the DISABLED value to whichever thread runs next, and the next thread's restore overwrites it. The save/restore semantics of arch_local_irq_save / arch_local_irq_restore are not per-thread in LKL today.
- More importantly, a host pthread that is not a kernel task (e.g. one created via lkl_ops->thread_create, a libusb event thread, or a GLib/Qt timer callback that backs a virtio/USB-style host shim) invoking lkl_trigger_irq from outside any kernel context reads whatever value the last kernel task to run left in the global. The next commit relies on per-thread irqs_enabled to distinguish such host callers cleanly; this commit prepares the field for that distinction.

This patch is a pure refactor: move irqs_enabled into struct thread_info, and access it via current_thread_info() from arch_local_save_flags, arch_local_irq_restore, and the lkl_trigger_irq pending-check. No explicit save/restore is added to __switch_to; the existing _current_thread_info = task_thread_info(next); line is the entire mechanism. Each thread's irqs_enabled travels with its thread_info, the same way a real CPU's register file follows the thread.

Behaviour for the existing test suite is identical (the suite doesn't exercise the cross-thread leak). The follow-up commit ("lkl: deliver IRQs from host-thread callers regardless of irqs_enabled") uses the per-thread field to fix the host-caller case observed in real-world backends.
- arch/lkl/include/asm/thread_info.h: add an `unsigned long irqs_enabled` field; INIT_THREAD_INFO sets it to 1 (ARCH_IRQ_ENABLED) so the init task starts enabled.
- arch/lkl/kernel/irq.c: drop the `static bool irqs_enabled` global. arch_local_save_flags, arch_local_irq_restore, and the lkl_trigger_irq check all go through current_thread_info().
- arch/lkl/kernel/threads.c: init_ti sets ti->irqs_enabled = ARCH_IRQ_ENABLED for freshly-allocated kernel threads.

Signed-off-by: Joseph <joseph@josephnef.dev>

lkl_trigger_irq is documented as callable "from arbitrary host threads" (see the comment block above the function). True host pthreads (libusb completion threads, glibc SIGEV_THREAD timer callbacks, anything created by a backend library that ends up posting an IRQ into the LKL kernel) acquire the LKL CPU via lkl_cpu_get but never go through __switch_to. _current_thread_info therefore still points at whichever kernel task last ran (often the idle task), and that task's irqs_enabled field may be ARCH_IRQ_DISABLED at the moment the host caller reads it.

Honoring that stale flag for host-thread callers causes a silent hang with the IRQ left pending: the IRQ gets marked pending, but nothing on the kernel side notices until the matching irqrestore in the original kernel context, which often never comes because the kernel has already moved on. Drivers that post IRQs from host-thread backends hang the kernel within tens of operations. One example is an out-of-tree USB host-controller shim that forwards URBs to libusb and signals completion from libusb's event thread; any virtio/host-shim backend with a thread-based notification scheme has the same exposure.

Detect host-thread callers by comparing thread_self() to the thread_info owner's tid via lkl_ops->thread_equal. When they differ, the caller is not the kernel thread that owns this thread_info; the stale irqs_enabled field has no claim on us, and we deliver the IRQ. Kernel-thread callers, including the recursive "lkl_trigger_irq -> IRQ -> softirq -> lkl_trigger_irq" path called out in the original comment, continue to honor their own per-thread irqs_enabled (set up by the previous commit).

Signed-off-by: Joseph <joseph@josephnef.dev>

lkl_trigger_irq is documented as callable "from arbitrary host threads," and host-thread-driven backends (the in-tree virtio-net-tap and similar) rely on that contract. The previous commit ("lkl: deliver IRQs from host-thread callers regardless of irqs_enabled") makes the contract hold even when the kernel task currently in current_thread_info() has irqs_enabled disabled.

Add a small KUnit suite (CONFIG_LKL_IRQ_KUNIT_TEST) that locks that contract in:

- Allocate an IRQ via lkl_get_free_irq + request_irq with a handler that increments a counter and completes a completion.
- Spawn a true host pthread via lkl_ops->thread_create.
- The host pthread calls lkl_trigger_irq on the registered IRQ from outside any kernel context.
- wait_for_completion_timeout from the test kthread releases the LKL CPU so the host pthread can acquire it via lkl_cpu_try_run_irq inside lkl_trigger_irq, and gives the handler 500 ms to fire.
- Assert the handler ran exactly once.

Wiring:

- arch/lkl/kernel/irq_test.c: the new KUnit suite (.name = "lkl_irq").
- arch/lkl/kernel/Makefile: build it when CONFIG_LKL_IRQ_KUNIT_TEST=y.
- arch/lkl/Kconfig: new boolean that depends on KUNIT.
- tools/lkl/Makefile.autoconf: kunit_test_enable also sets LKL_IRQ_KUNIT_TEST, so the existing kunit=yes CI lane picks it up alongside LKL_PCI_KUNIT_TEST.
- tools/lkl/tests/boot.c: lkl_test_kunit_irq parses the boot log for "ok N lkl_irq", mirroring lkl_test_kunit_pci.

Signed-off-by: Joseph <joseph@josephnef.dev>
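
The boot.c check amounts to scanning the boot log for a KTAP result line. A minimal sketch of such a scan follows; `kunit_suite_passed` is a hypothetical helper rather than the lkl_test_kunit_irq implementation, and it assumes result lines look like "ok N lkl_irq", optionally indented or preceded by a printk timestamp ending in ']'. It rejects "not ok" lines and suite names that merely share the prefix.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `log` contains "ok <N> <suite>" as a
 * whole token sequence. "not ok" lines do not count as passes. */
bool kunit_suite_passed(const char *log, const char *suite)
{
	size_t slen = strlen(suite);
	const char *p = log;

	while ((p = strstr(p, "ok ")) != NULL) {
		const char *q = p + 3;
		/* "ok" must start a token: line start, whitespace, or after ']' */
		bool at_token_start = (p == log) ||
			isspace((unsigned char)p[-1]) || p[-1] == ']';
		bool negated = (p - log >= 4) && strncmp(p - 4, "not ", 4) == 0;

		p += 3; /* continue the search after this match */
		if (!at_token_start || negated || !isdigit((unsigned char)*q))
			continue;
		while (isdigit((unsigned char)*q))  /* skip the test number N */
			q++;
		if (*q != ' ' || strncmp(q + 1, suite, slen) != 0)
			continue;
		q += 1 + slen;
		/* the suite name must end here, not continue as a longer name */
		if (*q == '\0' || isspace((unsigned char)*q))
			return true;
	}
	return false;
}
```

For example, kunit_suite_passed(log, "lkl_irq") accepts "[    0.6] ok 2 lkl_irq" but not "    not ok 2 lkl_irq" or "ok 2 lkl_irq_extra".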

tavip reviewed May 28, 2026


Joseph added 3 commits May 28, 2026 09:51

josephnef force-pushed the irqs-enabled-per-thread branch from 45352f4 to 79779c9 on May 28, 2026 07:31

josephnef wants to merge 3 commits into lkl:master from josephnef:irqs-enabled-per-thread
