Skip to content

lkl: per-thread irqs_enabled fixes IRQ-leak hangs under real drivers #638

Open
josephnef wants to merge 3 commits into lkl:master from josephnef:irqs-enabled-per-thread

@josephnef josephnef commented May 27, 2026

Summary

arch/lkl/kernel/irq.c keeps irqs_enabled as a single static bool
global. That is fine for the in-tree test suite, which doesn't
exercise the failure paths it exposes, but it has two correctness
consequences for code that runs on LKL outside those tests:

  1. No per-thread isolation of IRQ-enable state across context
    switches. Any kernel path that does spin_lock_irqsave and
    schedules before the matching restore leaks the DISABLED value to
    whichever thread runs next. (Sleeping with IRQs disabled is not a
    supported Linux pattern, and isn't the load-bearing motivation for
    this series — patch 1 is just the refactor that enables patch 2.)

  2. Host-thread callers of lkl_trigger_irq honour stale state.
    lkl_trigger_irq is documented as callable "from arbitrary host
    threads," but reads irqs_enabled even when the caller is a host
    pthread that doesn't own the current thread_info. Host pthreads
    (libusb completion callbacks, glibc SIGEV_THREAD timer callbacks,
    anything created by a backend library that ends up posting an IRQ
    into the kernel from outside any kernel context) acquire the LKL
    CPU via lkl_cpu_get but never go through __switch_to, so
    _current_thread_info still points at whichever kernel task last
    ran, often the idle task. Honouring its stale irqs_enabled
    silently pends every host-injected IRQ. This is the load-bearing
    bug, and patch 2 fixes it.
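
The stale read described in item 2 can be illustrated with a small user-space model. This is only a sketch: `model_thread_info`, `model_current` and `model_trigger_irq_stale` are hypothetical stand-ins for `struct thread_info`, `_current_thread_info` and the pre-fix check in `lkl_trigger_irq`; none of it is the actual LKL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct thread_info (not the real layout). */
struct model_thread_info {
    int  tid;           /* host thread backing this kernel task */
    bool irqs_enabled;  /* IRQ-enable flag as seen by that task */
};

/* The idle task was the last kernel task to run and left IRQs disabled. */
static struct model_thread_info model_idle = { .tid = 1, .irqs_enabled = false };

/* Plays the role of _current_thread_info: only a context switch updates it. */
static struct model_thread_info *model_current = &model_idle;

static int model_pending;    /* IRQs marked pending, handler not run */
static int model_delivered;  /* IRQs whose handler ran */

/* Pre-fix check: honour the flag of whichever task is "current". */
static void model_trigger_irq_stale(int caller_tid)
{
    (void)caller_tid;  /* the caller's identity is never consulted */
    if (model_current->irqs_enabled)
        model_delivered++;
    else
        model_pending++;
}

/* A host thread (tid 42, never switched in) injects n IRQs.
 * Returns how many of them were pended instead of delivered. */
static int model_host_injects(int n)
{
    model_pending = 0;
    model_delivered = 0;
    for (int i = 0; i < n; i++)
        model_trigger_irq_stale(42);
    return model_pending;
}
```

In this model every IRQ injected by the host thread is pended, because the only state consulted belongs to a task the caller never was.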

Context where this was hit

A research proof-of-concept that runs the mainline Linux rtw88 USB
Wi-Fi driver entirely in userspace via LKL, on a real USB Wi-Fi
adapter. The host program links liblkl.a, registers a virtual USB
host controller into the in-process kernel (a ~470-line shim in the
style of drivers/usb/host/ that translates struct urb to a flat view
struct and back), and forwards URBs to libusb running in the host
process. libusb's event thread is a true POSIX pthread; URB
completions are posted back into the LKL kernel from that thread via
lkl_trigger_irq, which is exactly the path bug 2 affects.

Behaviour without and with this series applied:

  • On the master kernel, the rtw88 driver hangs the in-process LKL
    kernel within ~50 USB control transfers, reliably. jiffies stops
    advancing; msleep-based kthreads freeze. gdb confirms the
    global irqs_enabled is stuck at 0.
  • With this series, the same driver runs ~8000 control transfers,
    finishes firmware download, brings wlan0 up, and the host can
    open an AF_PACKET socket on wlan0 and capture 802.11 frames
    with radiotap headers. Validated on three RTL chips (RTL8812AU
    0bda:8812, RTL8821AU 2357:0120, RTL8814AU 0bda:8813). TX
    (radiotap injection via AF_PACKET sendto on wlan0 in monitor
    mode) also works.

What the series does

Three commits, in order:

  1. lkl: make irqs_enabled per-thread via current_thread_info()
    — A pure refactor. Move irqs_enabled from a static bool global
    in irq.c into a field on struct thread_info; access via
    current_thread_info() from arch_local_save_flags /
    arch_local_irq_restore / lkl_trigger_irq. No explicit
    save/restore code is added: the existing
    _current_thread_info = task_thread_info(next); line in
    __switch_to is the entire mechanism. IRQ-enable state moves with
    its thread, the same way a real CPU's register file does. No
    behavioural change for the existing test suite.

  2. lkl: deliver IRQs from host-thread callers regardless of irqs_enabled — The semantic fix. In lkl_trigger_irq, detect
    host-thread callers via
    lkl_ops->thread_equal(ti->tid, lkl_ops->thread_self()) and
    deliver unconditionally; kernel-thread callers (including the
    recursive lkl_trigger_irq -> IRQ -> softirq -> lkl_trigger_irq
    path called out in the original comment) continue to honour their
    own per-thread irqs_enabled. Depends on patch 1 to give the
    owner-vs-self tid comparison something precise to compare.

  3. lkl: add KUnit test for host-pthread caller of lkl_trigger_irq
    — Locks in the contract. New CONFIG_LKL_IRQ_KUNIT_TEST +
    arch/lkl/kernel/irq_test.c. The test spawns a true host pthread
    via lkl_ops->thread_create, has it call lkl_trigger_irq on a
    kernel-registered IRQ from outside any kernel context, and asserts
    the handler runs synchronously. Wired into the existing
    kunit=yes CI lane in tools/lkl/Makefile.autoconf alongside
    LKL_PCI_KUNIT_TEST, and into tools/lkl/tests/boot.c as
    kunit_irq (mirrors kunit_pci / kunit_mmu).
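
As a rough illustration of how patches 1 and 2 fit together, the following user-space sketch models the per-thread flag and the owner-vs-self check. All names prefixed `model_` are hypothetical; `model_thread_equal` merely stands in for `lkl_ops->thread_equal`, and the real `lkl_trigger_irq` does considerably more than return a boolean.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long model_tid_t;  /* hypothetical host thread id */

struct model_thread_info {
    model_tid_t tid;           /* host thread that owns this kernel task */
    bool        irqs_enabled;  /* patch 1: the flag lives with the task */
};

/* Plays the role of _current_thread_info. */
static struct model_thread_info *model_current;

/* Patch 1: repointing "current" is the only save/restore that is needed,
 * because each task carries its own flag. */
static void model_switch_to(struct model_thread_info *next)
{
    model_current = next;
}

static bool model_thread_equal(model_tid_t a, model_tid_t b)
{
    return a == b;
}

/* Patch 2: returns true to deliver now, false to mark the IRQ pending. */
static bool model_should_deliver(model_tid_t self)
{
    const struct model_thread_info *ti = model_current;

    if (!model_thread_equal(ti->tid, self))
        return true;          /* host thread: ti's flag is not the caller's */
    return ti->irqs_enabled;  /* kernel thread: honour its own flag */
}
```

A kernel-thread caller with IRQs disabled still gets its IRQ pended, while a caller whose thread id does not match the current `thread_info` owner is treated as a host thread and served immediately.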

Test plan

  • CI kunit lane: the new kunit_irq test passes via the
    standard ok N lkl_irq line in the boot log (visible in the
    github-actions Test Results comment on this PR).
  • CI linux / windows-2022 / clang-build / mmu_kasan
    lanes: unchanged behaviour. The
    current_thread_info()->irqs_enabled field is initialised in
    INIT_THREAD_INFO and init_ti; pre-existing tests aren't
    affected.
  • checkpatch: clean.
  • End-to-end driver bring-up under the libusb-backed HCD shim
    described above: rtw88-family driver validated locally on the
    three RTL chips. All come up with this series applied; all hang
    on the un-patched kernel.

Note on the previous version of this PR

An earlier revision of this series included a KUnit test that
provoked the cross-thread leak by doing spin_lock_irqsave +
schedule_timeout in one kthread while a sibling observed the IRQ
state — a pattern that is invalid in Linux generally. That framing
was misleading: the actual real-world bug is the host-pthread caller
path (bug 2 above), and the test now exercises that path directly,
via lkl_ops->thread_create and a registered IRQ handler. The
schedule-while-disabled framing is gone from both the test and the
commit messages.

@github-actions

Test Results

106 files  ±0  106 suites  ±0   7m 24s ⏱️ -20s
206 tests +1  195 ✅ +1  11 💤 ±0  0 ❌ ±0 
823 runs  +1  767 ✅ +1  56 💤 ±0  0 ❌ ±0 

Results for commit 45352f4. ± Comparison against base commit d25752c.

Member

@tavip tavip left a comment

Sleeping with interrupts disabled is not permitted in Linux:

https://share.google/aimode/5O1kJJ2idJt2FDdyE

Could you share the problem you are facing in your project? There are ways to avoid this issue with a different design pattern.

@josephnef
Author

@tavip — thanks for the quick read.

Could you share the problem you are facing in your project? There are ways to avoid this issue with a different design pattern.

Yes, happy to. Some background, then the specific failure mode, then a proposal for the test.

The project (R&D, not production)

A proof-of-concept that runs the mainline Linux rtw88 USB Wi-Fi driver entirely in userspace via LKL, driving a real USB Wi-Fi adapter. The kernel-side rtw88_8812au / rtw88_8821au / aircrack-ng's 88XXau driver is compiled into liblkl.a essentially unmodified; the host C program then:

  1. Registers a virtual USB host controller into the in-process kernel — a ~470-line drivers/usb/host/lkl-hcd.c shim implementing the struct hc_driver interface. URBs from the in-kernel driver are marshalled into a flat view struct (so the host-side backend doesn't need <linux/usb.h>) and handed to a registered backend.
  2. The backend is a thin wrapper around libusb: libusb_open_device_with_vid_pid, libusb_fill_*_transfer, libusb_submit_transfer, completion callback. libusb runs its own event thread for completions.
  3. Once the in-process kernel's rtw88 driver brings wlan0 up, the host program calls lkl_if_up(wlan0), uses nl80211 to set monitor mode, and opens an AF_PACKET socket on wlan0 to read raw 802.11 frames with radiotap headers.

End-to-end RX validated on three RTL chips (RTL8812AU 0bda:8812, RTL8821AU 2357:0120, RTL8814AU 0bda:8813). TX (radiotap injection via AF_PACKET sendto on wlan0 in monitor mode) also works. No host kernel module, no CAP_NET_ADMIN — the whole point of the exercise is reusing the kernel driver as-is from userspace.

The actual failure

You're right that what the patch-1 KUnit test demonstrates — spin_lock_irqsave + schedule_timeout in a kernel thread — is not a supported Linux pattern, and rtw88 doesn't do it. That framing in the test is weaker than the real motivation. The actual failure mode we hit is the host-thread leg of the same irqs_enabled pathology, and it's what patch 3 (lkl: deliver IRQs from host-thread callers regardless of irqs_enabled) addresses.

libusb's event thread is a true POSIX pthread. When it posts URB completions into the LKL kernel via lkl_trigger_irq, the sequence is:

  • the host pthread acquires the LKL CPU via lkl_cpu_get;
  • the existing code in lkl_trigger_irq reads irqs_enabled (master) / current_thread_info()->irqs_enabled (after patch 2);
  • the host pthread never went through __switch_to, so _current_thread_info still points at whichever kernel task last ran, often the idle task, and that task's irqs_enabled is routinely DISABLED mid-local_irq_disable/halt.

The host thread's lkl_trigger_irq therefore observes DISABLED, marks the IRQ pending, returns without delivering. Nothing in normal kernel context subsequently runs the pending IRQ (no one is mid-local_irq_save/restore from the kernel side — the host thread isn't a kernel task at all). Net effect: silent IRQ-pending, the timer eventually gets pended, jiffies stalls, every msleep-based kthread freezes. The rtw88 probe hangs in ~50 USB control transfers, reliably. With patch 3 applied (detect host-pthread caller via lkl_ops->thread_equal(ti->tid, lkl_ops->thread_self()) and deliver unconditionally — the host pthread doesn't own the kernel's irqs_enabled field of any thread, so it shouldn't honor it), probe runs ~8000 control transfers and wlan0 comes up cleanly.

Patch 2 (make irqs_enabled per-thread) is the underlying refactor that makes patch 3 implementable cleanly: with a per-thread field plus the owner-vs-self tid check in patch 3, the host-caller path can be precisely distinguished from the kernel-caller path.
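
The hang chain described above (pended timer IRQ, stalled jiffies) can be sketched as a toy model. It assumes the last kernel task stays parked with IRQs disabled and never reaches a restore; `model_kernel`, `model_host_timer_tick` and `model_run` are hypothetical names, and the real pending and drain logic in LKL is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified view of the in-process kernel. */
struct model_kernel {
    bool          current_irqs_enabled;  /* flag of the last kernel task that ran */
    bool          timer_pending;         /* timer IRQ marked pending, not handled */
    unsigned long jiffies;               /* advanced only by the timer handler */
};

/* One timer tick injected from a host thread.
 * host_bypass == false models master; true models the fixed behaviour. */
static void model_host_timer_tick(struct model_kernel *k, bool host_bypass)
{
    if (!host_bypass && !k->current_irqs_enabled) {
        k->timer_pending = true;  /* coalesced: no handler, no progress */
        return;
    }
    k->timer_pending = false;
    k->jiffies++;                 /* handler ran */
}

/* Inject `ticks` timer IRQs from the host while the last kernel task sits
 * with IRQs disabled; return the final jiffies value. */
static unsigned long model_run(unsigned long ticks, bool host_bypass)
{
    struct model_kernel k = { .current_irqs_enabled = false };

    for (unsigned long i = 0; i < ticks; i++)
        model_host_timer_tick(&k, host_bypass);
    return k.jiffies;
}
```

With the stale flag honoured, jiffies never moves no matter how many ticks the host injects, which is why every msleep-based kthread freezes. With the host-caller bypass, each tick advances it.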

The test

Given the above — I'll replace the KUnit suite in patch 1 with one that actually exercises the host-thread path. Shape:

  • Use lkl_ops->thread_create to spawn a real host pthread.
  • The host pthread calls lkl_cpu_get + lkl_trigger_irq(test_irq) from outside any kernel context.
  • A kernel-side handler for test_irq sets a flag.
  • The main test thread (kernel context) asserts the flag became set synchronously / within a short bounded time. On master, the handler is silently pended and the flag never goes up (or only goes up much later when something incidental drains pending IRQs); with patches 2 + 3, the handler runs immediately.

That mirrors what libusb-backend-style hosts actually do, and avoids the spin_lock_irqsave + schedule pattern entirely.

I'll respin with that test + the checkpatch warning fix (trailing */ in threads.c) over the next day or so.

Joseph added 3 commits May 28, 2026 09:51

lkl: make irqs_enabled per-thread via current_thread_info()

irqs_enabled in arch/lkl/kernel/irq.c is a single `static bool`
global, with no save/restore in __switch_to. That is fine for the
existing test suite, which doesn't exercise the failure paths it
exposes, but it has two correctness consequences for code outside
the in-tree tests:

 - Any kernel path that does spin_lock_irqsave and schedules before
   the matching restore (an unsupported pattern in Linux; lockdep
   would catch it on a normal kernel — LKL doesn't) leaks the
   DISABLED value to whichever thread runs next, and the next
   thread's restore overwrites it. The save/restore semantics of
   arch_local_irq_save / arch_local_irq_restore are not per-thread
   in LKL today.

 - More importantly, a host pthread that is not owned by any kernel
   task (e.g. a libusb event thread or a GLib/Qt timer callback
   that backs a virtio/USB-style host shim) invoking
   lkl_trigger_irq from outside any kernel context
   reads whatever value the last kernel task to run left in the
   global. The next commit relies on per-thread irqs_enabled to
   distinguish such host callers cleanly; this commit prepares the
   field for that distinction.

This patch is a pure refactor: move irqs_enabled into
struct thread_info; access via current_thread_info() from
arch_local_save_flags, arch_local_irq_restore, and the
lkl_trigger_irq pending-check. No explicit save/restore is added
to __switch_to — the existing

    _current_thread_info = task_thread_info(next);

line is the entire mechanism. Each thread's irqs_enabled travels
with its thread_info, the same way a real CPU's register file
follows the thread.

Behaviour for the existing test suite is identical (the suite
doesn't exercise the cross-thread leak). The follow-up commit
("lkl: deliver IRQs from host-thread callers regardless of
irqs_enabled") uses the per-thread field to fix the host-caller
case observed in real-world backends.

 - arch/lkl/include/asm/thread_info.h: add `unsigned long
   irqs_enabled` field; INIT_THREAD_INFO sets it to 1
   (ARCH_IRQ_ENABLED) so the init task starts enabled.
 - arch/lkl/kernel/irq.c: drop the `static bool irqs_enabled`
   global. arch_local_save_flags, arch_local_irq_restore, and the
   lkl_trigger_irq check all go through current_thread_info().
 - arch/lkl/kernel/threads.c: init_ti sets ti->irqs_enabled =
   ARCH_IRQ_ENABLED for freshly-allocated kernel threads.

Signed-off-by: Joseph <joseph@josephnef.dev>

lkl: deliver IRQs from host-thread callers regardless of irqs_enabled

lkl_trigger_irq is documented as callable "from arbitrary host
threads" (see the comment block above the function). True host
pthreads — libusb completion threads, glibc SIGEV_THREAD timer
callbacks, anything created by a backend library that ends up
posting an IRQ into the LKL kernel — acquire the LKL CPU via
lkl_cpu_get but never go through __switch_to. _current_thread_info
therefore still points at whichever kernel task last ran (often the
idle task), and that task's irqs_enabled field may be
ARCH_IRQ_DISABLED at the moment the host caller reads it.

Honoring that stale flag for host-thread callers is a silent
IRQ-pending hang: the IRQ gets marked pending, but nothing on the
kernel side notices until the matching irqrestore in the original
kernel context — which often never comes, because the kernel has
already moved on. Drivers that post IRQs from host-thread backends
(e.g. an out-of-tree USB host-controller shim that forwards URBs
to libusb and signals completion from libusb's event thread; any
virtio/host-shim backend with a thread-based notification scheme
has the same exposure) hang the kernel within tens of operations.

Detect host-thread callers by comparing thread_self() to the
thread_info owner's tid via lkl_ops->thread_equal. When they
differ, the caller is not the kernel thread that owns this
thread_info; the stale irqs_enabled field has no claim on us, and
we deliver the IRQ. Kernel-thread callers — including the
recursive "lkl_trigger_irq -> IRQ -> softirq -> lkl_trigger_irq"
path called out in the original comment — continue to honor their
own per-thread irqs_enabled (set by the previous commit).

Signed-off-by: Joseph <joseph@josephnef.dev>

lkl: add KUnit test for host-pthread caller of lkl_trigger_irq

lkl_trigger_irq is documented as callable "from arbitrary host
threads," and host-thread-driven backends (the in-tree
virtio-net-tap and similar) rely on that contract. The previous
commit ("lkl: deliver IRQs from host-thread callers regardless of
irqs_enabled") makes the contract hold even when the kernel task
currently in current_thread_info() has irqs_enabled disabled.

Add a small KUnit suite (CONFIG_LKL_IRQ_KUNIT_TEST) that locks
that contract in:

 - Allocate an IRQ via lkl_get_free_irq + request_irq with a
   handler that increments a counter and completes a completion.
 - Spawn a true host pthread via lkl_ops->thread_create.
 - The host pthread calls lkl_trigger_irq on the registered IRQ
   from outside any kernel context.
 - wait_for_completion_timeout from the test kthread releases the
   LKL CPU so the host pthread can acquire it via
   lkl_cpu_try_run_irq inside lkl_trigger_irq, and gives the
   handler 500 ms to fire.
 - Assert the handler ran exactly once.

Wiring:

 - arch/lkl/kernel/irq_test.c: the new KUnit suite (.name = "lkl_irq").
 - arch/lkl/kernel/Makefile: build it when CONFIG_LKL_IRQ_KUNIT_TEST=y.
 - arch/lkl/Kconfig: new boolean depends on KUNIT.
 - tools/lkl/Makefile.autoconf: kunit_test_enable also sets
   LKL_IRQ_KUNIT_TEST, so the existing kunit=yes CI lane picks it
   up alongside LKL_PCI_KUNIT_TEST.
 - tools/lkl/tests/boot.c: lkl_test_kunit_irq parses the boot log
   for "ok N lkl_irq", mirroring lkl_test_kunit_pci.

Signed-off-by: Joseph <joseph@josephnef.dev>
@josephnef josephnef force-pushed the irqs-enabled-per-thread branch from 45352f4 to 79779c9 Compare May 28, 2026 07:31