Skip to content

fix: data race on last_storage_ptr_ cache in gil_safe_call_once_and_store#6087

Draft
henryiii wants to merge 1 commit into
pybind:masterfrom
henryiii:fix/gil-safe-call-once-atomic-cache
Draft

fix: data race on last_storage_ptr_ cache in gil_safe_call_once_and_store#6087
henryiii wants to merge 1 commit into
pybind:masterfrom
henryiii:fix/gil-safe-call-once-atomic-cache

Conversation

@henryiii

Copy link
Copy Markdown
Collaborator

🤖 AI text below 🤖

Problem

In include/pybind11/gil_safe_call_once.h, the PYBIND11_HAS_SUBINTERPRETER_SUPPORT branch of gil_safe_call_once_and_store caches the per-interpreter storage pointer in T *last_storage_ptr_ as a fast path for the single-interpreter case.

This plain pointer is:

  • written in call_once_and_store_result() (inside the std::call_once lambda) and in get_stored(), and
  • read in get_stored().

Under free-threaded CPython (Py_GIL_DISABLED), gil_scoped_acquire provides no mutual exclusion, so these concurrent unsynchronized reads/writes of a plain pointer are a C++ data race (undefined behavior).

Additionally, get_stored() loaded the cached pointer before calling is_last_storage_valid(). The writer publishes the pointer (last_storage_ptr_) before setting the validity flag (is_initialized_by_at_least_one_interpreter_), so a reader must observe the flag first and only then load the pointer to get correct acquire/release ordering.

Fix

  • Make last_storage_ptr_ a std::atomic<T *> (<atomic> is already included in this branch). Default seq_cst operations pair with the existing atomic validity flag.
  • Restructure get_stored() to check is_last_storage_valid() first, and only load last_storage_ptr_ on the fast path; on the slow path, look up the per-interpreter storage and update the cache as before.

The change is minimal and behavior-preserving on the non-free-threaded build.

Not addressed here

The separate embedded finalize / re-init staleness concern (a stale last_storage_ptr_ surviving interpreter finalize + re-init within the same process) is not fixed by this PR. It is tracked separately in the issue.

Verification

Compiled a scratch translation unit including <pybind11/gil_safe_call_once.h> and instantiating pybind11::gil_safe_call_once_and_store<int> with c++ -std=c++17 -fsyntax-only against Python 3.14 (the subinterpreter branch is active by default for Python >= 3.12). Compiles cleanly. clang-format applied.

Part of #6084

…tore

Under free-threaded CPython (Py_GIL_DISABLED) the GIL provides no mutual
exclusion, so the plain pointer last_storage_ptr_ was read and written
concurrently without synchronization, a C++ data race. Make it a
std::atomic<T *>.

Also reorder get_stored() to check is_last_storage_valid() before loading
the cached pointer. The writer publishes the pointer before setting the
validity flag, so the flag must be observed first for correct
acquire/release ordering.

Assisted-by: ClaudeCode:claude-fable-5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant