Skip to content

fix: add rwlock for implicit opcache restarts (OOM/hash overflow)#2352

Open
superdav42 wants to merge 1 commit intophp:fix/opcache-safe-resetfrom
superdav42:fix/opcache-safe-reset-with-rwlock
Open

fix: add rwlock for implicit opcache restarts (OOM/hash overflow)#2352
superdav42 wants to merge 1 commit intophp:fix/opcache-safe-resetfrom
superdav42:fix/opcache-safe-reset-with-rwlock

Conversation

@superdav42
Copy link
Copy Markdown

Summary

Adds a pthread_rwlock around the request lifecycle to protect against implicit opcache restarts (OOM, hash overflow) that bypass the opcache_reset() override in #2073.

Problem

PR #2073 intercepts explicit opcache_reset() calls and drains all threads before the reset. However, WordPress (and other apps) trigger implicit opcache restarts through a different path:

  1. opcache_invalidate() fills opcache memory over time
  2. opcache detects OOM → calls zend_accel_schedule_restart(ACCEL_RESTART_OOM)
  3. Sets restart_pending = true
  4. Next php_request_startup()accel_activate() sees restart_pendingaccel_is_inactive() → proceeds with accel_interned_strings_restore_state() (memset on shared memory)
  5. Other threads are mid-request reading from that shared memory → crash

This path never calls opcache_reset(), so #2073's override doesn't fire.

Production evidence

Configuration Crash frequency
No fix Every 20-60 min
#2073 alone Every 10-12 min (no improvement)
rwlock alone (our #2349) Every 6-12 hours (partial — missed worker threads)
Combined (this PR) Testing in progress

Fix

Uses PHP's zend_accel_schedule_restart_hook (PHP 8.4+, declared in Zend/zend.h) to detect when an implicit restart is scheduled, then serializes the next php_request_startup() via a pthread_rwlock:

  • Normal requests: read lock (concurrent, single atomic CAS — zero measurable overhead)
  • When hook fires: sets an atomic flag
  • Next php_request_startup(): sees flag → acquires write lock (exclusive), blocks until all current requests finish php_request_shutdown() and release their read locks
  • Inside exclusive startup: accel_is_inactive() correctly sees no active threads, reset proceeds safely
  • After startup: write lock released, all threads resume

Why both fixes are needed

  • fix: safe opcache_reset #2073: Covers explicit opcache_reset() calls — drains threads via Go state machine, works on all PHP versions
  • This PR: Covers implicit OOM/hash-overflow restarts — serializes via rwlock, PHP 8.4+ only (hook doesn't exist before)

The rwlock is unconditional (all PHP versions get the lock around requests) but the hook registration is #if PHP_VERSION_ID >= 80400. On PHP 8.2/8.3, the rwlock adds negligible overhead with no functional benefit for implicit restarts.

Changes

  • frankenphp.c: +50 lines
    • pthread_rwlock_t frankenphp_opcache_rwlock — read-write lock
    • frankenphp_opcache_restart_hook() — sets atomic flag (PHP 8.4+)
    • php_thread() loop: read lock before php_request_startup(), unlock after php_request_shutdown(), write lock when restart flag is set
    • Hook registration after frankenphp_sapi_module.startup()

PR php#2073 drains all threads for explicit opcache_reset() calls, but
WordPress triggers implicit restarts via opcache_invalidate() filling
opcache memory → zend_accel_schedule_restart() → restart_pending → the
actual reset runs during the next php_request_startup(). That path
bypasses the opcache_reset() override and races on shared memory.

Add a pthread_rwlock around the request lifecycle:
- Normal requests: read lock (concurrent, single atomic CAS)
- When zend_accel_schedule_restart_hook fires: set an atomic flag
- Next php_request_startup(): sees flag → write lock (exclusive),
  blocks until all current requests complete, resets safely
- After startup: unlock, all threads resume

Combined with PR php#2073, this covers both explicit opcache_reset()
(thread drain) and implicit OOM/hash-overflow restarts (rwlock).

Testing: without this patch, crashes every ~10 min on a WordPress
multisite. With PR php#2073 alone, same crash frequency. With the rwlock
from our PR php#2349 alone, crashes reduced to every 6-12 hours (worker
threads not covered). Combined fix expected to eliminate both paths.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant