build: forbid C++ sources outside src/export/#68
Conversation
There was a problem hiding this comment.
Pull request overview
This PR ports the PostgreSQL hook/queue path from C++-style code to pure C to avoid RAII-driven heap allocation failure modes, and extends exported telemetry to include parent_query_id for nested query attribution.
Changes:
- Converted hook/queue/normalization codepaths to C (remove
extern "C"blocks in.c, replacenullptr/static_cast/std::arrayusage). - Added per-query stack tracking to compute
parent_query_idand improve nested query CPU attribution. - Updated ClickHouse schema + added migration for the new
parent_query_idcolumn; updated build to compile.csources.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/queue/shmem.h | Adds C-friendly forward declaration for shared state struct. |
| src/queue/shmem.c | Ports implementation to C constructs (NULL, C casts, void params). |
| src/queue/local_batch.c | Replaces std::array with fixed C array; C-style callback signatures. |
| src/queue/event.h | Makes enums/structs more C-friendly; adds parent_query_id field. |
| src/pg_stat_ch.c | Removes C++ linkage wrappers and ports nullptr → NULL. |
| src/hooks/query_normalize_state.h | Ports state types/APIs to C typedefs and value-passing. |
| src/hooks/query_normalize_state.c | Replaces absl hashing with Postgres hash_bytes; C cleanups. |
| src/hooks/query_normalize.c | Ports C++ casts/nullptr checks to C equivalents. |
| src/hooks/hooks.c | Introduces fixed-size query frame stack; computes parent_query_id; C ports. |
| src/export/stats_exporter.cc | Exports new parent_query_id column to ClickHouse/OTel exporters. |
| src/config/guc.c | Replaces std::array enum entries with a C array; ports NULL usage. |
| migrations/001_add_parent_query_id.sql | Adds ClickHouse migration for parent_query_id. |
| docker/init/00-schema.sql | Updates canonical ClickHouse schema to include parent_query_id. |
| CMakeLists.txt | Ensures .c sources are compiled and linked into the shared library. |
Comments suppressed due to low confidence (5)
src/hooks/hooks.c:57
- The comment says the fixed-size stack allows "64 levels" of nesting, but PSCH_MAX_QUERY_NESTING_DEPTH is currently 8. Please reconcile the comment and the constant (either increase the constant or update the documented limit).
src/queue/shmem.c:149 - The elog() format string uses %lu for values returned by pg_atomic_read_u64(). On platforms where uint64 is not unsigned long (e.g., 32-bit builds), this is undefined behavior and can print incorrect values. Prefer PostgreSQL's format macros (e.g., UINT64_FORMAT) or cast to an explicitly matching type before formatting.
src/hooks/hooks.c:807 - CALL_PROCESS_UTILITY() can raise ERROR (longjmp). Since a frame is pushed before this call, an ERROR path will skip QueryStackPop(), leaving query_stack_depth permanently incremented and corrupting parent_query_id/CPU attribution for subsequent statements. Wrap the call in PG_TRY/PG_FINALLY (or similar) and ensure the stack is unwound in the FINALLY path, even on ERROR/ereport.
src/hooks/hooks.c:602 - QueryStackPush() happens before chaining to prev/standard ExecutorStart. If that downstream call ereports(ERROR), this function will longjmp without unwinding the stack, leaving query_stack_depth corrupted for the rest of the session. Consider pushing only after a successful ExecutorStart call, or wrapping the ExecutorStart call in PG_TRY/PG_CATCH and unwinding the stack in the error path before rethrowing.
src/hooks/hooks.c:667 - QueryStackPop() (and its internal getrusage()) runs before checking !psch_enabled, so disabling pg_stat_ch still pays the per-query getrusage cost. If pg_stat_ch is disabled, consider unwinding only the depth counter (no timestamp/rusage capture) or gating QueryStackPush/Pop's expensive work on psch_enabled while still keeping the stack balanced.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| col_username->Append(std::string(ev.username, ev.username_len)); | ||
| col_pid->Append(ev.pid); | ||
| col_query_id->Append(static_cast<int64_t>(ev.queryid)); | ||
| col_parent_query_id->Append(static_cast<int64_t>(ev.parent_query_id)); |
There was a problem hiding this comment.
parent_query_id is exported as a UInt64 column, but it's being cast to int64_t before appending. This will corrupt values when parent_query_id exceeds INT64_MAX. Append as uint64_t (or pass ev.parent_query_id directly) to preserve the full range.
| col_parent_query_id->Append(static_cast<int64_t>(ev.parent_query_id)); | |
| col_parent_query_id->Append(ev.parent_query_id); |
| -- Introduced in pg_stat_ch 0.4.x. Each event now carries the query_id of its | ||
| -- calling query (e.g. the plpgsql function that issued an SPI statement). |
There was a problem hiding this comment.
The migration header says this is introduced in pg_stat_ch 0.4.x, but the extension control file currently advertises default_version = 0.1. Please align the referenced version (or describe it without a specific version) to avoid confusing upgrade guidance.
| -- Introduced in pg_stat_ch 0.4.x. Each event now carries the query_id of its | |
| -- calling query (e.g. the plpgsql function that issued an SPI statement). | |
| -- Introduces parent_query_id support. Each event now carries the query_id of | |
| -- its calling query (e.g. the plpgsql function that issued an SPI statement). |
| // Event structure stored in shared memory queue (~2KB fixed size) | ||
| // Forward-declare typedef so C code can use PschEvent without the struct tag | ||
| typedef struct PschEvent PschEvent; |
There was a problem hiding this comment.
The comment says the shared-memory event is "~2KB", but with PSCH_MAX_ERR_MSG_LEN (2048) + PSCH_MAX_QUERY_LEN (2048) plus headers the struct is ~4.5KB. Updating this helps keep sizing/backpressure guidance accurate.
| size_t src_len; | ||
| size_t len; |
There was a problem hiding this comment.
don't need to do this (even tho postgres code often does..)
https://www.postgresql.org/docs/current/source-conventions.html C99 since PG12
|
has some stack code mixed up in here from #61 |
|
#88 isolates porting to C |
cc1991f to
ebeaeb9
Compare
fde97e5 to
eb7cc35
Compare
| "or port it to C. Offending file(s):\n ${PG_STAT_CH_STRAY_CXX}") | ||
| endif() | ||
|
|
||
| file(GLOB_RECURSE PG_STAT_CH_EXPORTER_SOURCES src/export/*.cc) |
There was a problem hiding this comment.
I would need a new word to capture how supremely OK I am with this
| run: | | ||
| sudo apt-get install -y \ | ||
| postgresql-18 postgresql-server-dev-18 \ | ||
| cmake ninja-build clang libssl-dev pkg-config curl zip unzip tar |
There was a problem hiding this comment.
Let's fix this if the CI ever like actually fails
eb7cc35 to
5566554
Compare
5566554 to
0cb3ee0
Compare
| file(GLOB_RECURSE PG_STAT_CH_PLUGIN_SOURCES src/*.c) | ||
|
|
||
| file(GLOB_RECURSE PG_STAT_CH_STRAY_CXX | ||
| src/*.cc src/*.cxx src/*.cpp | ||
| ) |
0cb3ee0 to
3a6176c
Compare
3a6176c to
369b69d
Compare
The reason #88 ported the plugin layer to C is to keep the PostgreSQL ↔ extension boundary in pure C, so C++ exception unwinding can never cross PG's longjmp-driven error frames. Nothing in the build system enforced that split: a stray `src/hooks/helper.cc` would happily get globbed into the shared library, reintroducing RAII, libstdc++, and the whole class of UB #88 just eliminated. Two enforcements: 1. File-extension lock. Replace the catch-all `file(GLOB_RECURSE SOURCES src/*.cc src/*.c)` with two separate globs (plugin = `src/*.c`, exporter = `src/export/*.cc`) plus a configure-time check that fails with a named, actionable error if any `.cc` / `.cxx` / `.cpp` appears under `src/` outside `src/export/`. 2. No load-time / exit-time work in namespace-scope globals (Clang). Apply -Wglobal-constructors -Werror=global-constructors to the exporter sources. Catches the "static std::thread g_worker = std::thread(work);" / "static std::map<...> g_cache;" hazard where a global allocates / starts threads / opens connections inside the postmaster before _PG_init is even reached. Clang's diagnostic covers both halves at namespace scope: load-time constructors AND exit-time destructors. The second lock is Clang-only — GCC has no equivalent flag. Bracket the option with a compiler-ID check so the GCC matrix stays green, and add a small parallel CI job (`Build (Clang, PG 18)`) so regressions are caught at PR time on the Clang path. One existing finding to clear: `g_exporter` in stats_exporter.cc has a non-trivial ~unique_ptr at namespace scope, which Clang flags as an exit-time destructor. The destructor is actually a no-op in every code path — PschExporterShutdown runs in on_proc_exit and resets the unique_ptr before the C++ atexit chain ever fires. Annotate with `[[clang::no_destroy]]` (gated behind __has_cpp_attribute for GCC compatibility) so the suppression is explicit and the warning compiles clean. -Wexit-time-destructors (the sibling warning) is intentionally NOT enabled. At namespace scope, -Wglobal-constructors already covers what matters. -Wexit-time-destructors additionally fires on function-local statics — the two cached Arrow DataType shared_ptrs in arrow_batch.cc would trip it today. Easy to layer on later via a few more [[clang::no_destroy]] annotations if a regression motivates it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
369b69d to
68d74e5
Compare
| name: Build (Clang, PG 18) | ||
| runs-on: depot-ubuntu-22.04-16 | ||
| steps: | ||
| - uses: actions/checkout@v4 |
There was a problem hiding this comment.
v6 is latest, never let claude imagine what latest version of anything is
TL;DR
We can't separate the plugin build (C) entirely from the bgworker (C++); these build as one library, which forks postmaster.
Instead, modify the CMake build to
Also fix errors Clang tags but not GCC...
Moar text plz LLM Daddy
Builds on @serprex's #88 (C99 port of the plugin layer, now merged) by enforcing the C/C++ split in the build system itself. Single commit, two enforcements:
1. File-extension lock
Replaces the catch-all
file(GLOB_RECURSE SOURCES src/*.cc src/*.c)with two separate globs — plugin =src/*.c, exporter =src/export/*.cc— plus a configure-time check that fails the build if any.cc/.cxx/.cppfile appears undersrc/outsidesrc/export/. Same single shared library is produced — postmaster still loads one.so, the bgworker still inherits one address space — but a straysrc/hooks/helper.cc(or any future "just a tiny C++ helper") will now fail configure with a named, actionable error before any compilation runs.2. No load-time globals in the exporter
Even inside
src/export/, where C++ is allowed, forbid namespace-scope objects whose constructors run at.soload time. The hazard:static std::thread g_worker = std::thread(work);orstatic std::map<...> g_cache;— a global whose constructor runs inside postmaster before_PG_initeven fires, potentially allocating, starting threads, opening connections, taking locks, or otherwise touching state before PG is ready.-Wglobal-constructorsflags these;-Werror=global-constructorspromotes to build failure.Clang-only — GCC has no equivalent flag (requested upstream for years, never landed). Bracketed with
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")so the GCC matrix stays green. A small parallel CI job (Build (Clang, PG 18)) runs the Clang build so regressions are caught at PR time.Nothing in the tree trips the warning today —
g_exporter's constructor isconstexpr-initialized viastd::unique_ptr()'s constexpr default ctor; the rest ofsrc/export/namespace scope isconstexprPODs andusingaliases.What this does not do
-Wexit-time-destructors(the sibling warning that fires on non-trivial destructors running at exit) is intentionally NOT enabled. It would trip ong_exporter's~unique_ptrand two cached Arrowshared_ptr<DataType>statics inarrow_batch.cc. The fix is a handful of[[clang::no_destroy]]annotations — easy, but the user's stated primary concern is load-time globals (the "std::thread above main" case). Exit-time destructors are secondary (process is shutting down anyway). Easy to layer on later if a regression motivates it.Stacking
Targets
maindirectly (now that #88 has merged). #94 stacks on top of this for the exception barrier work.Test plan
Build (Clang, PG 18)CI job green — confirms-Werror=global-constructorscompiles clean against current code.mv src/hooks/hooks.c src/hooks/hooks.cc, runcmake -B build— should fail configure with the named error before any compile.static std::vector<int> g_test;at the top ofstats_exporter.cc, build with Clang — should fail with-Werror=global-constructors.Followup
The
parent_query_id/ nested-query-stack work that used to live on this branch is preserved onparent-query-id-preservedfor a separate PR after the safety stack (this + #94) lands.🤖 Generated with Claude Code