[fix](streaming-job) propagate ALTER source/target properties to runtime and persistence by JNSimba · Pull Request #62553 · apache/doris

JNSimba · 2026-04-16T07:59:22Z

What problem does this PR solve?

ALTER JOB on a non-TVF streaming CDC job could silently fail in two ways:

Persistence: replayOnUpdated didn't sync sourceProperties / targetProperties, so the values were lost after FE checkpoint/restart or on follower FE.
Runtime: JdbcSourceOffsetProvider held an independent copy of sourceProperties seeded at init time. ALTER on credentials (password, driver_url, etc.) never reached the provider's BE RPCs — the job would keep trying the old credentials.

Fix

replayOnUpdated now copies sourceProperties / targetProperties from the replay job.
Job caches a convertedSourceProperties (transient) that's refreshed on ALTER; all provider and task entry points (fetchMeta / cleanup / createStreamingMultiTblTask) push it to the provider via a new ensureInitialized override.

Tests

test_streaming_mysql_job_alter_props_restart_fe.groovy — ALTER props survive FE restart.
test_streaming_mysql_job_alter_cred.groovy — ALTER password with wrong credentials pauses the job; ALTER back recovers without FE restart.

Release note

Fix streaming job ALTER JOB source/target properties not taking effect at runtime and being lost after FE checkpoint/restart.

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test
- No need to test
Behavior changed:
- No.
Does this need documentation?
- No.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-04-16T07:59:29Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

JNSimba · 2026-04-16T08:00:49Z

run buildall

hello-stephen · 2026-04-16T09:25:39Z

FE UT Coverage Report

Increment line coverage 0.00% (0/4) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-16T10:19:07Z

FE Regression Coverage Report

Increment line coverage 0.00% (0/664) 🎉
Increment coverage report
Complete coverage report

JNSimba · 2026-04-20T02:54:26Z

/review

github-actions

Findings

Blocking: StreamingInsertJob.replayOnUpdated() now copies the top-level sourceProperties, but it still leaves the already-instantiated JdbcSourceOffsetProvider with its old cached sourceProperties / snapshotParallelism. JobManager.replayUpdateJob() replays onto the existing in-memory job object, so after follower promotion the source-side RPCs (fetchRemoteMeta, compareOffset, requestTableSplits, initSourceReader) still run with stale JDBC config. ALTERing credentials or snapshot parallelism is therefore still lost on the real CDC path even though SHOW JOBS now prints the new map.

Critical Checkpoints

Goal of current task: Partially met. The PR fixes persisted ExecuteSql / top-level property maps, but it does not fully restore altered source-side behavior after replay/failover. The added regression only proves the ExecuteSql representation, not the source RPC path.
Change size/focus: Small and focused, but too small to cover the cached provider state that replay also needs to refresh.
Concurrency: No new concurrent path is introduced here.
Lifecycle / persistence: Not fully correct. Replay is still not equivalent to the master-side state transition because one cached runtime object keeps pre-ALTER source config.
Parallel code paths: The TVF path already refreshes ALTER-able source fields in JdbcTvfSourceOffsetProvider.ensureInitialized(), but the non-TVF JDBC path still does not.
Test coverage: Incomplete for source-property replay. The regression alters snapshot_split_size only after the snapshot phase has already finished and then mostly asserts ExecuteSql, so it would not catch the stale-provider failover path described above.
Observability / performance: No additional concerns noted beyond the correctness issue above.

Requesting changes until the replay path refreshes the JDBC provider state as well.

JNSimba · 2026-04-20T09:12:28Z

/review

github-actions

I found 2 blocking issues and 1 minor test issue.

Critical checkpoints:

Goal of current task: Partially achieved. Persistence replay of sourceProperties/targetProperties is fixed, but runtime ALTER propagation is still incomplete because the BE cdc_client keeps reader/deserializer state cached across tasks. The added tests do not cover that path.
Modification size/focus: Reasonably focused, but the runtime fix stops at the FE/provider layer and misses the BE reader lifecycle.
Concurrency: No new lock-order or heavy-under-lock problem stood out in the changed paths.
Lifecycle/state management: Not fully correct. ALTER JOB can leave sourceProperties partially updated on FE when cert conversion fails, and the BE SourceReader keeps old source-property-derived caches across resume.
Configuration items: No new configs.
Compatibility: No incompatible protocol or storage-format change identified.
Parallel code paths: The non-TVF runtime path is only partially updated; some allowed source-property ALTERs still do not take effect.
Special conditional checks: No additional concern beyond the non-atomic source-property update.
Test coverage: The new regression tests cover restart persistence and password rotation, but they miss reader-cached source props (for example table.<tbl>.exclude_columns) and failed-ALTER atomicity. One new assertion also compares task counts lexicographically.
Test result files: The new .out file looks consistent.
Observability: Existing logs look sufficient for these paths.
Transaction/persistence: The replay change for sourceProperties/targetProperties looks correct.
Data writes/modifications: Runtime write behavior can still use stale source-property-derived caches on the BE side.
FE-BE variable passing: FE now sends refreshed props, but the cached BE reader does not fully re-consume them after first initialization.
Performance: snapshot_parallelism ALTER is still not fully honored at runtime because the cached BE reader keeps the old poll executor size.

Please address the inline comments before merge.

JNSimba · 2026-04-20T10:40:12Z

/review

JNSimba · 2026-04-20T10:40:24Z

run buildall

github-actions

I found 2 blocking issues.

StreamingInsertJob.replayOnUpdated() now persists the updated sourceProperties, but replay still does not invalidate the transient convertedSourceProperties cache. On any FE that has already initialized this job once (for example a former master that later replays this ALTER as a follower), later promotion without restart will still drive fetchMeta() and createStreamingMultiTblTask() with the pre-ALTER converted map.
ALTER JOB ... FROM MYSQL ("snapshot_split_size" = ...) is still a silent runtime no-op. Snapshot splits are materialized once at job creation and then restored from streaming_job_meta; resume/restart never rebuilds them from the altered value, so the new property only changes the displayed/persisted metadata.

Critical checkpoint conclusions:

Goal / correctness: The PR fixes restart persistence of sourceProperties / targetProperties and improves runtime propagation on the active FE, but it still misses replay/failover cache invalidation and still accepts at least one source-property ALTER that does not affect runtime.
Scope / minimality: The change is focused, but the new FE-side rejection list is still incomplete for source properties that cannot actually take effect on an existing job.
Concurrency: I did not find a new lock-order or blocking-under-lock issue in the touched FE path; the main remaining risk is replay/lifecycle state rather than thread safety.
Lifecycle / persistence: EditLog write/replay coverage exists for the new maps, but replay is not yet equivalent to master-side runtime state because the transient converted-source cache is left stale.
Parallel paths: The non-TVF runtime path is only partially covered; the precomputed snapshot-split path still ignores some ALTERed source properties.
Tests: The new regression tests cover FE restart persistence and credential rotation, but there is still no coverage for replay on an already-initialized FE object / failover, and the restart test currently uses a source property whose runtime semantics are still unchanged.
Observability: Existing logs are sufficient for the touched path; I do not see a new metrics gap here.

I did not run the full FE / regression test suite in this review environment.

hello-stephen · 2026-04-20T11:58:25Z

FE UT Coverage Report

Increment line coverage 0.00% (0/47) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-04-20T13:09:09Z

FE Regression Coverage Report

Increment line coverage 10.20% (5/49) 🎉
Increment coverage report
Complete coverage report

JNSimba · 2026-04-21T02:20:24Z

/review

JNSimba · 2026-04-21T02:20:33Z

run buildall

github-actions

Findings

Blocking: the new "reject silent no-op source keys" validation only covers non-TVF FROM MYSQL/POSTGRES ... TO DATABASE jobs. The parallel cdc_stream(...) SQL path still allows keys such as snapshot_parallelism, snapshot_split_size, and table.<tbl>.exclude_columns, even though those values are also cached only at BE SourceReader / deserializer initialization time and are not refreshed by SQL ALTER. Because the job-driven TVF path reuses the same BE reader in fs_brokers/cdc_client/common/Env.jobContexts and fetchRecordStream() does not close it for numeric job IDs, these ALTERs remain silent no-ops.
Medium: replayOnUpdated() now invalidates baseCommand / originTvfProps, but it still leaves the existing encryptedSql cache untouched. Since JobManager.replayUpdateJob() mutates the existing in-memory job object instead of replacing it, a follower/new master can keep showing the pre-ALTER SQL in SHOW JOBS even though executeSql has been replayed.

Critical Checkpoints

Goal of current task: Partially met. The PR fixes non-TVF source/target property replay and the runtime credential-refresh path, but it still leaves a parallel TVF ALTER path where first-initialize-only source properties are accepted yet ineffective.
Change size/focus: Small and focused.
Concurrency: No new lock-order or deadlock issue found in the modified flow.
Lifecycle / persistence: Not fully equivalent on replay because the SQL-derived encryptedSql cache is not refreshed with the replayed executeSql.
Parallel code paths: Not fully covered. The new no-op-property rejection is implemented only for the non-TVF path; the cdc_stream SQL path still misses the same protection.
Test coverage: Improved for non-TVF runtime/persistence, but still missing coverage for SQL/TVF ALTER of first-initialize-only properties and for follower/failover SQL display after replay.
Observability / performance: No additional concerns beyond the correctness items above.

github-actions · 2026-04-21T03:22:54Z

        }
+
+        // Reject keys that the runtime reads only at first initialize and never refreshes,
+        // so ALTER would be a silent no-op. See JdbcSourceOffsetProvider / DebeziumJsonDeserializer.


This new guard only covers the non-TVF FROM ... TO DATABASE path. The parallel cdc_stream(...) SQL path in checkUnmodifiableProperties() still allows snapshot_parallelism, snapshot_split_size, and table.<tbl>.exclude_columns, even though those are also read only during BE reader/deserializer initialization. The job-driven TVF path reuses the same SourceReader in fs_brokers/cdc_client/common/Env.jobContexts, and fetchRecordStream() does not close it for numeric job IDs, so a SQL ALTER of those keys is still a silent no-op there.

github-actions · 2026-04-21T03:22:54Z

+        }
        setExecuteSql(replayJob.getExecuteSql());
+        // SQL-derived caches must be invalidated together so the next parse uses the replayed SQL.
+        this.baseCommand = null;


baseCommand and originTvfProps are invalidated here, but encryptedSql is another SQL-derived cache on the current in-memory job. Because JobManager.replayUpdateJob() mutates the existing object instead of replacing it, leaving encryptedSql untouched means a follower/new master can keep returning the pre-ALTER SQL from getShowSQL() after replay/failover. This cache should be cleared or copied from replayJob together with executeSql.

JNSimba · 2026-04-21T04:35:27Z

run external

JNSimba · 2026-04-21T04:35:46Z

run vault_p0

JNSimba · 2026-04-21T06:13:02Z

run external

…sistence after FE restart Regression test to verify that sourceProperties and targetProperties altered via ALTER JOB survive FE restart. Expected to fail until StreamingInsertJob.replayOnUpdated() is fixed to sync both property maps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…playOnUpdated After ALTER JOB changes source/target properties, replayOnUpdated() did not propagate sprops/tprops into the replaying job. The checkpoint thread replays journals on its own Env, so the generated image also lost the ALTERed values — any subsequent FE restart (from image or EditLog) would revert to the original CREATE-time properties. For CDC jobs this can cause follower failover to use stale JDBC credentials or miss updated load options such as max_filter_ratio. Also tighten the regression test's ALTER JOB to only specify the changed properties. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ffsetProvider Non-TVF JDBC jobs kept an independent sourceProperties copy on the offset provider seeded by initSourceJob, so ALTER JOB credential changes (e.g. password rotation) never reached the provider's BE RPCs. Cache the converted form of sourceProperties on the job and push it to the provider via ensureInitialized on every fetchMeta / cleanup cycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rived caches on replay - AlterJobCommand: reject ALTER on snapshot_split_size (only read by the initial splitChunks pass; subsequent restart paths restore persisted splits via replayIfNeed and never re-split, so the ALTER would only change the persisted metadata, not the running split layout). - StreamingInsertJob.replayOnUpdated: invalidate the transient convertedSourceProperties, baseCommand, and originTvfProps caches so that a formerly-promoted FE that is replaying this ALTER as a follower does not keep returning the pre-ALTER values if it is promoted back without restart. - Regression: swap test_streaming_mysql_job_alter_props_restart_fe from snapshot_split_size to altering `user`; ALTER the user back to root before RESUME so the job can authenticate upstream MySQL post-restart. - Regression: add snapshot_split_size reject case in test_streaming_mysql_job_create_alter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

JNSimba · 2026-04-22T02:49:32Z

run buildall

JNSimba · 2026-04-22T04:20:42Z

run external

hello-stephen · 2026-04-22T05:59:29Z

FE Regression Coverage Report

Increment line coverage 11.48% (7/61) 🎉
Increment coverage report
Complete coverage report

JNSimba added dev/4.0.x dev/4.1.x labels Apr 20, 2026

github-actions Bot requested changes Apr 20, 2026

View reviewed changes

Comment thread ...-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java

JNSimba changed the title ~~[fix](streaming-job) sync sourceProperties and targetProperties in replayOnUpdated~~ [fix](streaming-job) propagate ALTER source/target properties to runtime and persistence Apr 20, 2026

github-actions Bot requested changes Apr 20, 2026

View reviewed changes

JNSimba removed the dev/4.0.x label Apr 20, 2026

github-actions Bot requested changes Apr 20, 2026

View reviewed changes

Comment thread ...-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java

Comment thread fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterJobCommand.java

github-actions Bot requested changes Apr 21, 2026

View reviewed changes

JNSimba and others added 5 commits April 22, 2026 10:49

fix

add4a43

JNSimba force-pushed the fix/source_target_props_restart branch from 7438a81 to ba3c76d Compare April 22, 2026 02:49

Conversation

JNSimba commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Fix

Tests

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

hello-stephen commented Apr 16, 2026

Uh oh!

JNSimba commented Apr 16, 2026

Uh oh!

hello-stephen commented Apr 16, 2026

FE UT Coverage Report

Uh oh!

hello-stephen commented Apr 16, 2026

FE Regression Coverage Report

Uh oh!

JNSimba commented Apr 20, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Findings

Critical Checkpoints

Uh oh!

Uh oh!

JNSimba commented Apr 20, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JNSimba commented Apr 20, 2026

Uh oh!

JNSimba commented Apr 20, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

hello-stephen commented Apr 20, 2026

FE UT Coverage Report

Uh oh!

hello-stephen commented Apr 20, 2026

FE Regression Coverage Report

Uh oh!

JNSimba commented Apr 21, 2026

Uh oh!

JNSimba commented Apr 21, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Findings

Critical Checkpoints

Uh oh!

github-actions Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

JNSimba commented Apr 21, 2026

Uh oh!

JNSimba commented Apr 21, 2026

Uh oh!

JNSimba commented Apr 21, 2026

Uh oh!

JNSimba commented Apr 22, 2026

Uh oh!

JNSimba commented Apr 22, 2026

Uh oh!

hello-stephen commented Apr 22, 2026

FE Regression Coverage Report

Uh oh!

Reviewers

JNSimba commented Apr 16, 2026 •

edited

Loading