
[core] Optimize Flink BTree index topology #7852

Draft
leaves12138 wants to merge 1 commit into apache:master from leaves12138:codex/flink-btree-single-topology

@leaves12138
Contributor

What changed

  • Reworked Flink BTree global index building to use one task-driven topology for all contiguous row ranges instead of building one topology per range.
  • Added an internal build task id to the sort key so each range keeps its own row-range metadata while sharing the same Flink source/read/sort/write chain.
  • Added coverage for parallelism calculation, many small ranges, and a single large range split across multiple writer subtasks.
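The task-id idea in the list above can be illustrated with a minimal sketch. It is not the code in this PR: `TaskKeyedSortSketch`, `SortKey`, `taskId`, `indexedValue`, and `sortShared` are hypothetical names, and the sketch uses plain Java collections rather than Flink operators. It only shows why putting a build task id in front of the indexed value lets records from many row ranges pass through one shared sort while each range's records stay contiguous and ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TaskKeyedSortSketch {

    /** Hypothetical composite key: build task id first, then the indexed value. */
    static final class SortKey {
        final int taskId;        // assumed: one id per contiguous row range
        final long indexedValue; // assumed: the value being indexed, simplified to a long

        SortKey(int taskId, long indexedValue) {
            this.taskId = taskId;
            this.indexedValue = indexedValue;
        }

        @Override
        public String toString() {
            return "(" + taskId + ", " + indexedValue + ")";
        }
    }

    /** Orders by task id, then by indexed value within the same task. */
    static final Comparator<SortKey> ORDER =
            Comparator.comparingInt((SortKey k) -> k.taskId)
                    .thenComparingLong(k -> k.indexedValue);

    /** Sorts records from all ranges together, standing in for a single shared sort stage. */
    static List<SortKey> sortShared(List<SortKey> records) {
        List<SortKey> sorted = new ArrayList<>(records);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<SortKey> mixed = new ArrayList<>();
        mixed.add(new SortKey(1, 5L));
        mixed.add(new SortKey(0, 9L));
        mixed.add(new SortKey(1, 2L));
        mixed.add(new SortKey(0, 3L));

        // Records of task 0 come out first, then task 1, each group sorted by value.
        for (SortKey key : sortShared(mixed)) {
            System.out.println(key);
        }
    }
}
```

Because the task id is the leading sort field, a downstream writer can detect a change of task id as the boundary between two ranges and attach the right row-range metadata to each group.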

Why

When row ranges are highly fragmented, the old implementation creates a separate Flink topology for each range. That can make the create-index procedure spend a long time constructing the JobGraph and can produce an oversized topology.

Validation

  • mvn -pl paimon-flink/paimon-flink-common -DfailIfNoTests=false -Dtest=BTreeIndexTopoBuilderTest test
  • mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithManyPartitions test
  • mvn -pl paimon-flink/paimon-flink-common -Pfast-build -DfailIfNoTests=false -Dtest=BTreeGlobalIndexITCase#testBTreeIndexWithSingleRangeAndParallelWriters test

@leaves12138 changed the title from "[codex] Optimize Flink BTree index topology" to "[core] Optimize Flink BTree index topology" on May 18, 2026