Skip to content

[core] Support manifest sort feature when commit#7842

Open
discivigour wants to merge 47 commits into
apache:masterfrom
discivigour:j/manifest5
Open

[core] Support manifest sort feature when commit#7842
discivigour wants to merge 47 commits into
apache:masterfrom
discivigour:j/manifest5

Conversation

@discivigour
Copy link
Copy Markdown
Contributor

@discivigour discivigour commented May 13, 2026

Purpose

  • add manifest-sort.enabled to enable manifest sort

Tests

  • ManifestFileMetaTest.testManifestSortWithOverlappingPartitions()

/**
* Compares the value at field {@code k} of two {@link BinaryRow}s according to {@code type}.
*/
static int compareField(BinaryRow a, BinaryRow b, int k, DataType type) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use CodeGenUtils.newRecordComparator?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice point.

}
}

if (!addedToExisting) {
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use boolean addedToExisting.

Just
List earliestRun = runs.pool();
if (earliestRun == null) {
do something
} else if (compare(xxx) > 0) {
do something
} else {
do something
}

It makes this more pretty

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.

last.partitionStats().maxValues(),
sortFieldIndex,
sortFieldType)
>= 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is overlap in one run if "equals".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I designed it this way to ensure that the minimum number of Sorted runs is built to reduce the burden of sorting.

umi added 26 commits May 19, 2026 13:21
batch

externalSort

fix

add manifest sort to compact job

addTest

review

mvMorax

fix

spi

proto

proto

fix

fix

# Conflicts:
#	paimon-core/src/main/java/org/apache/paimon/operation/ManifestFileMerger.java
fix
# Conflicts:
#	paimon-core/src/test/java/org/apache/paimon/schema/SchemaValidationTest.java

# Conflicts:
#	paimon-core/src/test/java/org/apache/paimon/schema/SchemaValidationTest.java
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I took another pass over the latest revision, and I think there are still a few issues that should be fixed before merging.

  1. The boundary condition for interval overlap still looks wrong.

    In ManifestFileSorter.buildLevelSortedRuns, a file is appended to an existing run when file.min >= last.max. In splitIntoSections, a new section is also started when file.min >= sectionMaxBound.

    However, partition stats represent closed intervals. For example, [1, 3] and [3, 5] still overlap at partition value 3. A sorted run is documented as containing non-overlapping intervals, so this case should not be placed into the same run. Similarly, sections are used as overlap-connected rewrite units, so this case should not be split into different sections either. I think both checks should use > 0, not >= 0, and we should add a test for the max == min boundary case.

  2. manifest-sort.enabled currently bypasses the original manifest compaction trigger/gate.

    ManifestFileMerger.merge directly enters ManifestFileSorter.trySortRewrite and returns from that path when manifest sort is enabled. This means the original manifest.full-compaction-threshold-size and manifest.merge-min-count behavior is no longer applied in the same way. Inside classifyManifests, files are classified only by fileSize < targetSize or delete-range overlap, so small manifests / delete manifests can trigger sort rewrite more aggressively than the existing merge logic.

    If this is intentional, I think the new semantics should be documented very clearly. Otherwise, the sort path should preserve the existing full/minor compaction gates, especially manifest.merge-min-count for minor manifest merging.

  3. The partial rewrite path for manifest-sort.max-rewrite-size can break the output order.

    When the first section exceeds the rewrite budget, rewriteSections splits it into rewriteFiles and remainingFiles, rewrites the first part, and appends the remaining section to the end of the sections list. If there are later sections with larger key ranges, the remaining part of the current section will be emitted after them, which can produce an order like 0..10, 20..30, 10..20.

    To keep the manifest list sorted, I think we should either skip the whole section once the budget is exceeded, or keep the remaining section at the current position/order instead of appending it to the tail. This also needs a regression test.

  4. Test coverage is still missing some important edge cases.

    The new tests cover large overlapping ranges and delete elimination, but I do not see coverage for boundary-touching intervals (max == min), manifest.merge-min-count / full threshold behavior under manifest-sort.enabled, manifest-sort.max-rewrite-size preserving global output order, or null partition values. The switch to RecordComparator is a good improvement, but a null partition test would make this safer.

@discivigour
Copy link
Copy Markdown
Contributor Author

discivigour commented May 19, 2026

Thanks for the update. I took another pass over the latest revision, and I think there are still a few issues that should be fixed before merging.

  1. The boundary condition for interval overlap still looks wrong.
    In ManifestFileSorter.buildLevelSortedRuns, a file is appended to an existing run when file.min >= last.max. In splitIntoSections, a new section is also started when file.min >= sectionMaxBound.
    However, partition stats represent closed intervals. For example, [1, 3] and [3, 5] still overlap at partition value 3. A sorted run is documented as containing non-overlapping intervals, so this case should not be placed into the same run. Similarly, sections are used as overlap-connected rewrite units, so this case should not be split into different sections either. I think both checks should use > 0, not >= 0, and we should add a test for the max == min boundary case.
  2. manifest-sort.enabled currently bypasses the original manifest compaction trigger/gate.
    ManifestFileMerger.merge directly enters ManifestFileSorter.trySortRewrite and returns from that path when manifest sort is enabled. This means the original manifest.full-compaction-threshold-size and manifest.merge-min-count behavior is no longer applied in the same way. Inside classifyManifests, files are classified only by fileSize < targetSize or delete-range overlap, so small manifests / delete manifests can trigger sort rewrite more aggressively than the existing merge logic.
    If this is intentional, I think the new semantics should be documented very clearly. Otherwise, the sort path should preserve the existing full/minor compaction gates, especially manifest.merge-min-count for minor manifest merging.
  3. The partial rewrite path for manifest-sort.max-rewrite-size can break the output order.
    When the first section exceeds the rewrite budget, rewriteSections splits it into rewriteFiles and remainingFiles, rewrites the first part, and appends the remaining section to the end of the sections list. If there are later sections with larger key ranges, the remaining part of the current section will be emitted after them, which can produce an order like 0..10, 20..30, 10..20.
    To keep the manifest list sorted, I think we should either skip the whole section once the budget is exceeded, or keep the remaining section at the current position/order instead of appending it to the tail. This also needs a regression test.
  4. Test coverage is still missing some important edge cases.
    The new tests cover large overlapping ranges and delete elimination, but I do not see coverage for boundary-touching intervals (max == min), manifest.merge-min-count / full threshold behavior under manifest-sort.enabled, manifest-sort.max-rewrite-size preserving global output order, or null partition values. The switch to RecordComparator is a good improvement, but a null partition test would make this safer.

Thanks for your comment.

  1. I designed it this way to ensure that the minimum number of Sorted runs is built to reduce the burden of sorting.
  2. I have introduced manifestFullCompactionThresholdSize to to reduce the phenomenon of "one delete entry causing large-scale file rewriting.
  3. I don‘t think it is a problem.
  4. I will add more tests.

@discivigour discivigour marked this pull request as ready for review May 19, 2026 10:15
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Please check the two questions:
1、If sorted run has files overlap, is it correct?
2、If delete comes, how to deal with it.
Nothing else to me.

@discivigour
Copy link
Copy Markdown
Contributor Author

LGTM. Please check the two questions: 1、If sorted run has files overlap, is it correct? 2、If delete comes, how to deal with it. Nothing else to me.

  1. If the file endpoints in SortedRun overlap, it does not affect the subsequent reconstruction of the LSM Tree. In addition, when users filter through partitions, they will only read 1-2 more files.
  2. If a delete entry comes, it will first determine whether achieve manifestFullCompactionThresholdSize.If achieved, will eliminate all the delete; If not reach, the meta related to the delete entry partition will be retained and not participate in the sorting to prevent the order of add and delete from being disrupted

totalDeltaFileSize += file.fileSize();
}
}
boolean removeAllDelete = totalDeltaFileSize >= sizeTrigger;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to triggerFullCompact

Map<ManifestFileMeta, Boolean> defaultCompactionManifests = new LinkedHashMap<>();
List<ManifestFileMeta> lsmFiles = new LinkedList<>(input);
Set<FileEntry.Identifier> deleteEntries =
FileEntry.readDeletedEntries(manifestFile, input, manifestReadParallelism);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why read delete every time? If not full compaction, we still need to read all the deletes?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split full compaction and minor compaction, don't make it mixed. Refer to ManifestFileMerge.merge

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will split full compaction and minor compaction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants