[core] Support compact for chain table #7888
JingsongLi
left a comment
Review: [core] Support compact for chain table
Critical Bug: Resource lifecycle in mapPartitions closure
In ChainMergeProcedure.java, the BatchTableWrite and IOManager are created before the while (splitIterator.hasNext()) loop, but write.close() and ioManager.close() are called in a finally block inside the loop body. If a Spark partition receives more than one split, the second iteration will use already-closed resources and fail. In addition, prepareCommit() is called once per split rather than once after all data has been written.
Fix: move the try/finally to wrap the entire while-loop, and call prepareCommit() only after all splits are processed.
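A minimal sketch of the suggested shape, to make the intended ordering concrete. It does not use the real Paimon classes: FakeWrite is a hypothetical stand-in for BatchTableWrite, splits are plain strings, and processPartition is an illustrative name, not a method from this PR. The IOManager would be closed in the same finally block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ChainMergeLifecycleSketch {

    /** Hypothetical stand-in for a batch writer; NOT Paimon's BatchTableWrite. */
    static class FakeWrite implements AutoCloseable {
        final List<String> written = new ArrayList<>();
        int prepareCommitCalls = 0;
        boolean closed = false;

        void write(String split) {
            // Mimics the failure mode from the review: any use after close() is an error.
            if (closed) {
                throw new IllegalStateException("write after close: " + split);
            }
            written.add(split);
        }

        List<String> prepareCommit() {
            if (closed) {
                throw new IllegalStateException("prepareCommit after close");
            }
            prepareCommitCalls++;
            return new ArrayList<>(written);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Processes every split of one partition with a single writer. */
    static List<String> processPartition(Iterator<String> splitIterator, FakeWrite write) {
        try {
            // The try/finally wraps the whole loop, so the writer stays open
            // for every split assigned to this partition.
            while (splitIterator.hasNext()) {
                write.write(splitIterator.next());
            }
            // Called once, after all splits have been written.
            return write.prepareCommit();
        } finally {
            // Resources are released exactly once, even if a split fails.
            write.close();
        }
    }

    public static void main(String[] args) {
        FakeWrite write = new FakeWrite();
        List<String> messages =
                processPartition(Arrays.asList("split-0", "split-1").iterator(), write);
        System.out.println(messages.size() + " splits written, prepareCommit calls: "
                + write.prepareCommitCalls + ", closed: " + write.closed);
    }
}
```

With the finally block inside the loop instead, the second call to write() in this sketch would throw, which mirrors the multi-split failure described above.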
Hardcoded version in pom.xml
The new dependency uses 1.5-SNAPSHOT instead of ${project.version}. This will break when the project version changes.
Typo: validataChainMerge should be validateChainMerge
Inconsistent Javadoc
The class-level doc shows CALL sys.compact(...) but the actual registered procedure name is chain_merge.
Incorrect PR title prefix
Title says [core] but all changes are in paimon-spark. Should be [spark].
Test coverage suggestion
The test should also assert that the un-merged partition in the snapshot branch remains unaffected, to guard against accidental overwrites.
Purpose
Linked issue: #7887
Support compact for chain table
Tests
org.apache.paimon.spark.SparkChainTableITCase#testChainMerge