GH-3522: Reuse intermediate buffers in RunLengthBitPackingHybridDecoder PACKED path (~22% throughput on dictionary-id decode) #3523
Open. iemejia wants to merge 1 commit into apache:master.
…dDecoder PACKED path

Allocate the `int[]` values buffer and `byte[]` read-staging buffer once per decoder and grow them lazily, instead of allocating fresh arrays on every PACKED run. Resolves the existing "TODO: reuse a buffer" comment. A new `currentBufferLength` field tracks the logical length of the active region in `packedValuesBuffer` (which may now exceed the current run's size after a prior larger run grew it).

Benchmark (RleDictionaryIndexDecodingBenchmark, 100k INT32, BIT_WIDTH=10, JMH `-wi 5 -i 10 -f 2`):

| Pattern | master ops/s | optimized ops/s | Improvement |
|---|---|---|---|
| SEQUENTIAL | 93,061,521 | 113,856,860 | +22.3% |
| RANDOM | 92,929,824 | 114,238,638 | +22.9% |
| LOW_CARDINALITY | 92,813,229 | 115,271,347 | +24.2% |

End-to-end FileReadBenchmark sees ~2% improvement (RLE decoding is a small fraction of full file reads). Validation: 573 parquet-column tests pass. Built with `-Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true`.
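The lazy-growth reuse pattern described in the commit message can be sketched as follows. This is a hypothetical illustration of the technique, not the actual parquet-java code; the class and method names (`PackedRunBuffers`, `prepare`, `capacity`, `activeLength`) are invented for the example.

```java
// Hypothetical sketch of the lazy-growth buffer reuse described above;
// names are illustrative, not the actual RunLengthBitPackingHybridDecoder code.
class PackedRunBuffers {
  private int[] packedValuesBuffer = new int[0]; // reused across PACKED runs
  private int currentBufferLength = 0;           // logical length of the active region

  // Prepare the values buffer for a run of numValues, growing only when needed.
  void prepare(int numValues) {
    if (packedValuesBuffer.length < numValues) {
      // A larger run was encountered: allocate once and keep it for later runs.
      packedValuesBuffer = new int[numValues];
    }
    // The array may be bigger than this run, so track the logical length separately.
    currentBufferLength = numValues;
  }

  int capacity() { return packedValuesBuffer.length; }
  int activeLength() { return currentBufferLength; }
}
```

After a large run grows the array, subsequent smaller runs allocate nothing, which is where the per-run garbage savings come from.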
Summary
Closes #3522.
`RunLengthBitPackingHybridDecoder` allocates a new `int[]` and a new `byte[]` on every PACKED run during decode. The code itself flagged this with a `// TODO: reuse a buffer` comment. This PR resolves the TODO by reusing the buffers across runs within the same decoder instance, growing them lazily only when a larger run is encountered.

It also adds a `currentBufferLength` field to track the logical active-region length in `packedValuesBuffer` (since `packedValuesBuffer.length` may now exceed the current run's size after a prior larger run grew it).

Benchmark

`RleDictionaryIndexDecodingBenchmark` (added in #3512) isolates the RLE/bit-packed dictionary-id decode path: 100k INT32 dictionary IDs, BIT_WIDTH=10, JMH `-wi 5 -i 10 -f 2` (20 measurement iterations).

End-to-end, `FileReadBenchmark` sees a much smaller ~2% improvement because RLE decoding is only one of many pipeline stages; the isolated micro-benchmark shows the true magnitude on the affected code path.

Validation

- parquet-column: 573 tests pass
- `TestRunLengthBitPackingHybridEncoder`: 9 tests pass (these round-trip values through the decoder)
- Built with `-Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true`

Scope
17 LOC change to a single file. Self-contained and obviously correct (resolves the existing TODO).
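As context for the benchmark patterns cited above, here is a hypothetical sketch of how the three dictionary-id distributions (SEQUENTIAL, RANDOM, LOW_CARDINALITY) could be generated for 100k values at BIT_WIDTH=10. The actual generator lives in `RleDictionaryIndexDecodingBenchmark` (#3512) and may differ; the class and method names here are invented for illustration.

```java
import java.util.Random;

// Hypothetical generator for the three dictionary-id patterns named in the
// benchmark results; not the actual RleDictionaryIndexDecodingBenchmark code.
class DictionaryIdPatterns {
  static final int NUM_VALUES = 100_000;
  static final int BIT_WIDTH = 10;          // each id fits in 10 bits
  static final int MAX_ID = 1 << BIT_WIDTH; // 1024 distinct dictionary ids

  // Ids cycle 0..1023: long bit-packed (PACKED) runs, no repetition.
  static int[] sequential() {
    int[] ids = new int[NUM_VALUES];
    for (int i = 0; i < NUM_VALUES; i++) ids[i] = i % MAX_ID;
    return ids;
  }

  // Uniformly random ids: mostly PACKED runs with no compressible structure.
  static int[] random(long seed) {
    Random r = new Random(seed);
    int[] ids = new int[NUM_VALUES];
    for (int i = 0; i < NUM_VALUES; i++) ids[i] = r.nextInt(MAX_ID);
    return ids;
  }

  // Only four distinct ids in long stretches: RLE runs dominate.
  static int[] lowCardinality() {
    int[] ids = new int[NUM_VALUES];
    for (int i = 0; i < NUM_VALUES; i++) ids[i] = (i / 10_000) % 4;
    return ids;
  }
}
```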
Related
Part of the focused performance PR series from https://github.com/iemejia/parquet-perf. The companion ByteStreamSplit writer/reader changes from the same source commit (ba52f82c3) have already been submitted as #3504 and #3506.

How to reproduce
The benchmark is added in #3512. Once that lands, reproduce with:
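A hedged sketch of what a typical JMH invocation might look like, using the flags quoted above. The module path, jar name, and exact wiring are assumptions, not taken from #3512; the exact command depends on how that PR registers the benchmark.

```
# Assumed module and jar names; adjust to how #3512 wires the benchmark.
cd parquet-benchmarks
mvn -q package
java -jar target/parquet-benchmarks.jar RleDictionaryIndexDecodingBenchmark -wi 5 -i 10 -f 2
```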