Optimizing Variant read path with lazy caching #3481

nssalian wants to merge 2 commits into apache:master
Conversation
steveloughran
left a comment
Code looks good; made some minor changes.
This should make a very big difference when selectively retrieving multiple fields within a single variant, or within a variant and nested children.
I do worry about concurrency now. The existing Variant didn't have issues here precisely because it recalculated everything.
We have to be confident that even if concurrent access triggers a duplicate cache operation, there's no harm in this. Otherwise cache access will have to be synchronized.
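To illustrate the concern, here is a minimal sketch of the benign-race idiom under discussion (the `LazyDictionary` class and its names are hypothetical, not from this PR): every thread that misses the cache decodes the same deterministic value, so a duplicated or lost write only wastes one lookup, and no synchronization is needed for correctness.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a lazily filled dictionary where data races are benign.
// String is immutable (all fields final), so even an unsynchronized reader
// that observes another thread's write sees a fully constructed value.
final class LazyDictionary {
    private final byte[][] encoded;   // immutable encoded entries
    private final String[] cache;     // entries filled lazily and idempotently

    LazyDictionary(byte[][] encoded) {
        this.encoded = encoded;
        this.cache = new String[encoded.length];
    }

    String get(int id) {
        String s = cache[id];         // racy read: may miss a concurrent write
        if (s == null) {
            s = new String(encoded[id], StandardCharsets.UTF_8);
            cache[id] = s;            // racy write: duplicates decode the same value
        }
        return s;
    }
}
```

Because every write to `cache[id]` stores an equal value, a race costs only redundant work, never an inconsistent result.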
It all looks good to me.
I started the workflows.
Co-authored-by: Steve Loughran <stevel@cloudera.com>
6c6db2e to 6f540f4
steveloughran
left a comment
Reviewing the code again to see if it's possible to get back some of the performance numbers lost with the move to volatile.

We're only reading and caching data, so there are no real write conflicts. Is the use of volatile everywhere being over-cautious? It forces memory reads everywhere.

And I don't know how common cross-thread reading will actually be in production systems; in Spark, each worker is its own thread, after all.

Maybe the goal should just be that all reviewers are confident that, if there are dual writers, the output will always be consistent.
```java
cache = new String[dictSize];
metadataCache = cache;
}
if (cache[id] == null) {
```
Set `cache` to `metadataCache` here, so that if there is any race condition the same cache array is updated. If two threads are writing to the same `[id]`, well, one lookup is wasted; the joint cache is still (probably) updated with each value.
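One possible shape of that fix, as a sketch (the `MetadataReader` class and its names are illustrative, not the PR's actual code): publish the array through the shared field and then re-read the field, so racing initializers converge on whichever array was published last. This narrows, though does not fully eliminate, the window in which two arrays exist.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the suggested fix: always work on the shared field so
// racing threads converge on one array; a lost write to cache[id] only costs
// one redundant decode.
final class MetadataReader {
    private final byte[][] encoded;
    private String[] metadataCache;   // intentionally non-volatile; races are benign

    MetadataReader(byte[][] encoded) {
        this.encoded = encoded;
    }

    String key(int id) {
        String[] cache = metadataCache;
        if (cache == null) {
            metadataCache = new String[encoded.length];
            cache = metadataCache;    // re-read the field so initializers converge
        }
        if (cache[id] == null) {
            // Worst case under a race: two threads decode the same id; one write wins.
            cache[id] = new String(encoded[id], StandardCharsets.UTF_8);
        }
        return cache[id];
    }
}
```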
Fixed this. Now it re-reads `metadataCache` after assignment so concurrent threads converge on the same array.
Also removed the volatile fields. I ran some local concurrent tests with multiple iterations and confirmed the results stay consistent. I don't see any concurrency tests in the codebase, but I'm happy to add one to the test suite if it helps build confidence in the approach here.
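A concurrency test along those lines could look like the following sketch (the class name and the inlined cache are hypothetical stand-ins for the PR's code): several threads hammer the same lazy cache, and every observed value must equal the deterministic decode of its id.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of a stress test for a benign-race lazy cache (a stand-in for the
// PR's metadataCache): correctness here means no thread ever observes a value
// that differs from the deterministic decode of its id.
public class LazyCacheStressTest {
    static final byte[][] ENTRIES = new byte[64][];
    static final String[] CACHE = new String[64];   // filled lazily, unsynchronized

    static {
        for (int i = 0; i < ENTRIES.length; i++) {
            ENTRIES[i] = ("key-" + i).getBytes(StandardCharsets.UTF_8);
        }
    }

    static String lookup(int id) {
        String s = CACHE[id];
        if (s == null) {
            s = new String(ENTRIES[id], StandardCharsets.UTF_8);
            CACHE[id] = s;   // duplicate writes under a race are benign
        }
        return s;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int t = 0; t < 8; t++) {
            results.add(pool.submit(() -> {
                for (int iter = 0; iter < 10_000; iter++) {
                    int id = ThreadLocalRandom.current().nextInt(ENTRIES.length);
                    if (!("key-" + id).equals(lookup(id))) {
                        return false;   // an inconsistent value was observed
                    }
                }
                return true;
            }));
        }
        for (Future<Boolean> f : results) {
            if (!f.get()) {
                throw new AssertionError("inconsistent cached value observed");
            }
        }
        pool.shutdown();
        System.out.println("all threads saw consistent values");
    }
}
```

A passing run demonstrates consistency, not its impossibility to violate; a tool like a race detector or many repeated runs would add further confidence.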
I would propose a javadoc on concurrency here. Until now only the builder was mutable; the caching makes the Variant mutable too. But even in a race condition, since the only change which ever takes place is a decode and an update of the dictionary, if the dictionary was safe then the worst outcome is that one of the lazy evaluations gets lost. For comparison, Java's HashMap is not thread safe; the very old Hashtable is, but it carries its own penalty when used.

The Rust variant implementation puts a lot of effort into memory efficiency too. Maybe we should make sure that these changes don't completely explode memory consumption. I know, it's a tradeoff (speed, space, code complexity), and queries like speed.

I think the focus should be "single-thread speed and no inconsistency on multithreaded use", as single-threaded workers are what the query engines use. After all, they shouldn't expect the input streams to be thread-safe, should they? Two threads doing parallel reads of a stream are already making some big assumptions about the underlying layers (*).

(*) Hadoop input streams are thread safe precisely because code makes those assumptions, FWIW. Going thread-unsafe broke HBase.
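A possible wording for such a concurrency javadoc (this is a suggestion, not text from the PR):

```java
/**
 * Thread-safety: instances are safe for concurrent reads. The lazily
 * populated caches are idempotent: under a data race the worst outcome is a
 * duplicate decode, or one lazy evaluation being lost and recomputed, never
 * an inconsistent result. Fields are deliberately not volatile so the
 * single-threaded read path pays no extra memory-read cost.
 */
```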
Rationale for this change
Profiling in #3452 identified `Variant.getFieldAtIndex()` and metadata string lookups as hotspots during variant reads. Every call to `getFieldByKey`, `getFieldAtIndex`, and `getElementAtIndex` re-parses headers and re-allocates objects that could be cached.

What changes are included in this PR?
Adds lazy caching to `Variant.java` for metadata strings, object headers, and array headers. Field lookups in `getFieldByKey` now defer value construction until a match is found, and child Variants share the parent's metadata cache. Also removes two unused static helper methods.

Includes @steveloughran's string converter optimization from #3452: `VariantBuilder.appendAsString(Binary)` and its use in `VariantConverters`.

Are these changes tested?
Ran the benchmarks from #3452 locally.
Before:
1st run:
2nd run after initial thread safety changes:
3rd run after trimming down to setting volatile on the `metadataCache`:

Are there any user-facing changes?
No.