feat: use aligned slice access during bulk append in SparkUnsafeArray#4672
feat: use aligned slice access during bulk append in SparkUnsafeArray#4672sandugood wants to merge 12 commits into
Conversation
|
When running CI pipeline, got few UB. Switched to a less optimal way, which still provided boost in performance. By materializing
|
|
Could somebody potentially trigger the CI run? Thanks in advance |
|
Thanks for picking this up. The shape of the change is exactly what #4626 was asking for: keep the runtime alignment check as load-bearing, and replace the per-element null branch with A few things to look at, the first one is a correctness concern.
I'll approve the CI workflows so the matrix actually exercises the change. Out of curiosity, what was the UB you hit earlier? Worth knowing whether item 1 above was the cause or there's a separate hazard. Disclosure: this review was assisted by an AI tool (Anthropic Claude via Claude Code). I read the diff, the linked issue, and the prior closed PR's review feedback before forming the comments above, and I take responsibility for them. |
|
Thank you for the review @andygrove I have changed my approach and bechmarked it. Posting the benchmarking results, which I ran using
cc @mbutrovich, if you could take a quick look at this, please. Thanks! |
|
Thanks for the rework @sandugood, and for pulling the benchmark numbers and the One thing I want to call out as a nice catch: threading the timezone through and calling A few items before this is ready: 1. The boolean nullable path is UB, and this revision moved it the wrong way. This is item 1 from the earlier review. The new nullable branch does: let slice = unsafe { std::slice::from_raw_parts(self.element_offset as *const bool, num_elements) };
for (idx, &value) in slice.iter().enumerate() { ... }Rust requires every let values = unsafe { std::slice::from_raw_parts(self.element_offset as *const u8, num_elements) };
for &value in values { builder.append_value(value != 0); }Could you mirror that 2. Copy-paste leftover in 3. A couple of targeted tests would help. The bitmap flip and the partial-byte handling are easy to get subtly wrong. Would you be open to adding unit tests for nullable arrays where 4. Formatting. The 5. Minor. The byte reinterpretation of the null word assumes little-endian bit ordering. That matches Arrow and every platform we target, so it is fine, but a one-line comment next to the The boolean path is the only blocker. Everything else is small. Once that is a Disclosure: this review was assisted by an AI tool (Anthropic Claude via Claude Code). I read the diff, the linked issue, and the prior review thread before forming these comments, and I take responsibility for them. |
Which issue does this PR close?
Closes #4626
Rationale for this change
Currently, there is a bottleneck in performance during bulk append in the
SparkUnsafeArrayimplementation (inmacroand respectiveBuildertypes forbool,date32andtimestamp). If the array isNULLABLEthere is a hotspot:Self::is_null_in_bitset()which is suboptimal (see the benchmarking results below)What changes are included in this PR?
With this PR we change the flow of execution for nullable arrays:
SparkvsArrowlogicBooleanBufferfrom the bytes and then creating aNullBufferPrimitiveArray(with type specified in themacro_rules!and inside the according methods in theSparkUnsafeArrayfor several dtypes) from slice databuilderHow are these changes tested?
All of the basic tests pass.
Here is the benchmarking result (copied from the comment below).
Result from
with_nullsversion of benches (no_nullswasnt affected with this PR)