fix: support unhex on dictionary strings#4222

Open
yuboxx wants to merge 1 commit into apache:main from yuboxx:fix-unhex-dictionary-477

Conversation

Contributor

yuboxx commented May 5, 2026

What changes were proposed in this pull request?

This PR adds dictionary string support to the native Spark-compatible unhex expression. Dictionary arrays backed by Utf8 or LargeUtf8 values are unpacked and then handled by the existing string implementation.

It also expands coverage for unhex:

  • Rust unit coverage for dictionary arrays with valid hex, invalid hex, null keys, and null dictionary values
  • Spark expression coverage with dictionary enabled and disabled
  • Scalar literal coverage with constant folding disabled
  • SQL-file coverage under parquet.enable.dictionary=false,true

Closes #477.
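The unpack-then-delegate approach described above can be illustrated with a self-contained sketch. This is not the actual Comet or arrow-rs API: the `DictArray` struct, `unhex_str` kernel, and `unhex_dict` function below are hypothetical stand-ins, and the hex decoding is simplified (it rejects odd-length input outright, whereas a fully Spark-compatible unhex has additional rules such as odd-length handling).

```rust
/// Decode a hex string to bytes; None for invalid input.
/// (Simplified: odd-length inputs are treated as invalid here.)
fn unhex_str(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// A toy dictionary-encoded string column: `keys[i]` indexes into
/// `values`; None models a null key or a null dictionary value.
struct DictArray {
    keys: Vec<Option<usize>>,
    values: Vec<Option<String>>,
}

/// Unpack the dictionary to a flat column, then reuse the existing
/// per-string kernel, mirroring the approach taken in this PR.
fn unhex_dict(dict: &DictArray) -> Vec<Option<Vec<u8>>> {
    dict.keys
        .iter()
        .map(|k| {
            k.and_then(|i| dict.values[i].as_deref()) // null key or null value
                .and_then(unhex_str) // invalid hex -> null
        })
        .collect()
}

fn main() {
    let dict = DictArray {
        keys: vec![Some(0), Some(1), None, Some(2)],
        values: vec![Some("537061726B".into()), Some("ZZ".into()), None],
    };
    let out = unhex_dict(&dict);
    assert_eq!(out[0].as_deref(), Some(b"Spark".as_ref())); // valid hex
    assert_eq!(out[1], None); // invalid hex
    assert_eq!(out[2], None); // null key
    assert_eq!(out[3], None); // null dictionary value
}
```

The four assertions correspond to the four dictionary cases listed in the test coverage above: valid hex, invalid hex, null keys, and null dictionary values.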

How was this patch tested?

  • cargo test -p datafusion-comet-spark-expr math_funcs::unhex::test
  • ./mvnw -pl spark -Dsuites=org.apache.comet.CometExpressionSuite -Dtest=none -Pspark-4.0 -Pscala-2.13 test

@andygrove
Member

Thanks for the contribution @yuboxx. Could you tell me more about the motivation for this PR? The changes look reasonable, but they do not address any bugs as far as I can tell. Dictionary-encoded arrays are unpacked in parquet_convert_array before any expressions are evaluated.



Development

Successfully merging this pull request may close these issues.

Add dictionary support to unhex and test dictionary and scalar cases
