PARQUET-3364: Allow hyphens in column names in AvroSchemaConverter #3524

Open
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/3364-cli-hyphen-column

@yadavay-amzn

Fixes #3364

Problem

parquet cat (and other CLI commands) reject valid Parquet files with column names containing hyphens (e.g. Creation-Time). The Parquet spec allows any UTF-8 string as a field name, but the AvroSchemaConverter fails because Avro's Schema.Field name validation only allows [A-Za-z_][A-Za-z0-9_]*.
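To make the rejection concrete, here is a minimal, dependency-free sketch that applies the name pattern quoted above to a few column names. `AvroNameCheck` and `isValidAvroName` are hypothetical names used only for illustration; this is not Avro's own validation code, just the same regular expression.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it does not call into Avro.
class AvroNameCheck {
    // The pattern quoted in the description: a letter or underscore,
    // followed by any number of letters, digits, or underscores.
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns true only if the whole name matches the pattern.
    static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Creation-Time", "Creation_Time", "creationTime"}) {
            System.out.println(name + " -> " + isValidAvroName(name));
        }
    }
}
```

Under this pattern only `Creation-Time` is rejected, because the hyphen falls outside the allowed character class.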

Fix

Temporarily disable Avro name validation during Parquet-to-Avro schema conversion in AvroSchemaConverter.convert(). The field names are already valid per the Parquet spec; the restriction is purely an Avro naming convention and should not apply when reading Parquet files.
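For readers unfamiliar with the "temporarily disable" pattern, the sketch below shows the general shape: turn a per-thread validation switch off, run the conversion, and restore the previous setting in a `finally` block so that a failure cannot leave validation disabled. Everything here is an assumption made for illustration: `ScopedValidation`, `checkName`, and `withoutValidation` are hypothetical, and the `ThreadLocal` flag stands in for whatever mechanism Avro actually exposes. It is not the code in this PR.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a validator with a per-thread on/off switch.
class ScopedValidation {
    // Validation is on by default for every thread.
    private static final ThreadLocal<Boolean> VALIDATE =
            ThreadLocal.withInitial(() -> Boolean.TRUE);

    static boolean isValidating() {
        return VALIDATE.get();
    }

    // Rejects names outside [A-Za-z_][A-Za-z0-9_]* while validation is on.
    static String checkName(String name) {
        if (VALIDATE.get() && !name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Illegal character in: " + name);
        }
        return name;
    }

    // Runs body with validation off, then restores the previous setting,
    // even if body throws.
    static <T> T withoutValidation(Supplier<T> body) {
        boolean previous = VALIDATE.get();
        VALIDATE.set(false);
        try {
            return body.get();
        } finally {
            VALIDATE.set(previous);
        }
    }

    public static void main(String[] args) {
        String name = withoutValidation(() -> checkName("Creation-Time"));
        System.out.println("accepted: " + name);
        System.out.println("validation restored: " + isValidating());
    }
}
```

Saving and restoring the previous value, rather than unconditionally switching validation back on, keeps the helper safe to nest and leaves callers that had already disabled validation unaffected.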

Testing

  • Added testHyphenatedColumnName test in TestAvroSchemaConverter
  • All 42 existing tests continue to pass

The Parquet spec allows any UTF-8 string as a field name, but the
AvroSchemaConverter was failing when converting Parquet schemas with
field names containing hyphens (e.g. "Creation-Time") to Avro schemas,
because Avro name validation rejects non-alphanumeric/underscore chars.

Fix: temporarily disable Avro name validation during Parquet-to-Avro
schema conversion, since the names are already valid per the Parquet spec.

Closes apache#3364

Development

Successfully merging this pull request may close these issues.

[parquet-cli] Illegal character in column name
