feat: support pyarrow float16 by widening to float on read/write by anxkhn · Pull Request #3590 · apache/iceberg-python

anxkhn · 2026-06-30T18:13:24Z

Rationale for this change

PyArrow float16 (halffloat) currently raises UnsupportedPyArrowTypeException during schema conversion, because _ConvertToIceberg.primitive only handles float32 and float64:

>>> import pyarrow as pa
>>> from pyiceberg.io.pyarrow import _ConvertToIceberg, visit_pyarrow
>>> visit_pyarrow(pa.float16(), _ConvertToIceberg())
pyiceberg.exceptions.UnsupportedPyArrowTypeException: Column 'x' has an unsupported type: halffloat

Iceberg has no half-precision float type, but float16 -> float32 is lossless: every IEEE 754 half-precision value (including the maximum finite value 65504) is exactly representable in single precision. This mirrors how the same method already widens int8/int16 to IntegerType, and how ArrowProjectionVisitor._cast_if_needed already widens smaller integer types for cross-platform compatibility. Mapping float16 -> FloatType is the float analogue, so float16 columns round-trip instead of erroring.
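The losslessness claim can be checked exhaustively, since there are only 65536 half-precision bit patterns. The following is a minimal stdlib-only sketch, not code from this PR; the helper names half_from_bits and through_single are hypothetical and exist only for illustration.

```python
import math
import struct


def half_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def through_single(value: float) -> float:
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Count half values that change when stored as a 4-byte float.
mismatches = 0
for bits in range(2**16):
    half = half_from_bits(bits)
    single = through_single(half)
    if math.isnan(half):
        # NaN never compares equal, so only check that it stays NaN.
        if not math.isnan(single):
            mismatches += 1
    elif single != half or math.copysign(1.0, single) != math.copysign(1.0, half):
        # The second condition catches a lost sign on -0.0.
        mismatches += 1

# 0x7BFF is the largest finite half-precision bit pattern.
max_finite_half = half_from_bits(0x7BFF)

print(mismatches, max_finite_half)
```

A mismatch count of zero means every finite value, both infinities, and both signed zeros survive the widening unchanged, and NaN stays NaN.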

Changes (pyiceberg/io/pyarrow.py):

_ConvertToIceberg.primitive: map pa.float16() -> FloatType().
ArrowProjectionVisitor._cast_if_needed: widen smaller float types to the target type on write (parallel to the existing integer-widening branch), so float16 arrays are cast to float32. Narrowing falls through to the existing promote() handling.

No dependency changes; pyproject.toml / uv.lock are untouched and the imports used were already present.

A note on a design choice, deferring to maintainers: widening float16 silently (rather than erroring or gating behind a config flag) follows the existing int8/int16 -> Integer precedent. Happy to gate it behind a config option instead if you'd prefer. The new float-widening branch also makes float32 -> DoubleType actually cast the array (parallel to int widening), so it slightly tightens float promotion in general, not just float16.

Are these changes tested?

Yes:

tests/io/test_pyarrow_visitor.py::test_pyarrow_float16_to_iceberg asserts the schema mapping pa.float16() -> FloatType().
tests/io/test_pyarrow.py::test__to_requested_schema_float_promotion is parametrized over f16 -> Float, f16 -> Double, and f32 -> Double, asserting both the written PyArrow type and that the values are preserved.

Both pass locally, the surrounding visitor suite and the sibling integer-promotion test still pass, and make lint (ruff, ruff-format, mypy, pydocstyle, codespell, uv-lock) is clean. The integration suite (Docker/Spark) was not run locally.

Are there any user-facing changes?

Yes. PyArrow tables with float16 columns can now be converted/written through PyIceberg (they map to Iceberg float and are stored as float32), where they previously raised UnsupportedPyArrowTypeException. This is purely additive; existing float32/float64 behavior is unchanged.

PyArrow's float16 (halffloat) raised UnsupportedPyArrowTypeException during schema conversion because _ConvertToIceberg.primitive only handled float32/float64. Iceberg has no half-precision float, but float16 -> float32 is lossless, mirroring how int8/int16 already widen to IntegerType. Map float16 to FloatType, and widen smaller float arrays to the target type in ArrowProjectionVisitor._cast_if_needed (parallel to the integer-widening branch) so float16 columns write as float32.

anxkhn wants to merge 1 commit into apache:main from anxkhn:loop/iceberg-python__003