Skip to content

feat(expressions): add truthiness semantics for constant boolean expressions#3589

Open
anxkhn wants to merge 1 commit into
apache:mainfrom
anxkhn:loop/iceberg-python__002
Open

feat(expressions): add truthiness semantics for constant boolean expressions#3589
anxkhn wants to merge 1 commit into
apache:mainfrom
anxkhn:loop/iceberg-python__002

Conversation

@anxkhn

@anxkhn anxkhn commented Jun 30, 2026

Copy link
Copy Markdown

Closes #3543

Rationale for this change

AlwaysTrue and AlwaysFalse subclass IcebergRootModel[bool] (a Pydantic
RootModel) but never defined __bool__, so they fell back to default object
truthiness. bool(AlwaysTrue()) was True by accident, but bool(AlwaysFalse())
was also True, and bool(EqualTo("x", 1)) was silently True as well. So
common patterns like expr or AlwaysTrue() or if not residual: evaluated a
FALSE constant as truthy, which is the opposite of what they read like.

This adds explicit truthiness, as proposed in #3543:

  • AlwaysTrue.__bool__ returns True, AlwaysFalse.__bool__ returns False.
  • The base BooleanExpression.__bool__ raises TypeError, so accidental
    if predicate: on a non-constant expression is a clear error pointing to ~expr
    or an explicit AlwaysTrue()/AlwaysFalse() comparison, instead of a silent
    always-true. Only the two constants override it.

Auditing optional-expression checks (called out in the issue) surfaced one latent
production bug: FileScanTask.from_rest_response used
residual=... if rest_task.residual_filter else ALWAYS_TRUE, so an explicit
AlwaysFalse() residual from a REST catalog was discarded to ALWAYS_TRUE because
bool(AlwaysFalse()) was True. Switched to an is not None presence check, which
both fixes that and is required once a non-constant residual no longer coerces to
bool. One test helper used the same expr or AlwaysTrue() pattern and got the same
fix. No other call site relies on bare truthiness; all use is / == / isinstance.

One design note for reviewers: @rambleraptor raised on the issue that making the
base class raise might be more than needed ("users shouldn't need to understand
BooleanExpression internals... only really a problem for AlwaysFalse()"). This
PR implements the base-class-raises design @kevinjqliu described in the issue. If
you would rather keep it minimal, the smaller alternative is to define __bool__
only on AlwaysTrue/AlwaysFalse and drop the base TypeError (and its test);
the residual fix is still needed either way. Happy to trim it down if preferred.

Are these changes tested?

Yes. bool(AlwaysTrue()) is True, bool(AlwaysFalse()) is False, control-flow over
both, and TypeError for non-constant predicates (EqualTo, IsNull, And, Or,
Not); plus REST scan-planning tests that a non-constant residual is preserved and a
missing one defaults to AlwaysTrue. Integration tests (Docker + Spark) were not
run locally; the change is pure expression logic and is fully covered by unit tests.

Are there any user-facing changes?

Yes (small, intentional): bool(AlwaysFalse()) now returns False, and
bool(<non-constant expression>) now raises TypeError instead of returning True.

…essions

`AlwaysTrue`/`AlwaysFalse` extend `IcebergRootModel[bool]` but defined no
`__bool__`, so they used default object truthiness and were both always truthy.
That made `bool(AlwaysFalse())` return `True`, so intuitive control-flow patterns
(`if not visit_starts_with(...)`, `expr or AlwaysTrue()`) behaved incorrectly for
the FALSE constant.

Define `__bool__` on the two constants (`True`/`False`) and make the base
`BooleanExpression.__bool__` raise `TypeError`, so truthiness is only defined for
the constants and an accidental `if predicate:` is an explicit error rather than a
silently-true no-op. `not expr` returns a Python bool; `~expr` still returns the
negated expression.

Per the audit kevinjqliu requested on the issue, fix the one optional-expression
check that relied on truthiness instead of presence: REST scan-planning residual
handling (`FileScanTask.from_rest_response`) now uses `is not None`, so an explicit
residual filter is preserved and not treated as missing. Also update the
`expr or AlwaysTrue()` default in the pyarrow `project` test helper.

Tests cover constant truthiness, control-flow branching, `TypeError` for
non-constant expressions, and the REST residual path.

Closes apache#3543
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add explicit truthiness semantics for constant BooleanExpressions

1 participant