feat(expressions): add truthiness semantics for constant boolean expressions#3589
Open
anxkhn wants to merge 1 commit into
Open
feat(expressions): add truthiness semantics for constant boolean expressions#3589anxkhn wants to merge 1 commit into
anxkhn wants to merge 1 commit into
Conversation
…essions `AlwaysTrue`/`AlwaysFalse` extend `IcebergRootModel[bool]` but defined no `__bool__`, so they used default object truthiness and were both always truthy. That made `bool(AlwaysFalse())` return `True`, so intuitive control-flow patterns (`if not visit_starts_with(...)`, `expr or AlwaysTrue()`) behaved incorrectly for the FALSE constant. Define `__bool__` on the two constants (`True`/`False`) and make the base `BooleanExpression.__bool__` raise `TypeError`, so truthiness is only defined for the constants and an accidental `if predicate:` is an explicit error rather than a silently-true no-op. `not expr` returns a Python bool; `~expr` still returns the negated expression. Per the audit kevinjqliu requested on the issue, fix the one optional-expression check that relied on truthiness instead of presence: REST scan-planning residual handling (`FileScanTask.from_rest_response`) now uses `is not None`, so an explicit residual filter is preserved and not treated as missing. Also update the `expr or AlwaysTrue()` default in the pyarrow `project` test helper. Tests cover constant truthiness, control-flow branching, `TypeError` for non-constant expressions, and the REST residual path. Closes apache#3543
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3543
Rationale for this change
AlwaysTrueandAlwaysFalsesubclassIcebergRootModel[bool](a PydanticRootModel) but never defined__bool__, so they fell back to default objecttruthiness.
bool(AlwaysTrue())wasTrueby accident, butbool(AlwaysFalse())was also
True, andbool(EqualTo("x", 1))was silentlyTrueas well. Socommon patterns like
expr or AlwaysTrue()orif not residual:evaluated aFALSEconstant as truthy, which is the opposite of what they read like.This adds explicit truthiness, as proposed in #3543:
AlwaysTrue.__bool__returnsTrue,AlwaysFalse.__bool__returnsFalse.BooleanExpression.__bool__raisesTypeError, so accidentalif predicate:on a non-constant expression is a clear error pointing to~expror an explicit
AlwaysTrue()/AlwaysFalse()comparison, instead of a silentalways-true. Only the two constants override it.
Auditing optional-expression checks (called out in the issue) surfaced one latent
production bug:
FileScanTask.from_rest_responseusedresidual=... if rest_task.residual_filter else ALWAYS_TRUE, so an explicitAlwaysFalse()residual from a REST catalog was discarded toALWAYS_TRUEbecausebool(AlwaysFalse())wasTrue. Switched to anis not Nonepresence check, whichboth fixes that and is required once a non-constant residual no longer coerces to
bool. One test helper used the same
expr or AlwaysTrue()pattern and got the samefix. No other call site relies on bare truthiness; all use
is/==/isinstance.One design note for reviewers: @rambleraptor raised on the issue that making the
base class raise might be more than needed ("users shouldn't need to understand
BooleanExpressioninternals... only really a problem forAlwaysFalse()"). ThisPR implements the base-class-raises design @kevinjqliu described in the issue. If
you would rather keep it minimal, the smaller alternative is to define
__bool__only on
AlwaysTrue/AlwaysFalseand drop the baseTypeError(and its test);the residual fix is still needed either way. Happy to trim it down if preferred.
Are these changes tested?
Yes.
bool(AlwaysTrue()) is True,bool(AlwaysFalse()) is False, control-flow overboth, and
TypeErrorfor non-constant predicates (EqualTo,IsNull,And,Or,Not); plus REST scan-planning tests that a non-constant residual is preserved and amissing one defaults to
AlwaysTrue. Integration tests (Docker + Spark) were notrun locally; the change is pure expression logic and is fully covered by unit tests.
Are there any user-facing changes?
Yes (small, intentional):
bool(AlwaysFalse())now returnsFalse, andbool(<non-constant expression>)now raisesTypeErrorinstead of returningTrue.