Summary
datafusion/sql/src/unparser/expr.rs currently handles Operator::IsDistinctFrom and Operator::IsNotDistinctFrom in separate match arms with duplicated dialect-dispatch logic. Factor this logic into a small helper so the dialect-specific spelling, null-safe semantics, and parenthesization rules are encoded in one place.
Context
#22999 adds MySQL-specific unparsing for DataFusion's distinct-from operators:
a IS NOT DISTINCT FROM b -> a <=> b
a IS DISTINCT FROM b -> NOT (a <=> b)
The implementation adds similar match self.dialect.distinct_from_style() blocks in both operator arms. This duplication makes it easier for future changes to update one operator but miss the other, especially around SQL precedence and dialect-specific syntax.
Problem
The invariant is: both IS DISTINCT FROM and IS NOT DISTINCT FROM must be emitted as a matched pair for each dialect, preserving null-safe equality semantics and unambiguous SQL parsing.
Duplicating the dialect mapping in two branches weakens that invariant because:
- Parenthesization fixes must be applied in both places.
- New dialect styles must update both operators consistently.
- The relationship between the two operators is not explicit in code.
Proposed refactor
Add a private helper in datafusion/sql/src/unparser/expr.rs, for example:
fn distinct_from_to_sql(
&self,
left: ast::Expr,
right: ast::Expr,
negated: bool,
) -> Result<ast::Expr> {
match self.dialect.distinct_from_style() {
DistinctFromStyle::FullText => {
let expr = if negated {
ast::Expr::IsNotDistinctFrom(Box::new(left), Box::new(right))
} else {
ast::Expr::IsDistinctFrom(Box::new(left), Box::new(right))
};
Ok(ast::Expr::Nested(Box::new(expr)))
}
DistinctFromStyle::Spaceship => {
let spaceship = ast::Expr::Nested(Box::new(ast::Expr::BinaryOp {
left: Box::new(left),
op: BinaryOperator::Spaceship,
right: Box::new(right),
}));
if negated {
Ok(spaceship)
} else {
Ok(ast::Expr::Nested(Box::new(ast::Expr::UnaryOp {
op: UnaryOperator::Not,
expr: Box::new(spaceship),
})))
}
}
}
}
Then both match arms become small and symmetrical:
Operator::IsDistinctFrom => {
let left = self.expr_to_sql_inner(left.as_ref())?;
let right = self.expr_to_sql_inner(right.as_ref())?;
self.distinct_from_to_sql(left, right, false)
}
Operator::IsNotDistinctFrom => {
let left = self.expr_to_sql_inner(left.as_ref())?;
let right = self.expr_to_sql_inner(right.as_ref())?;
self.distinct_from_to_sql(left, right, true)
}
Exact naming is flexible. A clearer enum than negated: bool would also be reasonable if preferred.
Expected benefits
- Keeps null-safe equality and its negation in one abstraction.
- Reduces duplicate control flow in a large unparser function.
- Makes precedence-sensitive parenthesization harder to get wrong.
- Simplifies future dialect additions or syntax changes.
- Keeps the refactor local to
expr.rs with no public API change.
Suggested tests
Add or keep tests covering:
-
Default/full-text dialect:
c1 IS DISTINCT FROM true
c1 IS NOT DISTINCT FROM true
-
MySQL dialect:
IS NOT DISTINCT FROM unparses to <=>
IS DISTINCT FROM unparses as NOT (<=>) or another unambiguous equivalent, not ambiguous NOT a <=> b
-
If parenthesization changes, add a precedence-sensitive MySQL case verifying output remains unambiguous (for example, NOT (`c1` <=> true), not NOT `c1` <=> true).
Related PR
#22999
Summary
datafusion/sql/src/unparser/expr.rscurrently handlesOperator::IsDistinctFromandOperator::IsNotDistinctFromin separate match arms with duplicated dialect-dispatch logic. Factor this logic into a small helper so the dialect-specific spelling, null-safe semantics, and parenthesization rules are encoded in one place.Context
#22999 adds MySQL-specific unparsing for DataFusion's distinct-from operators:
a IS NOT DISTINCT FROM b->a <=> ba IS DISTINCT FROM b->NOT (a <=> b)The implementation adds similar
match self.dialect.distinct_from_style()blocks in both operator arms. This duplication makes it easier for future changes to update one operator but miss the other, especially around SQL precedence and dialect-specific syntax.Problem
The invariant is: both
IS DISTINCT FROMandIS NOT DISTINCT FROMmust be emitted as a matched pair for each dialect, preserving null-safe equality semantics and unambiguous SQL parsing.Duplicating the dialect mapping in two branches weakens that invariant because:
Proposed refactor
Add a private helper in
datafusion/sql/src/unparser/expr.rs, for example:Then both match arms become small and symmetrical:
Exact naming is flexible. A clearer enum than
negated: boolwould also be reasonable if preferred.Expected benefits
expr.rswith no public API change.Suggested tests
Add or keep tests covering:
Default/full-text dialect:
c1 IS DISTINCT FROM truec1 IS NOT DISTINCT FROM trueMySQL dialect:
IS NOT DISTINCT FROMunparses to<=>IS DISTINCT FROMunparses asNOT (<=>)or another unambiguous equivalent, not ambiguousNOT a <=> bIf parenthesization changes, add a precedence-sensitive MySQL case verifying output remains unambiguous (for example,
NOT (`c1` <=> true), notNOT `c1` <=> true).Related PR
#22999