Skip to content

Conversation

@Jefffrey
Copy link
Contributor

@JefffreyJefffrey commented Jan 2, 2026

Which issue does this PR close?

Rationale for this change

Found lots of code duplicated here, so unifying them.

What changes are included in this PR?

  • Reduce code duplication around handling scalar/aggregate/window UDFs in type coercion related code, via usage of new UDFCoercionExt trait which unifies some of their behaviours (introduced by Refactor duplicate code in type_coercion/functions.rs #19518)
  • Deprecate functions can_coerce_from() and generate_signature_error_msg() to work towards minimizing our public API surface
  • Fix some UDF signatures which nested UserDefined within OneOf
  • Fix bug where type coercion rewrites weren't being applied to arguments of non-aggregate window UDFs

Are these changes tested?

Existing tests.

Are there any user-facing changes?

Deprecated some functions.

@github-actionsgithub-actionsbot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate functions Changes to functions implementation spark labels Jan 2, 2026
}
}

/// Performs type coercion for scalar function arguments.
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganizing these so the only public function of this file is near the top (fields_with_udf())

/// (losslessly converted) into a value of `type_to`
///
/// See the module level documentation for more detail on coercion.
#[deprecated(since = "52.0.0", note = "Unused internal function")]
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was only ever used in a test; trying to minimize our API surface area for simplicity so deprecating this

Comment on lines +155 to +158
Expr::ScalarFunction(_)
| Expr::WindowFunction(_)
| Expr::AggregateFunction(_) => {
Ok(self.to_field(schema)?.1.data_type().clone())
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing existing logic from to_field() implementation


/// Verify that function is invoked with correct number and type of arguments as
/// defined in `TypeSignature`.
fnverify_function_arguments<F:UDFCoercionExt>(
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new UDFCoercionExt trait we can now more easily share common code between scalar, aggregate and window UDFs

///
/// Otherwise, returns an error if there's a type mismatch between
/// the window function's signature and the provided arguments.
fndata_type_and_nullable_with_window_function(
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been inlined to to_field()

Comment on lines +940 to 941
#[deprecated(since = "52.0.0", note = "Internal function")]
pubfngenerate_signature_error_msg(
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly removing this from our public API since I see no need for it to be public

pubfnnew() -> Self{
Self{
signature:Signature::one_of(
vec![TypeSignature::Nullary,TypeSignature::UserDefined],
Copy link
ContributorAuthor

@JefffreyJefffreyJan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like the idea of having UserDefined nested within a OneOf signature, so amending some functions which are doing this. Ideally a function declaring UserDefined should handle all of its coercion logic itself, instead of mixing this custom logic with the other signature API.

Comment on lines +691 to +692
expr::WindowFunctionDefinition::WindowUDF(udf) => {
coerce_arguments_for_signature(args,self.schema, udf.as_ref())?
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This arm for coercing non-aggregate window UDFs actually was missing; adding this back in lead to discovering some UDF window tests were incorrect (see next comment)

fnnew(test_state:Arc<TestState>) -> Self{
let signature =
Signature::exact(vec![DataType::Float64],Volatility::Immutable);
Signature::exact(vec![DataType::Int64],Volatility::Immutable);
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sample UDF declared it accepted f64's but internally it acted as if input was i64; this was never caught because apparently type coercion wasn't being run on non-aggregate window functions, and the input data was already i64. Nice little fix.

@JefffreyJefffrey marked this pull request as ready for review January 2, 2026 09:41
Copy link
Member

@martin-gmartin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

TypeSignature::Variadic(valid_types) => valid_types
.iter()
.map(|valid_type| current_types.iter().map(|_| valid_type.clone()).collect())
.map(|valid_type| vec![valid_type.clone(); current_types.len()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

coreCore DataFusion cratefunctionsChanges to functions implementationlogical-exprLogical plan and expressionsoptimizerOptimizer rulesspark

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@Jefffrey@martin-g