
[FSTORE-1938] Support chaining of Transformation Functions using a DAG #580

Open
manu-sj wants to merge 1 commit into logicalclocks:main from manu-sj:FSTORE-1938



@manu-sj manu-sj commented May 18, 2026

Contributor

No description provided.

manu-sj marked this pull request as draft May 21, 2026 13:06
manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from 5ed6dcb to b770050, May 28, 2026 07:50
manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from 6eacba8 to cbf2ed3, June 4, 2026 11:25
manu-sj marked this pull request as ready for review June 8, 2026 08:59
manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from ff87ced to 4db4444, June 10, 2026 08:19
manu-sj requested a review from Copilot June 10, 2026 08:20

Copilot AI left a comment

Contributor


Pull request overview

Adds documentation for chaining Transformation Functions into a dependency graph (DAG) in the Hopsworks Feature Store docs, including how execution order is resolved, how to visualize the DAG, and how parallel execution behaves for independent branches.

Changes:

  • Documented chaining semantics for Transformation Functions (ODT + MDT), including cycle/duplicate-output rejection behavior.
  • Added guidance on visualizing the transformation execution DAG from UI and SDK.
  • Added performance/parallelism tuning details via n_processes, including defaults and serving-time pool pre-spawn.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| docs/user_guides/fs/transformation_functions.md | Introduces chained transformation DAG concept, DAG visualization, and performance tuning/parallelism behavior. |
| docs/user_guides/fs/feature_view/model-dependent-transformations.md | Adds a section describing chaining model-dependent transformations and links to performance tuning guidance. |
| docs/user_guides/fs/feature_group/on_demand_transformations.md | Adds a section describing chaining on-demand transformations and the cross-DAG path into feature views/MDTs. |

A model-dependent transformation can consume another MDT's output as its input.
The DAG is resolved automatically at execution time, so producers always run before consumers.

!!! example "Chaining two normalizers and a sum"

Contributor Author

Renamed to "Chaining two increments and a sum" to match the add_one/add code. Fixed in efcea35.


## Chaining Model-Dependent Transformations

A model-dependent transformation can consume another MDT's output as its input.

Contributor Author

Defined on first use: "A model-dependent transformation (MDT) can consume another MDT's output". Fixed in efcea35.

Hopsworks resolves the execution order automatically using a topological sort of the resulting DAG, so dependencies always run before their consumers.
Chaining works for both on-demand transformations attached to a feature group and model-dependent transformations attached to a feature view.

!!! example "Chained MDTs on a feature view"

Contributor Author

Spelled out: "Chained model-dependent transformations on a feature view". Fixed in efcea35.
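
For readers of this thread, the ordering rule in the quoted passage (a topological sort so that dependencies run before their consumers) can be illustrated with a minimal sketch. It uses only Python's standard `graphlib` module and is not the Hopsworks implementation; `resolve_order`, the `(inputs, output)` tuples, and the column names are hypothetical, chosen to mirror the `add_one`/`add` example mentioned earlier in the review.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_order(transformations):
    """Return transformation names so that producers come before consumers.

    `transformations` maps a (hypothetical) function name to a tuple of
    (input_columns, output_column).
    """
    # Reject two transformations that write the same output column.
    producer_of = {}
    for name, (_, output) in transformations.items():
        if output in producer_of:
            raise ValueError(
                f"duplicate output column {output!r}: "
                f"{producer_of[output]} and {name}"
            )
        producer_of[output] = name

    # A node depends on whichever transformation produces each of its inputs.
    # Inputs without a producer are treated as raw feature columns.
    graph = {
        name: {producer_of[col] for col in inputs if col in producer_of}
        for name, (inputs, _) in transformations.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"cycle in transformation chain: {err.args[1]}") from err


# Declared out of order on purpose: the sum is listed before its producers.
chain = {
    "add": (["a_plus_one", "b_plus_one"], "total"),
    "add_one_a": (["a"], "a_plus_one"),
    "add_one_b": (["b"], "b_plus_one"),
}
order = resolve_order(chain)
```

In this sketch `add` always comes last, while the two increments have no dependency on each other and may appear in either order, which is what makes independent branches candidates for parallel execution.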


## Chaining On-Demand Transformations

On-demand transformations attached to the same feature group can be chained: one ODT's output column can serve as another ODT's input.

Contributor Author

Defined on first use: "On-demand transformations (ODTs) attached to the same feature group". Fixed in efcea35.

An intermediate output consumed only by a downstream ODT can be dropped from the feature group; the full chain still executes during online serving, and the dropped column never becomes a stored feature.

An ODT's output column becomes a regular feature in the feature group, which a downstream feature view can consume and pass into a model-dependent transformation.
This is the implicit cross-DAG path between ODT and MDT chains: nothing extra to configure on either side.

Contributor Author

Spelled out: "between on-demand and model-dependent transformation chains". Fixed in efcea35.
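
The dropped-intermediate behaviour described in the quoted passage can be pictured with a small plain-Python sketch. It is an illustration under stated assumptions, not the Hopsworks API: `run_chain`, the step tuples, and the column names `price`, `quantity`, `gross`, and `gross_with_tax` are all hypothetical, and the steps are assumed to be already sorted in dependency order.

```python
def run_chain(row, steps, dropped=()):
    """Apply chained steps to one row, then remove dropped intermediates.

    Each step is (input_columns, output_column, function). Steps are assumed
    to be listed in dependency order already.
    """
    values = dict(row)
    for inputs, output, fn in steps:
        # Every step runs, even if its output is dropped afterwards.
        values[output] = fn(*(values[col] for col in inputs))
    # Dropped columns were available to downstream steps but are not returned.
    return {col: val for col, val in values.items() if col not in dropped}


steps = [
    (["price", "quantity"], "gross", lambda p, q: p * q),
    (["gross"], "gross_with_tax", lambda g: round(g * 1.25, 2)),
]
served = run_chain({"price": 4.0, "quantity": 3}, steps, dropped={"gross"})
```

Here `gross` is computed and consumed by the second step, yet it does not appear in `served`; only the raw inputs and `gross_with_tax` remain.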

…xecution DAG

https://hopsworks.atlassian.net/browse/FSTORE-1938

Document chaining of transformation functions across the user guides:
how the output of one function feeds another, how the execution DAG
resolves the order, how cycles and duplicate output columns are
rejected, and how the DAG is rendered from the UI and from the SDK
with visualize_transformations().

A Transformation Functions Performance Tuning subsection in the
transformation functions guide covers the node-parallel execution
model: the n_processes argument and its defaults per input shape,
pool pre-spawning through init_serving and init_batch_scoring, Arrow
shared-memory staging, and the HSFS_TF_POOL_START_METHOD override.

The model-dependent transformations guide notes that statistics for
chained functions are fit in dependency order on the data each
function sees. The on-demand transformations guide covers chains
whose intermediate output is dropped from the feature group. No
migration entry is included since the changes are backwards
compatible.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


2 participants