feat: Add Motif-Video model and pipelines#13551

Open
waitingcheung wants to merge 1 commit into huggingface:main from waitingcheung:feat/motif-video

@waitingcheung waitingcheung commented Apr 23, 2026

What does this PR do?

This PR adds support for Motif-Video, a text-to-video (T2V) and image-to-video (I2V) diffusion model from Motif Technologies. The implementation includes the transformer architecture, both pipeline variants, guidance configurations, and API documentation for the model and pipelines.

Changes

New Files

  • Model: src/diffusers/models/transformers/transformer_motif_video.py - MotifVideoTransformer3DModel
  • Pipelines:
    • src/diffusers/pipelines/motif_video/pipeline_motif_video.py - Text-to-Video
    • src/diffusers/pipelines/motif_video/pipeline_motif_video_image2video.py - Image-to-Video
  • Output: src/diffusers/pipelines/motif_video/pipeline_output.py
  • Tests:
    • tests/pipelines/motif_video/test_motif_video.py
    • tests/pipelines/motif_video/test_motif_video_image2video.py
  • Documentation:
    • docs/source/en/api/models/motif_video_transformer_3d.md
    • docs/source/en/api/pipelines/motif_video.md

Key Features

  • Architecture: DiT-based transformer with T5Gemma2Encoder for text encoding
  • Flow matching: Uses FlowMatchEulerDiscreteScheduler
  • Guidance: Supports ClassifierFreeGuidance, SkipLayerGuidance, and AdaptiveProjectedGuidance
  • Video Processing: Wan-style VAE for video encoding/decoding

Version Requirements

  • transformers>=5.1.0 - Required for T5Gemma2Encoder (critical bug fix in PR #43633)
  • The pipeline includes a version check that raises a clear error with upgrade instructions if the transformers version is too old

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions github-actions Bot added documentation Improvements or additions to documentation models tests utils pipelines guiders size/L PR with diff > 200 LOC labels Apr 23, 2026
@waitingcheung waitingcheung changed the title Add Motif Video model and pipelines Add Motif-Video model and pipelines Apr 23, 2026
@waitingcheung
Author

waitingcheung commented Apr 23, 2026

@yiyixuxu @asomoza @sayakpaul

Quick ping for visibility. This PR adds Motif-Video (T2V/I2V + new transformer and pipelines).

Would appreciate your feedback, especially on dependency/version constraints:

  • transformers>=5.1.0 for T5Gemma2Encoder (currently enforced via an assertion with an upgrade message)
  • compel requiring transformers<5, which may conflict with diffusers usage

This is currently blocking some diffusers-side integration, so your input would help.
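
To make the conflict concrete, here is a minimal sketch (not code from this PR) that checks a few version strings against both constraints quoted above. The helper names and the candidate versions are hypothetical and purely illustrative. The parser only handles plain dotted releases such as "5.1.0", not pre-release or dev tags.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as "5.1.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Constraints as stated in this PR discussion.
MOTIF_VIDEO_MIN = parse_version("5.1.0")  # transformers>=5.1.0
COMPEL_UPPER = parse_version("5")         # transformers<5


def ok_for_motif_video(version: str) -> bool:
    return parse_version(version) >= MOTIF_VIDEO_MIN


def ok_for_compel(version: str) -> bool:
    return parse_version(version) < COMPEL_UPPER


# Illustrative version strings only; not a claim about which releases exist.
candidates = ["4.57.0", "5.0.0", "5.1.0", "5.2.0"]
satisfies_both = [
    v for v in candidates if ok_for_motif_video(v) and ok_for_compel(v)
]
print(satisfies_both)  # → []
```

Because the two ranges do not overlap, no single transformers install can satisfy both packages in one environment.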

A working branch for this integration is available here.

@waitingcheung waitingcheung marked this pull request as ready for review April 23, 2026 06:07
@waitingcheung waitingcheung changed the title Add Motif-Video model and pipelines feat: Add Motif-Video model and pipelines Apr 23, 2026
…dance support

Add complete Motif Video implementation to diffusers:

New Models:
- Add MotifVideoTransformer3DModel with T5Gemma2Encoder for multimodal conditioning
- Supports text-to-video and image-to-video generation with vision tower integration

New Pipelines:
- Add MotifVideoPipeline for text-to-video generation
  - Default resolution: 736x1280, 121 frames, 25 fps
  - Supports classifier-free guidance and AdaptiveProjectedGuidance
- Add MotifVideoImage2VideoPipeline for image-to-video generation
  - First frame conditioning with vision encoder
  - Same defaults as T2V pipeline
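
The defaults above imply a clip length and, under an assumed Wan-style compression, a latent grid size. The following is a rough arithmetic sketch, not code from this PR. It assumes that 736x1280 means height x width, and that the VAE compresses 4x in time and 8x in space (factors not stated in this PR). `VideoDefaults`, `clip_seconds`, and `latent_shape` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoDefaults:
    height: int = 736      # assumption: "736x1280" is height x width
    width: int = 1280
    num_frames: int = 121
    fps: int = 25


def clip_seconds(d: VideoDefaults) -> float:
    """Duration of the generated clip in seconds."""
    return d.num_frames / d.fps


def latent_shape(d: VideoDefaults, temporal_factor: int = 4, spatial_factor: int = 8):
    """(frames, height, width) of the latent grid.

    Assumes the first frame is encoded on its own and every further group of
    `temporal_factor` frames becomes one latent frame. The factors are
    assumptions, not values taken from this PR.
    """
    latent_frames = (d.num_frames - 1) // temporal_factor + 1
    return latent_frames, d.height // spatial_factor, d.width // spatial_factor


defaults = VideoDefaults()
print(clip_seconds(defaults))   # → 4.84
print(latent_shape(defaults))   # → (31, 92, 160)
```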

Enhanced Guidance:
- Update AdaptiveProjectedGuidance with normalization_dims parameter
  - Support "spatial" normalization for 5D tensors (per-frame spatial normalization)
  - Support custom dimension lists for flexible normalization
  - Update AdaptiveProjectedMixGuidance with same parameter
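
As an illustration of the normalization_dims options listed above, the sketch below resolves the argument into a list of reduction axes. It is not the implementation in this PR. `resolve_normalization_dims` is a hypothetical helper, the [batch, channels, frames, height, width] layout is an assumption, and so are the choices that None means all non-batch dims and that "spatial" means only the height and width axes.

```python
from typing import List, Optional, Sequence, Union


def resolve_normalization_dims(
    ndim: int,
    normalization_dims: Optional[Union[str, Sequence[int]]] = None,
) -> List[int]:
    """Return sorted, non-negative axes over which a norm would be reduced."""
    if normalization_dims is None:
        # Assumed default: reduce over every dim except the batch dim.
        return list(range(1, ndim))

    if isinstance(normalization_dims, str):
        if normalization_dims != "spatial":
            raise ValueError(f"Unknown normalization_dims: {normalization_dims!r}")
        if ndim != 5:
            raise ValueError('"spatial" expects a 5D tensor [B, C, T, H, W]')
        # Assumed meaning: one norm per frame, reduced over height and width.
        return [ndim - 2, ndim - 1]

    resolved = set()
    for dim in normalization_dims:
        if not -ndim <= dim < ndim:
            raise ValueError(f"dim {dim} is out of range for a {ndim}D tensor")
        resolved.add(dim % ndim)  # map negative axes to non-negative ones
    return sorted(resolved)


print(resolve_normalization_dims(5))             # → [1, 2, 3, 4]
print(resolve_normalization_dims(5, "spatial"))  # → [3, 4]
print(resolve_normalization_dims(5, [-1, -2]))   # → [3, 4]
```

With a tensor library, the resulting list would typically be passed as the `dim` argument of a norm with `keepdim=True`, so each frame gets its own scale.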

Documentation & Tests:
- Add comprehensive API documentation for transformer and pipelines
- Add test suites for both T2V and I2V pipelines
- Register all new components in __init__ files
- Add dummy objects for torch and transformers backends

Total: 18 files changed, 3416 insertions(+), 2 deletions(-)
@github-actions github-actions Bot added single-file size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 23, 2026
@sayakpaul
Member

> transformers>=5.1.0 for T5Gemma2Encoder (currently enforced via an assertion with an upgrade message)

I think we can guard the transformers import in the pipeline with something like is_transformers_version("<", "5.1.0")?

compel conflict is fine IMO.

@sayakpaul sayakpaul requested review from dg845 and yiyixuxu April 23, 2026 10:25
@waitingcheung
Author

> > transformers>=5.1.0 for T5Gemma2Encoder (currently enforced via an assertion with an upgrade message)
>
> I think we can guard the transformers import in the pipeline with something like is_transformers_version("<", "5.1.0")?
>
> compel conflict is fine IMO.

We already have a check like this at the top of the pipeline code. It runs before T5Gemma2Encoder is imported and tells users to upgrade the transformers package:

# Check transformers version before importing T5Gemma2Encoder
if not is_transformers_version(">=", "5.1.0"):
    import transformers

    raise ImportError(
        f"MotifVideoPipeline requires transformers>=5.1.0. "
        f"Found: {transformers.__version__}. "
        "Please upgrade transformers: pip install transformers --upgrade"
    )

@sayakpaul
Member

Then that should cut it.
