Integrate AutoRound into Diffusers #13552
Conversation
Signed-off-by: Xin He <xin3.he@intel.com>
I would like to integrate AutoRound into Diffusers to support diffusion models. Although the performance improvement is not significant, memory usage is notably reduced.
Thanks for this PR! Could you provide some example code using AutoRound and also some example outputs? Feel free to also report latency and memory consumption so that there's some signal into its effectiveness. Cc: @SunMarc
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
…ecified backend. Signed-off-by: Xin He <xin3.he@intel.com>
Thank you all for your valuable feedback; I have updated the document and fixed the errors.
Signed-off-by: Xin He <xin3.he@intel.com>
Apologies; due to policy restrictions, I am unable to provide specific latency data. Based on local testing, the time consumed by the W4A16 Marlin backend is comparable to that of BF16. As for memory consumption, I observe that the original Z-Image model takes about 25 GB while the quantized one takes about 16 GB. You are welcome to use the code provided in the model card to try to reproduce these results; the generated picture looks good.
```python
model_id = "INCModel/Z-Image-W4A16-AutoRound"

quantization_config = AutoRoundConfig(backend="marlin")
```
In case a better backend appears in the future, we'd better not hard-code it like this. Besides, if users have installed gptqmodel, we will use Marlin; otherwise, we will remind them to install it.
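The fallback the reviewer suggests could look roughly like the following. This is a hypothetical helper, not the actual diffusers or auto-round API: the function name, the `"auto"` fallback value, and the warning text are all illustrative assumptions.

```python
import importlib.util
import warnings


def select_autoround_backend(requested=None):
    """Pick an AutoRound kernel backend instead of hard-coding "marlin".

    Sketch of the reviewer's suggestion (names are illustrative): prefer
    Marlin kernels when `gptqmodel` is installed, otherwise fall back to a
    generic backend and remind the user to install gptqmodel.
    """
    if requested is not None:
        return requested  # respect an explicit user choice
    if importlib.util.find_spec("gptqmodel") is not None:
        return "marlin"
    warnings.warn(
        "gptqmodel is not installed; falling back to a generic backend. "
        "Install it with `pip install gptqmodel` to use the Marlin kernels."
    )
    return "auto"
```

With this pattern the config can default `backend=None` and resolve the concrete kernel at load time, so a newer, faster backend can later be preferred without changing user code.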
```python
pipe = ZImagePipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```
How about just setting `device_map` to `"auto"`?
```python
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
```
```python
if not is_auto_round_available():
    raise ImportError(
        "Loading an AutoRound quantized model requires the auto-round library "
        "(`pip install 'auto-round>=0.5'`)"
    )
```
I don't understand. Won't it run on a CUDA GPU? You cannot expect maintainers to run the code to address the bare minimums, I am afraid. Having the output samples and comparisons to other methods gives us confidence in the quantization method itself.
What does this PR do?
This pull request introduces support for the AutoRound quantization algorithm in the Diffusers library. AutoRound is a weight-only quantization method that enables efficient inference by optimizing weight rounding and min-max ranges, primarily targeting the W4A16 configuration (4-bit weights, 16-bit activations). The changes add a new quantization config, quantizer class, backend integration, and comprehensive documentation, while ensuring proper handling of optional dependencies.
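To make the W4A16 idea concrete, here is a minimal plain-Python sketch of per-group symmetric 4-bit weight quantization with 16-bit-style float activations left untouched. This is only an illustration of the numeric format; AutoRound's actual algorithm additionally *learns* the rounding decisions and min-max ranges via signed-gradient optimization, which is not shown here.

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one weight group.

    Illustrative only: AutoRound tunes the rounding and clipping ranges
    rather than using plain round-to-nearest as done here.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [v * scale for v in q]


group = [0.31, -0.8, 0.05, 0.42]  # one weight group (e.g. group_size=128 in practice)
q, s = quantize_group(group)
recon = dequantize_group(q, s)
# Every code fits in the signed 4-bit range [-8, 7]; for in-range weights
# the reconstruction error is bounded by half the scale (s / 2).
```

Activations stay in 16-bit floating point at inference time, which is why the method is labelled W4A16: only the (static) weights pay the 4-bit storage cost, cutting memory roughly 4x versus BF16 weights.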
Key changes:
AutoRound quantization support

- Added an `AutoRoundConfig` class to `quantization_config.py` for configuring AutoRound quantization parameters, including bits, group size, symmetry, backend, and modules to exclude from quantization.
- Added an `AutoRoundQuantizer` class in `quantizers/autoround/autoround_quantizer.py`, implementing the logic for loading pre-quantized AutoRound models and integrating with the auto-round library.
- Registered `AutoRoundConfig` and `AutoRoundQuantizer` in the quantization auto-mapping logic, enabling selection via the `"auto-round"` key.

Dependency management and import handling

- Added an `is_auto_round_available` utility and integrated it into the main import structure and conditional imports, ensuring that AutoRound features are only available if the dependency is installed; dummy objects are provided otherwise.
- Added `require_auto_round_version_greater_or_equal` for version-gated testing of AutoRound features.

Documentation

- Added `docs/source/en/quantization/autoround.md`, including usage examples, backend options, configuration details, and resource links.

These changes collectively enable seamless integration of AutoRound quantization into Diffusers, with robust configuration, backend selection, and user guidance.
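The auto-mapping registration mentioned above can be sketched as follows. The class bodies and registry names here are simplified stand-ins for illustration, not the real diffusers implementation; only the `"auto-round"` key and the parameter names (bits, group size, symmetry, backend, excluded modules) come from the PR description.

```python
class AutoRoundConfig:
    """Stand-in for the config class described in the PR (illustrative)."""

    def __init__(self, bits=4, group_size=128, sym=True, backend=None,
                 modules_to_not_convert=None):
        self.quant_method = "auto-round"
        self.bits = bits
        self.group_size = group_size
        self.sym = sym
        self.backend = backend
        self.modules_to_not_convert = modules_to_not_convert or []


class AutoRoundQuantizer:
    """Stand-in quantizer: holds its config; real loading logic omitted."""

    def __init__(self, config):
        self.config = config


# String key -> classes, so a serialized quantization_config dict can be
# dispatched by its "quant_method" field (hypothetical registry names).
CONFIG_MAPPING = {"auto-round": AutoRoundConfig}
QUANTIZER_MAPPING = {"auto-round": AutoRoundQuantizer}


def quantizer_from_dict(config_dict):
    """Resolve a quantizer from a config dict via the auto-mapping."""
    method = config_dict.pop("quant_method")
    config = CONFIG_MAPPING[method](**config_dict)
    return QUANTIZER_MAPPING[method](config)
```

The value of the registry pattern is that adding a new quantization method is a pure registration: no dispatch code elsewhere needs to change when `"auto-round"` is added alongside existing keys.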
Existing Model
cc @wenhuach21 @thuang6 @hshen14
Before submitting
See the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@yiyixuxu @asomoza @stevhliu @sayakpaul
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.