
Fix training gradient underflow in quantization tests #13539

Open
jiqing-feng wants to merge 3 commits into huggingface:main from jiqing-feng:torchao-fix-training-underflow

Conversation

@jiqing-feng
Contributor

What does this PR do?

This PR changes the autocast dtype from float16 to bfloat16 in `_test_quantization_training`. Float16's limited dynamic range (max ~65504, smallest subnormal ~5.96e-8) causes gradients to underflow to zero when they pass through quantized tensor subclass operations; bfloat16 shares float32's 8-bit exponent range and avoids this.

Change autocast dtype from float16 to bfloat16 in _test_quantization_training.
Float16's limited dynamic range causes gradients to underflow to zero when
passing through quantized tensor subclass operations.
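The underflow can be reproduced without any quantization machinery. The sketch below (a pure-Python illustration, not the PR's test code; the round-trip helpers are hypothetical names) shows that a small gradient value survives a bfloat16 round trip but collapses to zero in float16, because bfloat16 keeps float32's 8-bit exponent while float16 has only 5 exponent bits:

```python
import struct

def round_trip_float16(x: float) -> float:
    # Pack/unpack with struct's IEEE 754 half-precision format 'e'.
    # Values below float16's subnormal range round to 0.0.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def round_trip_bfloat16(x: float) -> float:
    # Approximate bfloat16 by truncating a float32 to its top 16 bits
    # (same 8-bit exponent as float32, 7-bit mantissa).
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

grad = 1e-8  # a plausibly small gradient magnitude
print(round_trip_float16(grad))   # 0.0 -- underflows below ~5.96e-8
print(round_trip_bfloat16(grad))  # nonzero, close to 1e-8
```

In the actual test, the equivalent fix is passing `dtype=torch.bfloat16` instead of `torch.float16` to the `torch.autocast` context that wraps the forward/backward pass.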
@github-actions Bot added the size/S (PR with diff < 50 LOC) and tests labels Apr 22, 2026

Labels

size/S (PR with diff < 50 LOC), tests

Projects

None yet

Development


1 participant