Save run manifest for distillation reproducibility. #3709

Merged
copybara-service[bot] merged 1 commit into main from gagik-distill2
Apr 22, 2026

@gagika (Collaborator) commented Apr 21, 2026

Description

Save distillation.yml and command.sh (CLI overrides) to the run's output directory at startup, so a run can be reproduced by copying the YAML and re-running the saved command.

  • Works for local and GCS paths via etils.epath. Host-0 only.
  • Failures in the helper are caught and logged as a warning — they do not abort the training run.
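
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `save_run_manifest` and its parameters are hypothetical, the standard-library `pathlib.Path` stands in for `etils.epath.Path` (which exposes a similar interface and also handles GCS paths), and the process index is passed in explicitly rather than read from `jax.process_index()`.

```python
import logging
from pathlib import Path  # stand-in for etils.epath.Path (assumption)


def save_run_manifest(config_path, command, output_dir, process_index=0):
    """Write distillation.yml and command.sh into output_dir.

    Hypothetical helper: returns True if the files were written, False if
    the call was skipped (non-zero host) or failed.
    """
    # Only the lead host writes, so multi-host runs do not race on the files.
    if process_index != 0:
        return False
    try:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        # Copy the source YAML verbatim so the run can be replayed from it.
        (out / "distillation.yml").write_text(Path(config_path).read_text())
        # Store the launch command as a small shell script.
        (out / "command.sh").write_text("#!/bin/bash\n" + command + "\n")
        return True
    except Exception as exc:  # deliberately broad: never abort training
        logging.warning("Could not save run manifest: %s", exc)
        return False
```

Returning a boolean instead of raising keeps the helper safe to call at startup: a missing config file or an unwritable output directory only produces a warning.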

Tests

Unit test and end-to-end test.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@github-actions

🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions github-actions Bot left a comment

## 📋 Review Summary

This PR introduces a helpful feature to save the run manifest (the source YAML configuration and the exact shell command) to the output directory. This significantly improves the reproducibility of distillation runs by allowing users to easily see and re-run the exact configuration used.

🔍 General Feedback

  • The implementation is clean and follows the pattern of existing MaxText utilities by using etils.epath for cross-platform and cloud storage support.
  • The use of jax.process_index() == 0 correctly ensures that only the lead process handles the file writing.
  • The inclusion of unit tests covering both successful file generation and error handling is excellent.
  • The minor suggestions provided aim to make the argument handling more robust against edge cases.
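
The review does not spell out which edge cases it has in mind. One common one when saving a command line for later replay is an argument that contains spaces or shell metacharacters. The sketch below shows how `shlex.join` from the Python standard library keeps such arguments intact; the function name `format_command` and the override values are hypothetical and are not taken from the PR.

```python
import shlex


def format_command(argv):
    """Return a shell-safe, single-line command for a list of arguments."""
    # shlex.join quotes each argument as needed, so the saved line parses
    # back into the original argument list.
    return shlex.join(argv)


# Hypothetical overrides: one contains spaces, one contains a `$`.
argv = [
    "python3",
    "train_distill.py",
    "distillation.yml",
    "run_name=distill run 1",
    "tag=$USER",
]
line = format_command(argv)

# Round trip: the quoted line parses back into the original arguments.
assert shlex.split(line) == argv
# A plain space-join would split the run name into three separate words.
assert shlex.split(" ".join(argv)) != argv
```

Quoting at write time means the saved `command.sh` can be executed as-is, without the user having to re-quote overrides by hand.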

Comment thread src/maxtext/trainers/post_train/distillation/train_distill.py
Comment thread src/maxtext/trainers/post_train/distillation/train_distill.py

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 80.95238% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../trainers/post_train/distillation/train_distill.py | 80.95% | 2 Missing and 2 partials ⚠️ |

@JamesDeng42 (Collaborator) left a comment

Thanks for this improvement; it will make our lives easier.

@copybara-service copybara-service Bot merged commit 3ebb185 into main Apr 22, 2026
44 checks passed
@copybara-service copybara-service Bot deleted the gagik-distill2 branch April 22, 2026 19:32