docs: QLoRA Documentation and Notebooks by RexBearIU · Pull Request #3970 · AI-Hypercomputer/maxtext

RexBearIU · 2026-05-22T07:35:20Z

Description

This PR adds comprehensive documentation and tutorials for running LoRA/QLoRA fine-tuning, specifically focusing on multi-host TPU environments.

As PEFT techniques become more prevalent for large models, users need clear, step-by-step guidance on how to leverage MaxText and Tunix for multi-host tuning.

This PR includes:

- A new dedicated Markdown tutorial (docs/tutorials/posttraining/lora_on_multi_host.md) detailing the prerequisites, environment setup, and execution steps for multi-host TPU tuning.
- A fully documented Jupyter Notebook (src/maxtext/examples/lora_llama3_demo.ipynb) that walks through setup, pre-SFT evaluation (using vLLM rollouts), the Tunix LoRA SFT training loop, and post-SFT evaluation with weight merging.
- Minor path corrections in the existing lora.md tutorial.
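
For readers unfamiliar with the weight merging step mentioned for post-SFT evaluation, here is a minimal sketch of the standard LoRA merge, `W_merged = W + (alpha / rank) * (B @ A)`. It is an illustration of the general technique only, not the code used by the notebook or by Tunix; the function names `matmul` and `merge_lora` and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into the base weight.

    w:      base weight, shape (out, in)
    lora_a: adapter matrix A, shape (rank, in)
    lora_b: adapter matrix B, shape (out, rank)
    Returns W + (alpha / rank) * (B @ A). This is the textbook formulation,
    assumed here for illustration.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: 2x2 identity base weight with a rank-1 adapter.
base = [[1, 0], [0, 1]]
adapter_a = [[3, 4]]      # shape (1, 2)
adapter_b = [[1], [2]]    # shape (2, 1)
merged = merge_lora(base, adapter_a, adapter_b, alpha=2, rank=1)
```

After merging, the adapter matrices are no longer needed at inference time, which is why evaluation can run on a single merged weight.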

Tests

Documentation only.
The Jupyter notebook has been run manually to ensure that all cells execute correctly (assuming the required dependencies and TPUs are available).

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-05-22T08:01:58Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


igorts-git · 2026-06-25T16:53:20Z

+To add weight mapping for vLLM decode:
+
+1. **Create a Weight Mapping Config**:
+   Create a new file in [src/maxtext/integration/tunix/weight_mapping/](../../src/maxtext/integration/tunix/weight_mapping/) (e.g., `your_model.py`) defining a mapping dataclass. You can refer to [gemma3.py](../../src/maxtext/integration/tunix/weight_mapping/gemma3.py) or [llama3.py](../../src/maxtext/integration/tunix/weight_mapping/llama3.py) as templates.


Our documentation is usually surfaced via https://maxtext.readthedocs.io/en/latest/index.html, so I am not sure that hyperlinks to code in GitHub would work here. Please check how code links are implemented in other docs. We are also trying to keep the docs consistent with MaxText release versions, so that if someone is reading the docs for version 0.2.3, all hyperlinks also point to that same version.

@melissawm do you know how to correctly link to code in GitHub?

Yes, these will not work. The readthedocs site can only see relative links to documents under the docs/ folder. To link to files under src/ or other folders, the best way is to use the github link (in this case, https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/integration/tunix/weight_mapping/llama3.py)
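
The suggested fix can be sketched as a small rewrite helper that turns relative links into `src/` into absolute GitHub URLs. This is only an illustration: the function name `absolutize_src_links`, the regular expression, and the `ref` parameter are hypothetical, and the tag name used in the usage example is an assumed naming scheme, not a confirmed MaxText release tag.

```python
import re

# Base URL taken from the example link in the comment above.
GITHUB_BLOB = "https://github.com/AI-Hypercomputer/maxtext/blob"

# Matches markdown link targets such as "](../../src/some/file.py)".
_REL_SRC_LINK = re.compile(r"\]\((?:\.\./)+(src/[^)\s]+)\)")


def absolutize_src_links(markdown: str, ref: str = "main") -> str:
    """Rewrite relative links into src/ as absolute GitHub URLs at `ref`.

    Links that stay inside docs/ are left untouched, since ReadTheDocs
    can resolve those.
    """
    return _REL_SRC_LINK.sub(
        lambda m: f"]({GITHUB_BLOB}/{ref}/{m.group(1)})", markdown
    )


before = "[llama3.py](../../src/maxtext/integration/tunix/weight_mapping/llama3.py)"
after_main = absolutize_src_links(before)
# "v0.2.3" is an assumed tag name, used only to show version pinning.
after_pinned = absolutize_src_links(before, ref="v0.2.3")
docs_link = absolutize_src_links("[LoRA](../tutorials/posttraining/lora.md)")
```

Passing a release tag as `ref` instead of `main` is one way to address the earlier concern about keeping links consistent with the docs version being read.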

SurbhiJainUSC · 2026-06-25T22:03:22Z

+    "\n",
+    "    # Install uv, a fast Python package installer\n",
+    "    !pip install uv\n",
+    "    \n",


Please add this:

import os
os.environ["UV_TORCH_BACKEND"]="cpu"
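
A minimal sketch of why the placement matters: the variable has to be set in the notebook process before any install command runs, because commands launched from the notebook run as child processes and inherit its environment. The child command below is a stand-in that only echoes the variable; it is not the notebook's actual install step, and the assumption that the setting is meant to select CPU-only PyTorch builds for uv is inferred from the variable name.

```python
import os
import subprocess
import sys

# Set the variable first, so every later child process sees it.
os.environ["UV_TORCH_BACKEND"] = "cpu"

# Stand-in for an install command: a child Python process that reports
# the value it inherited from the parent environment.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['UV_TORCH_BACKEND'])"],
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = result.stdout.strip()
```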

SurbhiJainUSC · 2026-06-25T22:05:24Z

+    "if not epath.Path(MODEL_CHECKPOINT_PATH).exists():\n",
+    "    # Install torch for the conversion script\n",
+    "    print(\"Installing torch...\")\n",
+    "    subprocess.run(\n",


I don't think we need to install torch for the checkpoint conversion script anymore.

RexBearIU · 2026-06-26T00:52:43Z

Hi @SurbhiJainUSC, I will address the remaining issues you raised in a separate PR and let you know.

RexBearIU · 2026-06-26T02:37:56Z

I have created a new PR #4277 to address the leftover comments in the lora_llama3_demo.ipynb notebook.

RexBearIU · 2026-06-26T02:43:10Z

I have also updated the branch in #4277 to resolve the ReadTheDocs relative links issue in docs/guides/lora_model_bringup.md as discussed by @melissawm. All relative links to code files under src/ are now proper GitHub absolute URLs.

RexBearIU mentioned this pull request May 22, 2026

feat: QLoRA support for Dense/MoE models across Pathways and McJAX #3702

Closed

4 tasks

RexBearIU force-pushed the jackyf/qlora-docs branch 2 times, most recently from 80b5fab to 11360a1 Compare May 22, 2026 07:56

RexBearIU requested a review from parambole as a code owner May 22, 2026 07:56

RexBearIU force-pushed the jackyf/qlora-docs branch 2 times, most recently from b832e29 to 5745e3d Compare May 26, 2026 07:37