
feat: migrate pipeline to nnx#2885

Open
mesakhcienet wants to merge 39 commits into AI-Hypercomputer:main from CIeNET-International:test/pipeline-scan-nnx

Conversation


@mesakhcienet mesakhcienet commented Dec 24, 2025

Description

Implement an NNX-based pipeline.

This PR extends PR#2831.

Main changes:

  1. nnx_decoders.py: implement the missing pipeline logic.
  2. pipeline.py: add a new class, NNXPipeline, an NNX-based pipeline (a minimal sketch follows this list).
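
As orientation, a minimal sketch of the shape such a class can take, assuming flax.nnx. The constructor signature and the naive sequential schedule below are illustrative only; the real NNXPipeline in pipeline.py implements a scan-based schedule with microbatching and weight prefetching (see the commits further down).

```python
# Minimal sketch only; `stage_factory` and the sequential loop are
# illustrative, not the actual NNXPipeline implementation.
from flax import nnx
import jax


class NNXPipeline(nnx.Module):
  """Runs a stack of decoder layers as pipeline-parallel stages."""

  def __init__(self, stage_factory, num_stages: int, rngs: nnx.Rngs):
    # One stage module per pipeline stage, each holding its own layer params.
    self.stages = [stage_factory(rngs=rngs) for _ in range(num_stages)]

  def __call__(self, microbatch: jax.Array) -> jax.Array:
    # Naive sequential schedule for illustration; the PR drives the stages
    # with jax.lax.scan over repeats and microbatch rotation instead.
    x = microbatch
    for stage in self.stages:
      x = stage(x)
    return x
```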

Tests

We ran the pipeline with the command below:

MODEL_NAME=llama2-7b
python -m MaxText.train src/maxtext/configs/base.yml \
    run_name=pipeline_test_${MODEL_NAME}_nnx \
    base_output_directory=/dev/shm/pipeline_test_nnx \
    model_name=${MODEL_NAME} \
    dataset_type=synthetic \
    steps=15 \
    debug_sharding=true \
    per_device_batch_size=2 \
    max_target_length=32 \
    ici_pipeline_parallelism=2 \
    num_pipeline_microbatches=4 \
    num_layers_per_pipeline_stage=2 \
    enable_checkpointing=false \
    enable_nnx=true \
    pure_nnx_decoder=true \
    scan_layers_per_stage=false \
    async_checkpointing=false > nnx-porting-log/pipeline/custom_${MODEL_NAME}.log 2>&1

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have added necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@mesakhcienet mesakhcienet changed the title from "core: migrate pipeline to nnx" to "feat: migrate pipeline to nnx" on Dec 24, 2025
@mesakhcienet mesakhcienet force-pushed the test/pipeline-scan-nnx branch 8 times, most recently from 6875da8 to f34b1a3 on January 15, 2026
codecov Bot commented Jan 19, 2026

@mesakhcienet mesakhcienet force-pushed the test/pipeline-scan-nnx branch 4 times, most recently from 12a3907 to 2c16599 on January 28, 2026
@mesakhcienet mesakhcienet force-pushed the test/pipeline-scan-nnx branch 2 times, most recently from 64dc147 to 9e4518e on February 2, 2026
@mesakhcienet mesakhcienet force-pushed the test/pipeline-scan-nnx branch from 631a73e to ac97a1d on March 2, 2026
@mesakhcienet mesakhcienet changed the base branch from main to xibin/nnx_all on March 2, 2026
@ecnal-cienet ecnal-cienet force-pushed the xibin/nnx_all branch 12 times, most recently from 1849f0b to 669dc01 on March 3, 2026
Add matching [PIPELINE-DIAG] and [DECODER-DIAG] tagged logging to NNX
pipeline.py and decoders.py. Logs setup config, nnx.split state
partitions, L1/L2/L3 custom_vjp residuals/leaf counts, outer scan
carry structure (2 elements vs Linen's 3), vmap output structure,
BSW buffer shapes, and closure-captured variables.

fix: address challenger gaps in NNX diagnostic logging

- C-1: Add total carry leaf count, total closure leaf count, and total size in GB
- C-3: Add scatter_update flag, checkpoint policy at use site
- C-4: Log to_linen_class wrapper in create_pipeline, stage_factory
  pattern in decoders.py
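
The kind of tagged, leaf-count/GB diagnostics described in the two commits above can be sketched with standard logging plus jax.tree_util; the helper below is hypothetical, not the code in this PR:

```python
import logging

import jax


def log_tree_stats(tag: str, name: str, tree) -> None:
  """Hypothetical helper: log a pytree's leaf count and total size in GB."""
  leaves = jax.tree_util.tree_leaves(tree)
  total_bytes = sum(
      leaf.size * leaf.dtype.itemsize for leaf in leaves if hasattr(leaf, "dtype")
  )
  logging.info(
      "%s %s: %d leaves, %.2f GB", tag, name, len(leaves), total_bytes / 1e9
  )


# Example: log the outer scan carry before entering jax.lax.scan.
# log_tree_stats("[PIPELINE-DIAG]", "outer scan carry", carry)
```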
Move jax.checkpoint from wrapping outer_body (outside scan) to wrapping
execute_pipeline_repeat (inside scan body). This matches Linen's
nn.scan(nn.remat(stage_fn)) pattern which creates a closed_call
boundary per iteration.

Root cause: Linen's pattern lets XLA unroll trip-1 outer loops, inline
the closed_call, and clone inner while-loops 16x, producing 48 small
loops with small carries. NNX's pattern (checkpoint outside scan)
creates trip-16 loops that XLA won't unroll (above threshold), resulting
in 5 monolithic loops with large carries and poor buffer reuse.

HLO evidence:
- Linen: 8 -> 48 while-loops, preallocated-temp 12.15 GiB
- NNX:   7 ->  5 while-loops, preallocated-temp 13.80 GiB

Same fix applied to bubble iterations.
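
Schematically, the placement change looks like this; only the names outer_body and execute_pipeline_repeat come from the commit, and the function bodies are placeholders:

```python
import jax
import jax.numpy as jnp


def execute_pipeline_repeat(carry, x):
  # Placeholder for one pipeline repeat; the real function runs the stages.
  return carry * x + 1.0


# Before: the checkpoint wrapped the scanned body itself, e.g.
#   jax.lax.scan(jax.checkpoint(outer_body), init, xs)
# which XLA compiled as a few monolithic while-loops with large carries.

# After: checkpoint the per-repeat work *inside* the scan body, matching
# Linen's nn.scan(nn.remat(stage_fn)) pattern of one closed_call boundary
# per iteration, which XLA can unroll and optimize.
def outer_body(carry, x):
  new_carry = jax.checkpoint(execute_pipeline_repeat)(carry, x)
  return new_carry, None


carry, _ = jax.lax.scan(
    outer_body, jnp.zeros((), jnp.float32), jnp.arange(16, dtype=jnp.float32)
)
```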
Revert to exact 0506 scan structure (no unroll parameter) but with
dual-buffer BSW fix for numerical correctness:

1. outer_body: bsw_ref[0] = (cur_bsw, nxt_bsw), i.e. 2 all-gathers per
   repeat, so trailing stages get the correct repeat's weights at boundaries
2. bubble: same dual-buffer pattern
3. Remove jax.lax.scan(unroll=N) — match 0506 behavior (unroll=1)
4. Restore bsw[0] is bsw[1] fast path in get_current_weights_from_bsw

The dual-buffer is required for correctness: with (cur_bsw, cur_bsw),
all stages get the same repeat's weights, but trailing stages at repeat
boundaries need the previous repeat's weights.
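
A schematic of the per-stage selection this dual buffer enables, reusing the commit's bsw and get_current_weights_from_bsw names; the mask-based selection is illustrative and assumes weights carry a leading stage axis:

```python
import jax
import jax.numpy as jnp


def get_current_weights_from_bsw(bsw, stage_uses_next):
  """Illustrative per-stage selection from a dual (cur_bsw, nxt_bsw) buffer."""
  cur_bsw, nxt_bsw = bsw
  if cur_bsw is nxt_bsw:
    # Fast path: both slots hold the same repeat's weights (e.g. bubbles).
    return cur_bsw

  # At a repeat boundary, different stages need different repeats' weights,
  # so select per stage (leading axis = stage) between the two buffers.
  def select(cur, nxt):
    mask = stage_uses_next.reshape((-1,) + (1,) * (cur.ndim - 1))
    return jnp.where(mask, nxt, cur)

  return jax.tree_util.tree_map(select, cur_bsw, nxt_bsw)
```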
Replace the dual-buffer approach (2 all-gathers per repeat, +8 GB) with Linen's
pattern: carry w_curr through the outer scan and compute w_next via a single
weight_prefetching call per repeat.

- outer carry: (loop_state, layer_mutables, w_curr)
- outer_body: w_next = weight_prefetching(iteration), BSW = (w_curr, w_next)
- w_next becomes next iteration's w_curr via carry
- Bubbles: (final_w_curr, final_w_curr), so all stages are on the same repeat

This matches Linen's create_pipeline_stage pattern exactly:
1 all-gather per repeat (not 2), w_curr carried (not recomputed).
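
A runnable structural sketch of that carry pattern; weight_prefetching and run_one_repeat here are placeholder stand-ins for the real all-gather and stage execution in pipeline.py:

```python
import jax
import jax.numpy as jnp


def weight_prefetching(iteration):
  # Stand-in for the single per-repeat all-gather of the *next* weights.
  return jnp.ones((4,)) * iteration


def run_one_repeat(loop_state, layer_mutables, bsw):
  # Stand-in for executing all stages of one repeat against (w_curr, w_next).
  w_curr, _ = bsw
  return loop_state + w_curr.sum(), layer_mutables


def outer_body(carry, iteration):
  loop_state, layer_mutables, w_curr = carry
  w_next = weight_prefetching(iteration)        # 1 all-gather per repeat
  bsw = (w_curr, w_next)
  loop_state, layer_mutables = run_one_repeat(loop_state, layer_mutables, bsw)
  # w_next becomes the next iteration's w_curr via the carry (not recomputed).
  return (loop_state, layer_mutables, w_next), None


init = (jnp.zeros(()), jnp.zeros(()), jnp.ones((4,)))
(final_state, _, final_w_curr), _ = jax.lax.scan(
    outer_body, init, jnp.arange(3, dtype=jnp.float32)
)
# Bubble iterations then use (final_w_curr, final_w_curr): all stages on the
# same repeat, which also hits the bsw[0] is bsw[1] fast path.
```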
