Add optional --skip-validation flag to benchmark recipes and XPK workload creation#3708

Merged
copybara-service[bot] merged 1 commit into AI-Hypercomputer:main from CIeNET-International:maxtext/user/dora/add_skip_validaion_flag on Apr 22, 2026

RUEI4341 (Contributor) commented Apr 21, 2026

Description

Context
Previously, benchmark recipe interfaces (e.g., pw_mcjax_benchmark_recipe) had no way to propagate a --skip-validation intent to the xpk commands they generate, so users could not bypass validation steps during benchmark execution.

Solution
Introduced an optional --skip-validation flag to the benchmark recipe interface. The flag is threaded down to the xpk workload create command, where it bypasses system dependency checks (for tools such as docker or kubectl-kjob) and resource validation. This is required for the Airflow/Composer integration: those tools are neither available nor needed there to launch workloads, so skipping the checks avoids unnecessary environment complexity.

  • Default Behavior: If the flag is omitted, the existing validation logic remains active, ensuring backward compatibility.

  • New Behavior: When the flag is provided, the validation phase of workload creation is skipped.
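A minimal sketch of the threading described above, assuming the recipe assembles the xpk invocation as an argument list. `build_xpk_workload_create_command` is a hypothetical name, and only two xpk options are shown for illustration; the point is that --skip-validation is appended only when requested, so the default command is unchanged.

```python
from typing import List


def build_xpk_workload_create_command(
    cluster: str,
    workload: str,
    skip_validation: bool = False,
) -> List[str]:
    """Hypothetical helper that assembles an `xpk workload create` invocation.

    Only two options are shown; a real recipe passes many more (image,
    device type, priority and so on). `--skip-validation` is appended only
    when requested, so callers that omit it get the same command as before.
    """
    cmd = [
        "xpk",
        "workload",
        "create",
        f"--cluster={cluster}",
        f"--workload={workload}",
    ]
    if skip_validation:
        # Opt-in only: the default path keeps xpk's validation active.
        cmd.append("--skip-validation")
    return cmd
```

Calling it with skip_validation=True yields the default list plus a trailing --skip-validation; omitting the argument reproduces the pre-change command exactly, which is what keeps the change backward compatible.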

Tests

Validated the flag propagation by executing the pw_mcjax_benchmark_recipe with the new parameter:

Execution Command:
python3 -m benchmarks.recipes.pw_mcjax_benchmark_recipe \
  --user=root --cluster_name=pw-v6e-32x4 --project=cienet-cmcs --zone=us-central1-b \
  --benchmark_steps=20 --num_slices_list=1 \
  --server_image=us-docker.pkg.dev/cloud-tpu-v2-images/pathways/server:latest \
  --proxy_image=us-docker.pkg.dev/cloud-tpu-v2-images/pathways/proxy_server:latest \
  --runner=gcr.io/tpu-prod-env-multipod/skip_val:latest \
  --selected_model_framework=pathways --selected_model_names=default_basic_1 \
  --priority=medium --max_restarts=1 \
  --bq_enable=False --bq_db_project=cloud-tpu-multipod-dev --bq_db_dataset=chzheng_test_100steps \
  --workload_id=roo-pw-default-1-ha5 --device_type=v6e-32 \
  --skip-validation

Verification:
Execution logs confirm that the flag was correctly parsed and passed to the xpk command.
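The parsing half of that check can be reproduced in isolation with argparse. This is a sketch under assumptions: `build_parser` is a hypothetical name and only a few of the recipe's flags are declared (the real parser in benchmarks/recipes/parser_utils.py defines many more). The --skip-validation definition mirrors the one under review below; note that argparse exposes the dashed flag as the attribute `skip_validation`, even though most recipe flags use underscores.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser declaring a small subset of the recipe flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cluster_name", required=True)
    parser.add_argument("--benchmark_steps", type=int, default=20)
    parser.add_argument(
        "--skip-validation",
        action="store_true",  # present -> True, absent -> False
        default=False,
        help="Skip validation during workload creation in xpk.",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    with_flag = parser.parse_args(["--cluster_name=pw-v6e-32x4", "--skip-validation"])
    without_flag = parser.parse_args(["--cluster_name=pw-v6e-32x4"])
    # argparse maps "--skip-validation" to the attribute "skip_validation".
    print(with_flag.skip_validation, without_flag.skip_validation)  # True False
```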

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Comment thread on benchmarks/recipes/parser_utils.py (outdated)
"--skip-validation",
action="store_true",
default=False,
help="Skip validation during workload creation in xpk.",
Collaborator

Nit: Can you please add some info on what validation is being skipped here? It looks like this is passing a Pathways-specific flag through. If that is the case, maybe the help text should also say that this is a Pathways-specific flag?

Contributor Author

Actually, this isn't Pathways-specific. It's a general flag for the xpk workload create command.

In environments like Airflow/Composer, we only use the workload create and create-pathways commands and don't actually need system dependencies like docker or kubectl-kjob. Currently, xpk validation fails if those tools aren't installed. Adding this flag lets us skip those non-essential checks, which significantly reduces the complexity and 'workarounds' needed for our CI/CD integration.

I've updated the help text to reflect that this skips system dependency and health validation.
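To make the discussion concrete, here is an illustrative sketch of a pre-flight dependency check and how a skip flag short-circuits it. This is not xpk's code: `missing_tools`, `validate_environment`, and `DEFAULT_TOOLS` are hypothetical names, the tool list is taken from the comment above, and the resource validation that xpk also performs is not modeled.

```python
import shutil
from typing import List, Sequence

# Tools named in the discussion above; xpk's real list may differ.
DEFAULT_TOOLS = ("docker", "kubectl-kjob")


def missing_tools(tools: Sequence[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def validate_environment(
    skip_validation: bool = False,
    tools: Sequence[str] = DEFAULT_TOOLS,
) -> None:
    """Hypothetical pre-flight check; not xpk's implementation.

    Raises RuntimeError if any tool is missing from PATH, unless
    skip_validation is True, in which case no check runs at all.
    """
    if skip_validation:
        return
    missing = missing_tools(tools)
    if missing:
        raise RuntimeError(f"missing system dependencies: {', '.join(missing)}")
```

On a Composer worker without docker, validate_environment() raises, while validate_environment(skip_validation=True) returns immediately; that difference is the behavior the flag is meant to expose.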

codecov[bot] commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


RUEI4341 force-pushed the maxtext/user/dora/add_skip_validaion_flag branch from baa2f1c to 9a513e1 on April 22, 2026 07:17
copybara-service[bot] merged commit 172d0f1 into AI-Hypercomputer:main on Apr 22, 2026
3 checks passed
5 participants