Jules changes and some model renaming for clarity #14

Open: akutuva21 wants to merge 234 commits into RuleWorld:master from akutuva21:master

@akutuva21
Member

Harden, optimize, and test RuleHub's metadata/manifest tooling

Summary

This PR is a large fork-sync covering 125 commits to RuleHub's scripts/ build tooling and the generated metadata it produces. Work spans four areas: security hardening, performance (mostly sync→async I/O), test coverage, and cleanup/refactors, plus regeneration of manifests, gallery data, and README.md/metadata.yaml files to keep CI green.

Most commits were authored by @akutuva21, with several co-authored by AI coding agents (google-labs-jules[bot]).

What changed

Security

  • Prevent prototype pollution in setNested (block __proto__ / constructor / prototype; use plain objects).
  • Fix path-traversal vulnerabilities in directory scanning / metadata extraction.
  • Prevent RegExp injection in generate-manifest.
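The prototype-pollution and RegExp-injection fixes listed above can be illustrated with a minimal sketch. This is not the code from the PR: the dotted-path signature of `setNested`, the `FORBIDDEN_KEYS` name, and the `escapeRegExp` helper are assumptions made for illustration.

```javascript
// Hypothetical sketch; the real setNested in scripts/ may differ.
const FORBIDDEN_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

/**
 * Assign `value` at a dotted `path` inside `target`, refusing any path
 * segment that could reach a prototype object.
 */
function setNested(target, path, value) {
  const keys = path.split('.');
  if (keys.some((key) => FORBIDDEN_KEYS.has(key))) {
    throw new Error(`Refusing unsafe key in path: ${path}`);
  }
  let node = target;
  for (const key of keys.slice(0, -1)) {
    const hasOwn = Object.prototype.hasOwnProperty.call(node, key);
    // Only descend into own properties that already hold an object.
    if (!hasOwn || typeof node[key] !== 'object' || node[key] === null) {
      node[key] = {};
    }
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return target;
}

/** Escape RegExp metacharacters so untrusted text matches literally. */
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Usage: build a pattern from untrusted input without letting it inject syntax.
const safePattern = new RegExp(`^${escapeRegExp('a.b(c)')}$`);
```

Rejecting the whole path when any segment is forbidden keeps the check in one place; escaping before `new RegExp(...)` means characters such as `.` or `(` in a model name are matched as literal text.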

Performance

  • Convert synchronous file I/O to async across backfill-metadata, generate-manifest, generate-gallery, apply-gallery-assignments, and metadata validation (listModelFiles, findBnglFiles, findAllMetadataFiles, extractModelIds, etc.).
  • Micro-optimizations: hoist Set construction out of loops, cache trim() and RegExp creation, optimize parseMetadataYaml string splitting.
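As a rough illustration of the micro-optimizations listed above, the sketch below builds a `Set` and a `RegExp` once at module scope and trims each line a single time. All names and data here (`IGNORED_EXTENSIONS`, `COMMENT_LINE`, `countMeaningfulLines`, `filterModelFiles`) are hypothetical and do not come from the PR.

```javascript
// Illustrative only: names and data are hypothetical.
const IGNORED_EXTENSIONS = new Set(['.png', '.svg', '.md']); // built once, not per iteration
const COMMENT_LINE = /^#/; // compiled once, reused for every line

/** Count lines that are neither blank nor comments. */
function countMeaningfulLines(lines) {
  let count = 0;
  for (const rawLine of lines) {
    const line = rawLine.trim(); // trim once and reuse the result
    if (line === '' || COMMENT_LINE.test(line)) continue;
    count += 1;
  }
  return count;
}

/** Keep only file names whose extension is not in the ignore set. */
function filterModelFiles(fileNames) {
  return fileNames.filter((name) => {
    const dot = name.lastIndexOf('.');
    const ext = dot === -1 ? '' : name.slice(dot);
    return !IGNORED_EXTENSIONS.has(ext);
  });
}
```

The gain per call is small; it matters mainly when the same helper runs over every line of every metadata file in the repository.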

Tests

  • Broad unit-test coverage added for the scripts layer: expectString/expectArray/expectBoolean, parseScalar, parseMetadataYaml, normalizeModelKey, generateId, inferOrigin/inferCategory, processModelLine/processActionLine, isCollectionEntry, getIgnoreDirs/isIgnoredDir, gallery parsing, migration scripts, and error/edge paths throughout.
  • Tests updated to match the new async APIs and dynamic (non-hardcoded) paths.

Refactors & cleanup

  • Extract helper functions out of parseBngl and extractCategoryMappings.
  • Remove extraneous console.log calls (converted to console.info where appropriate); drop dead code (getIgnoreDirs, redundant nesting, unused vars).
  • Fix hardcoded absolute paths.

Generated data / CI

  • Add missing README.md files and metadata.yaml updates for Published models.
  • Regenerate manifest.json, manifest-slim.json, and gallery.generated.json; exempt legacy Published models to resolve manifest-drift CI failures.

Primary files touched

scripts/backfill-metadata.js, scripts/generate-manifest.js, scripts/apply-gallery-assignments.js, scripts/validate-metadata.js, scripts/generate-gallery.js, scripts/utils.js, their .test.js counterparts, scripts/migration/*, plus regenerated manifest*.json, gallery.generated.json, and many Published/**/README.md and metadata.yaml files.

Notes for reviewers

  • The bulk of the line count is regenerated metadata/manifest output, not hand-written logic — review the scripts/ changes first.
  • Several CI-fix commits are iterative (manifest drift / missing READMEs); the net end state is consistent and passing.

Commit breakdown

Category            ~Count
Tests               55
Fixes / security    29
Refactor / chore    24
Performance         18
(Counts overlap where a commit spans categories; total commits = 125.)

akutuva21 and others added 30 commits May 11, 2026 16:03
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Exported expectString from validate-metadata.js
* Added negative test cases to validate-metadata.test.js including:
  * Non-string values
  * Null values
  * Empty strings
  * Whitespace only strings
  * Valid strings

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Avoid redundant string allocations by caching the result of `rawLine.trim()` in `parseMetadataYaml`.

This eliminates 2 redundant `.trim()` operations per line, reducing unnecessary allocations and processing.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Replace `content.split(/\r?\n/)` with an `indexOf('\n')` scanning loop in `parseMetadataYaml` inside `scripts/utils.js`. This reduces memory allocations and garbage collection overhead by avoiding the intermediate array of lines that `split` builds for the whole file; each line is sliced out only when it is processed.
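A minimal sketch of such a scanning loop follows. The helper name `forEachLine` and the callback shape are assumptions for illustration; the actual loop in `parseMetadataYaml` is not shown in this PR description.

```javascript
/**
 * Call `visit` with each line of `content`, treating "\n" and "\r\n" as
 * line breaks, without first building an array of all lines.
 * Hypothetical helper; not the code from scripts/utils.js.
 */
function forEachLine(content, visit) {
  let start = 0;
  while (start <= content.length) {
    const newline = content.indexOf('\n', start);
    const end = newline === -1 ? content.length : newline;
    // Strip a '\r' only when it directly precedes a real '\n'.
    const hasCarriageReturn = newline > start && content.charCodeAt(newline - 1) === 13;
    visit(content.slice(start, hasCarriageReturn ? end - 1 : end));
    start = end + 1;
  }
}

// Usage: collect lines only to show the callback is invoked once per line.
const collected = [];
forEachLine('id: demo\r\nname: Example\n', (line) => collected.push(line));
```

The loop is written to visit the same lines that `content.split(/\r?\n/)` would return, including a trailing empty line when the content ends with a newline.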

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Moved the string/path manipulation logic used to generate the `id` for metadata out of `generateMetadata` and into a new, dedicated `generateId` function. This simplifies the `generateMetadata` function and improves code readability.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Adds extensive testing to `scripts/utils.test.js` covering the `parseScalar` function's behavior with arrays, booleans, and edge cases, ensuring regex string replacements like `replace(/^"|"$/g, '')` apply as expected across nested comma elements.
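To make the quote-stripping behaviour concrete, here is a simplified, hypothetical `parseScalar`. Only the `replace(/^"|"$/g, '')` expression comes from the commit message; the boolean and inline-array handling are assumptions, and the real function in `scripts/utils.js` may behave differently.

```javascript
/** Remove one leading and/or one trailing double quote. */
function stripQuotes(text) {
  return text.replace(/^"|"$/g, '');
}

/**
 * Hypothetical sketch of a scalar parser for simple YAML-like values:
 * booleans, inline arrays such as [a, "b"], and optionally quoted strings.
 */
function parseScalar(raw) {
  const value = raw.trim();
  if (value === 'true') return true;
  if (value === 'false') return false;
  if (value.startsWith('[') && value.endsWith(']')) {
    const inner = value.slice(1, -1).trim();
    if (inner === '') return [];
    // Quotes are stripped per element, after splitting on commas.
    return inner.split(',').map((item) => stripQuotes(item.trim()));
  }
  return stripQuotes(value);
}
```

Note that the regex removes a quote at either end independently, so in this sketch a value with a mismatched quote such as `"abc` comes back as `abc`. Edge cases like that are what the added tests pin down.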

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
This adds a test to verify that `generate-gallery.js` correctly catches errors from `fs.readFileSync` or YAML parsing errors and handles them gracefully by skipping the malformed files without crashing the process.
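The skip-on-error behaviour described above might look roughly like the following. The function name `readEntries`, the injected `parse` callback, and the `skipped` report are illustrative assumptions, not the structure of `generate-gallery.js`.

```javascript
const fs = require('node:fs');

/**
 * Read and parse each file, skipping any file that cannot be read or parsed.
 * Hypothetical sketch: `parse` stands in for whatever YAML parser is used.
 */
function readEntries(filePaths, parse) {
  const entries = [];
  const skipped = [];
  for (const filePath of filePaths) {
    try {
      const text = fs.readFileSync(filePath, 'utf8');
      entries.push(parse(text, filePath));
    } catch (error) {
      // Record the failure and continue with the next file.
      skipped.push({ filePath, reason: error.message });
    }
  }
  return { entries, skipped };
}
```

Keeping the `try...catch` inside the loop is what lets one malformed file be skipped while the remaining files are still processed.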

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Removed `console.log(yamlContent)` from the `if (dryRun)` block to
reduce noise in the standard output.

Tested by running `node --test` with all applicable unit tests.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…s.js

Removed a debugging `console.log` statement from `scripts/apply-gallery-assignments.js` that was generating noise and was considered technical debt.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…ignments.js

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Removed a single `console.log` line that printed out the number of bngl files found,
improving script command line output cleanliness without changing core functionality.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…ments.js

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Removed noisy output `console.log` statements at the start of the `main()` function in `scripts/backfill-metadata.js`.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Added edge case assertions to scripts/utils.test.js for the `parseScalar` utility function. Test cases include handling of mismatched string quotes, mismatched brackets, irregular boolean casings, nested spaces in arrays/strings, and explicit decimal numbers.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Removed the extraneous `console.log` statements that print a summary at the end of the `main` function in `scripts/backfill-metadata.js`. This reduces noise in the output and improves code health.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…ction issue

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Replaces the synchronous `fs.readdirSync` depth-first search in `listMetadataFiles` with an asynchronous `fs.promises.readdir` implementation. The refactor leverages `Promise.all` to concurrently scan directories and `try...catch` for robust `ENOENT` handling without redundant `fs.existsSync` checks. This optimizes I/O usage by preventing event loop blocking during file scanning across the repository's `SEARCH_ROOTS`.
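A sketch of the pattern described in this commit is shown below. The signature, the `fileName` parameter, and the recursion shape are assumptions; only `fs.promises.readdir`, `Promise.all`, and the `ENOENT` handling are taken from the commit message.

```javascript
const fs = require('node:fs');
const path = require('node:path');

/**
 * Recursively collect paths of files named `fileName` under `rootDir`.
 * Hypothetical sketch of an async listMetadataFiles; the real one may differ.
 */
async function listMetadataFiles(rootDir, fileName = 'metadata.yaml') {
  let dirents;
  try {
    dirents = await fs.promises.readdir(rootDir, { withFileTypes: true });
  } catch (error) {
    // A missing directory yields no results, so no existsSync pre-check is needed.
    if (error.code === 'ENOENT') return [];
    throw error;
  }
  // Scan all children concurrently instead of one at a time.
  const nested = await Promise.all(
    dirents.map(async (dirent) => {
      const fullPath = path.join(rootDir, dirent.name);
      if (dirent.isDirectory()) return listMetadataFiles(fullPath, fileName);
      return dirent.isFile() && dirent.name === fileName ? [fullPath] : [];
    })
  );
  return nested.flat();
}
```

Errors other than `ENOENT` are rethrown so that permission problems and similar failures are not silently hidden.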

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…mmary

Replaces console.log with console.info for the summary output in scripts/backfill-metadata.js. This resolves the code health issue regarding extraneous console logs while preserving the essential summary functionality of the script.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Replaced synchronous `listModelFiles` with `listModelFilesAsync` within the async context of `validateMetadataFile` to avoid blocking the Node.js event loop during heavy I/O operations.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Replaced synchronous `fs.readFileSync` with asynchronous `fs.promises.readFile` in `loadGalleryCategories` inside `scripts/generate-gallery.js` to prevent blocking the Node.js event loop during initialization. Changed main to await the category load.
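The change amounts to something like the sketch below. The file format is an assumption: the categories file is treated here as a JSON array purely for illustration, and the `categoriesPath` parameter is hypothetical.

```javascript
const fs = require('node:fs');

/** Load gallery categories without blocking the event loop (format assumed to be JSON). */
async function loadGalleryCategories(categoriesPath) {
  const text = await fs.promises.readFile(categoriesPath, 'utf8');
  return JSON.parse(text);
}

/** The entry point must now await the load before using the categories. */
async function main(categoriesPath) {
  const categories = await loadGalleryCategories(categoriesPath);
  console.info(`Loaded ${categories.length} gallery categories`);
  return categories;
}
```

Because `loadGalleryCategories` now returns a promise, any caller that previously used its return value directly has to `await` it, which is why `main` changed as well.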

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
akutuva21 and others added 30 commits June 2, 2026 10:59
…versal-15062287209927623304

🔒 Fix Path Traversal in generate-manifest.js output flag
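One common way to guard a user-supplied output path is sketched here. The helper name `resolveInside` and the idea of confining output to a base directory are assumptions; the PR does not show how `generate-manifest.js` validates its output flag.

```javascript
const path = require('node:path');

/**
 * Resolve `userPath` against `baseDir` and reject results outside `baseDir`.
 * Hypothetical helper, not the code from generate-manifest.js.
 */
function resolveInside(baseDir, userPath) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, userPath);
  const relative = path.relative(base, target);
  const escapes =
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Path escapes ${base}: ${userPath}`);
  }
  return target;
}
```

Comparing the resolved paths, rather than searching the raw string for `..`, also catches absolute paths and allows harmless names that merely begin with two dots.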
…974916847795327544

🧪 [Testing] Add edge case tests for inferCategory
…ionLine-9573349664910203796

🧪 Add tests for processActionLine to improve backfill metadata coverage
…36943815230223

🧪 test: add tests for processModelLine in backfill-metadata.js
…eid-12012714592641404061

🧪 [testing improvement] Add tests for generateId in backfill-metadata.js
…736384652318

🧪 Add missing tests for expectEnum
…965731491338

🧪 test: add unit tests for getIgnoreDirs
…123594865509

🧪 Add tests for validateMetadataFile
…5298177206964

🧪 Add tests for isCollectionEntry in generate-manifest.js
…3728183734872069

🧹 Code Health: Remove unused `getIgnoreDirs` function and associated dead code
…0449047627298925

🧪 [test] add unit tests for expectArray
…525129266671774

🔒 fix: prevent path traversal on input file read
…0554265

🧪 Add tests for isIgnoredDir in generate-manifest
Update test duplicates in scripts/tests/ and tests/ that were
left behind after PR merges changed functions to async and
updated normalizeModelKey to preserve hyphens instead of
stripping them.
…-manifest.js

Add missing DEFAULT_IGNORE_DIRS constant and isIgnoredDir function
that were referenced but never defined, causing ReferenceError in CI.
Also fix syntax error in test file (extra closing braces).
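For context, the missing definitions could be as small as the sketch below. The directory names in the set are assumptions; the actual contents of `DEFAULT_IGNORE_DIRS` are not given in the commit message.

```javascript
// Assumed contents; the real list in generate-manifest.js is not shown in the PR.
const DEFAULT_IGNORE_DIRS = new Set(['node_modules', '.git']);

/** Return true when a directory name should be skipped during scanning. */
function isIgnoredDir(dirName, ignoreDirs = DEFAULT_IGNORE_DIRS) {
  return ignoreDirs.has(dirName);
}
```

Without these definitions, the first call to `isIgnoredDir` throws a `ReferenceError` at runtime, which matches the CI failure the commit describes.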