Skip to content

Add support for splits on categorical features#120

Open
ndawe wants to merge 1 commit into
maxmind:mainfrom
ndawe:ndawe/categorical-features
Open

Add support for splits on categorical features#120
ndawe wants to merge 1 commit into
maxmind:mainfrom
ndawe:ndawe/categorical-features

Conversation

@ndawe

@ndawe ndawe commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • New Features

    • Added support for categorical splits in generated code models (with categorical values provided as integer category codes).
  • Documentation

    • Updated README and changelog with categorical split usage details, including how to represent missing feature values.
  • Tests

    • Expanded generate-and-run coverage to include the categorical model.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Version constant is bumped from 1.0.0 to 1.1.0. CHANGELOG gains a 1.1.0 entry noting categorical split support. README adds documentation on categorical feature encoding and nil representation for missing features. TestGenerateAndRunModels gains a "categorical" model entry.

Changes

v1.1.0 Release: categorical split support docs, test, and version bump

Layer / File(s) Summary
Categorical split docs, test entry, and version bump
version.go, CHANGELOG.md, README.md, main_test.go
version constant updated to "1.1.0"; CHANGELOG documents categorical split support with integer category-code encoding; README adds categorical and missing-feature documentation; TestGenerateAndRunModels table gains a "categorical" model entry.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐇 A hop and a skip, version bumps up high,
Categorical splits now wave a warm hi!
Integer codes in the data slice nest,
Nil marks the missing — the rabbit knows best.
README and CHANGELOG, all tidy and bright,
xgb2code 1.1.0 is ready tonight! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title directly and clearly summarizes the main change: adding categorical feature split support, which is the primary focus of all modifications across version, documentation, and test files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for categorical splits in XGBoost models, updating the parser, templates, documentation, and tests accordingly. The review feedback suggests preventing potential integer overflow in slice bounds checks within gen/parse.go by using subtraction, and removing a redundant random_state parameter in the Python test model generator since a global random seed is already configured.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread gen/parse.go Outdated
Comment thread gen/testdata/categorical/generate-model.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Around line 20-23: The README.md documentation is making a promise about nil
handling for categorical features that is not yet implemented in the code. The
generator in gen/templates.go:57-73 directly dereferences *data[i] for
categorical predicates without nil checks, which would cause a panic if a nil
entry is passed. Either remove the claim about nil entry handling from the
README and scope it to numeric features only, or first add nil-safe handling to
the categorical predicate code in gen/templates.go before documenting that
feature.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f7e3096b-aba3-4748-a379-5bc33820cad2

📥 Commits

Reviewing files that changed from the base of the PR and between 5cd6c65 and 8dd46e9.

⛔ Files ignored due to path filters (10)
  • gen/parse.go is excluded by !**/gen/**
  • gen/parse_test.go is excluded by !**/gen/**
  • gen/templates.go is excluded by !**/gen/**
  • gen/templates/decision_node.tmpl is excluded by !**/gen/**
  • gen/templates_test.go is excluded by !**/gen/**
  • gen/testdata/categorical/generate-model.py is excluded by !**/gen/**
  • gen/testdata/categorical/model.json is excluded by !**/gen/**
  • gen/testdata/categorical/preds.csv is excluded by !**/*.csv, !**/gen/**
  • gen/testdata/categorical/xtest.csv is excluded by !**/*.csv, !**/gen/**
  • mise.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • CHANGELOG.md
  • README.md
  • main_test.go
  • version.go

Comment thread README.md
@ndawe ndawe force-pushed the ndawe/categorical-features branch from 8dd46e9 to fb92b28 Compare June 16, 2026 12:38

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
README.md (1)

20-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't promise nil handling for categorical splits yet.

This still conflicts with gen/templates.go:57-73, which dereferences *data[i] for categorical predicates. A nil entry will panic, so either remove the nil claim or add nil-safe categorical handling first.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 20 - 23, The README.md documentation claims that
missing categorical features can be represented by nil entries in the data
slice, but the template code in gen/templates.go at lines 57-73 dereferences
pointers without nil checks for categorical predicates, which would cause a
panic if nil is passed. Either remove the statement about nil handling for
categorical splits from the README documentation (lines 20-23), or implement
nil-safe categorical predicate handling in the template code before documenting
the feature.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@README.md`:
- Around line 20-23: The README.md documentation claims that missing categorical
features can be represented by nil entries in the data slice, but the template
code in gen/templates.go at lines 57-73 dereferences pointers without nil checks
for categorical predicates, which would cause a panic if nil is passed. Either
remove the statement about nil handling for categorical splits from the README
documentation (lines 20-23), or implement nil-safe categorical predicate
handling in the template code before documenting the feature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 8dab7760-6753-492c-be1a-3d04dc2d0a8b

📥 Commits

Reviewing files that changed from the base of the PR and between 8dd46e9 and fb92b28.

⛔ Files ignored due to path filters (10)
  • gen/parse.go is excluded by !**/gen/**
  • gen/parse_test.go is excluded by !**/gen/**
  • gen/templates.go is excluded by !**/gen/**
  • gen/templates/decision_node.tmpl is excluded by !**/gen/**
  • gen/templates_test.go is excluded by !**/gen/**
  • gen/testdata/categorical/generate-model.py is excluded by !**/gen/**
  • gen/testdata/categorical/model.json is excluded by !**/gen/**
  • gen/testdata/categorical/preds.csv is excluded by !**/*.csv, !**/gen/**
  • gen/testdata/categorical/xtest.csv is excluded by !**/*.csv, !**/gen/**
  • mise.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • CHANGELOG.md
  • README.md
  • main_test.go
  • version.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant