feat: add tokenizer module with sync and async support #2012

Merged
dirkkul merged 9 commits into dev/1.37 from feat/tokenizer-endpoint on Apr 16, 2026

Conversation

@amourao amourao commented Apr 14, 2026

  • Global and per-property tokenization endpoints
  • Tests

Copilot AI review requested due to automatic review settings April 14, 2026 20:05

@orca-security-eu orca-security-eu bot left a comment

Orca Security Scan Summary

| Status | Check | Issues by priority |
|---|---|---|
| Passed | Secrets | high 0, medium 0, low 0, info 0 |

Copilot AI left a comment

Pull request overview

Adds a new tokenize client module (sync + async) to call Weaviate’s tokenization endpoints and returns a typed TokenizeResult, with integration tests covering serialization/deserialization and both client variants.

Changes:

  • Introduce weaviate.tokenize module with shared executor logic plus sync/async wrappers.
  • Add TokenizeResult return type and response parsing for analyzer/stopword configs.
  • Wire client.tokenize into WeaviateClient / WeaviateAsyncClient and add integration tests.
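
The "shared executor logic plus sync/async wrappers" layout described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR and does not import the Weaviate client: the `TokenizeResult` fields, the payload keys, the `text` method name, and the fake transport functions are all hypothetical stand-ins chosen for illustration. The idea it shows is that request building and response parsing live in one place, while the two thin wrappers differ only in whether they await the transport.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class TokenizeResult:
    # Hypothetical fields; the real dataclass in this PR may differ.
    tokenization: str
    tokens: List[str]


def build_request(text: str, tokenization: str) -> Dict[str, Any]:
    """Shared logic: build the request payload (key names are assumptions)."""
    return {"text": text, "tokenization": tokenization}


def parse_response(raw: Dict[str, Any]) -> TokenizeResult:
    """Shared logic: convert a raw response dict into a typed result."""
    return TokenizeResult(tokenization=raw["tokenization"], tokens=list(raw["tokens"]))


def fake_send(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for an HTTP call: lowercases and splits on whitespace.
    # This does not reproduce any real server-side tokenizer.
    return {
        "tokenization": payload["tokenization"],
        "tokens": payload["text"].lower().split(),
    }


async def fake_send_async(payload: Dict[str, Any]) -> Dict[str, Any]:
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return fake_send(payload)


class SyncTokenize:
    """Sync wrapper: calls the transport directly."""

    def __init__(self, send: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._send = send

    def text(self, text: str, tokenization: str) -> TokenizeResult:
        return parse_response(self._send(build_request(text, tokenization)))


class AsyncTokenize:
    """Async wrapper: same shared logic, but awaits the transport."""

    def __init__(self, send: Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]) -> None:
        self._send = send

    async def text(self, text: str, tokenization: str) -> TokenizeResult:
        return parse_response(await self._send(build_request(text, tokenization)))


sync_result = SyncTokenize(fake_send).text("Hello World", "word")
async_result = asyncio.run(AsyncTokenize(fake_send_async).text("Hello World", "word"))
```

Because both wrappers delegate to the same `build_request` and `parse_response` helpers, a change to serialization or parsing only has to be made once, and the two results compare equal.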

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| weaviate/tokenize/types.py | Defines TokenizeResult dataclass returned by tokenization calls. |
| weaviate/tokenize/executor.py | Implements /tokenize and property tokenization requests + response parsing. |
| weaviate/tokenize/sync.py | Sync wrapper for the tokenize executor via @executor.wrap("sync"). |
| weaviate/tokenize/async_.py | Async wrapper for the tokenize executor via @executor.wrap("async"). |
| weaviate/tokenize/__init__.py | Exposes tokenize module symbols. |
| weaviate/client.py | Adds tokenize namespace to both sync and async clients. |
| weaviate/client.pyi | Updates type stubs to include tokenize attributes on clients. |
| weaviate/__init__.py | Exposes the tokenize module at the package root. |
| integration/test_tokenize.py | Integration coverage for sync/async tokenize calls and config handling. |

Comment thread weaviate/tokenization/models.py Outdated
Comment thread weaviate/tokenization/executor.py Outdated
Comment thread weaviate/tokenization/executor.py Outdated
Comment thread weaviate/tokenization/models.py Outdated
Comment thread weaviate/tokenization/executor.py Outdated
Comment thread weaviate/tokenization/executor.py Outdated
Comment thread weaviate/tokenize/executor.py Outdated
@codecov-commenter

Codecov Report

❌ Patch coverage is 53.46939% with 114 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (dev/1.37@b9a7c69). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| integration/test_tokenize.py | 46.47% | 91 Missing |
| weaviate/tokenization/executor.py | 60.00% | 10 Missing |
| weaviate/tokenization/models.py | 66.66% | 8 Missing |
| weaviate/collections/config/executor.py | 44.44% | 5 Missing |
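
As a quick consistency check on the figures in this report, the per-file missing counts can be summed and compared with the headline "114 lines" number. The sketch below also back-derives the total number of patch lines from the reported percentage; that total is an inference from the published numbers, not a value Codecov reports here.

```python
# Missing-line counts per file, copied from the Codecov table above.
missing = {
    "integration/test_tokenize.py": 91,
    "weaviate/tokenization/executor.py": 10,
    "weaviate/tokenization/models.py": 8,
    "weaviate/collections/config/executor.py": 5,
}

patch_pct = 53.46939  # reported patch coverage, in percent

total_missing = sum(missing.values())

# Inferred (not reported): total patch lines implied by the percentage.
patch_lines = round(total_missing / (1 - patch_pct / 100))
covered = patch_lines - total_missing

print(f"missing={total_missing} patch_lines={patch_lines} covered={covered}")
print(f"recomputed coverage: {100 * covered / patch_lines:.5f}%")
```

Under these assumptions the per-file counts add up to the headline missing-line figure, and the implied patch size is larger than the four listed files alone would account for, which is consistent with the table listing only files that have missing lines.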
Additional details and impacted files
@@             Coverage Diff             @@
##             dev/1.37    #2012   +/-   ##
===========================================
  Coverage            ?   86.65%           
===========================================
  Files               ?      293           
  Lines               ?    22563           
  Branches            ?        0           
===========================================
  Hits                ?    19552           
  Misses              ?     3011           
  Partials            ?        0           

@dirkkul dirkkul merged commit 1b4eea1 into dev/1.37 Apr 16, 2026
240 of 241 checks passed
@dirkkul dirkkul deleted the feat/tokenizer-endpoint branch April 16, 2026 07:02
4 participants