add selection params #1997
Merged
137 commits
4925446 Refactor hybrid queries to use `alpha_param` and remove `0.7` default… (tsmith023)
1899ffd Remove mistakenly commited local change to `regen.sh` (tsmith023)
c057270 Update logic to use new proto message (tsmith023)
8e5d54d Change formatting (tsmith023)
367ce79 Tidy version check code (tsmith023)
81414a1 Parse correct default for BC if server < 1.36 (tsmith023)
a13da23 Update CI image (tsmith023)
7abe7f5 Fix wrong version comparison (tsmith023)
b5f5e55 Fix typo in ci (tsmith023)
47841d4 Update CI image (tsmith023)
e305b33 Remove client-side default from aggregate queries (tsmith023)
c4d6e87 Merge pull request #1985 from weaviate/tsmith023/remove-default-hybri… (tsmith023)
007c38a add selection params (robbespo00)
8520a1c Update ver check and CI tags (tsmith023)
ed37375 Remove test of lazy loading shards (tsmith023)
454e0e4 Merge branch 'main' into dev/1.37 (tsmith023)
1433985 Refactor client test for new server lazy shard loading (tsmith023)
d5d5117 Merge branch 'dev/1.37' of https://github.com/weaviate/weaviate-pytho… (tsmith023)
dd6c835 Debug failing ci test (tsmith023)
cf96ccb Remove outdated lazy shard load test (tsmith023)
4b669e1 Update CI images (tsmith023)
890a414 Add per-test timeouts and stack dump on timeout (tsmith023)
2893822 Reduce per-test timeout to 5 mins (tsmith023)
6edd0e1 Fix inc backups test (tsmith023)
e7cb697 Add server version check for incremental backups (tsmith023)
12d064f Remove comment (tsmith023)
fe5c522 Hard kill the process on timeout detection (tsmith023)
2d961d1 Timeout putting the sentinel to avoid deadlocking (tsmith023)
fadcebb Handle the timeout of sentinel pushing gracefully (tsmith023)
659bf28 Merge remote-tracking branch 'origin/main' into rob/diversity (robbespo00)
78ff5ec feat: add TextAnalyzerConfig for ASCII folding in text properties (amourao)
6931a6f refactor: ruff format (amourao)
bda3008 feat: add min version check (amourao)
9bea05a Merge branch 'main' into dev/1.37 (dirkkul)
77fc0ff feat: update TextAnalyzerConfig docstring for ascii_fold attributes (amourao)
a8d6927 feat: add asciiFold check in _text_analyzer_from_config function (amourao)
e8919a3 test: fix ASCII folding tests (amourao)
3cc6306 feat: add support for stopword presets in inverted index configuratio… (amourao)
ef04dea test: added live and config tests (amourao)
8f1b33b refactor: improve docstrings for stopword presets and asciiFold tests (amourao)
03d6ff4 refactor: simplify _any_property_has_text_analyzer function using _pr… (amourao)
1342204 test: remove redundant insertion ascii fold tests from test_collectio… (amourao)
cb53d6a test: add stopwords roundtrip test for collection configuration (amourao)
9de03f3 feat: add model validator to enforce asciiFoldIgnore constraints in T… (amourao)
7018927 feat: add factory class for text analyzer configurations with ASCII f… (amourao)
8e91984 refactor: update TextAnalyzerConfig usage to new Configure class methods (amourao)
30814fc Merge branch 'feat/ascii-fold' into feat/stopword-presets (amourao)
db3009c test: remove redundant line in stopword presets merge test (amourao)
50f7768 refactor: use factory pattern (amourao)
6a1b0bc Add MCP permission (g-despot)
a0efe43 refactor: format text analyzer configuration for better readability (amourao)
fa92fc2 refactor: remove server side behavior tests (amourao)
27cd0a4 test: add stopword presets roundtrip tests for Weaviate collections (amourao)
a241d8c Fix formatting (g-despot)
83c2431 refactor: remove unnecessary stopword preset coercion from _TextAnaly… (amourao)
4e0a0f2 refactor: replace custom text analyzer method with a direct function … (amourao)
eaea155 Merge branch 'dev/1.37' into feat/ascii-fold (amourao)
38c7f44 chore: remove unused deprecated import from config.py (amourao)
ec43d53 Merge branch 'feat/stopword-presets' into feat/ascii-fold (amourao)
b3eb0ac chore: update WEAVIATE_137 version to 1.37.0-rc.1-578c4eb in workflow (amourao)
ceef271 refactor: update text analyzer method to use new static method in Con… (amourao)
5e751bf test: add stopwords roundtrip test with ASCII folding configuration (amourao)
31737e9 Merge pull request #2006 from weaviate/feat/ascii-fold (dirkkul)
9c4295b Add query profiling (g-despot)
6fd60b5 Reformatted (g-despot)
a1df098 Skip test for lower versions (g-despot)
239ed32 feat: add tokenizer module with sync and async support, including int… (amourao)
480dbe0 Add support for collection export endpoint (dirkkul)
92c3d1f Small cleanup after review (dirkkul)
3dc9259 Rename ENum (dirkkul)
c36e540 adapt to latest version (dirkkul)
54eea32 Update UX (dirkkul)
42bfc5c Remove export path parameter (dirkkul)
2c74967 Self-review of changes (dirkkul)
ed9f288 Review fixes (dirkkul)
e9192f8 Add version guard for export integration tests (dirkkul)
338195a Update to latest image (dirkkul)
90840ce Lowercase export ID (dirkkul)
584f8a6 Enforce kwargs for export (dirkkul)
96ca193 Fix tests (dirkkul)
594e8ee Fix tests (g-despot)
b83a948 Add negative assertions (g-despot)
9e8c7b1 Merge branch 'dev/1.37' into query-profiling (g-despot)
b9a7c69 Merge pull request #1981 from weaviate/export_collection (dirkkul)
8b2caaf refactor: names don't shadow existing (amourao)
ede0b96 fix: add version gate (amourao)
8d379f4 refactor: update tokenization type to use Tokenization enum in Tokeni… (amourao)
91a359a refactor: models (amourao)
61665e7 refactor: move tokenize property to class config (amourao)
3f78571 Merge branch 'dev/1.37' into feat/tokenizer-endpoint (amourao)
aea0327 fix: remove trailing whitespace in __init__.py (amourao)
ef55ce2 test: add version gate for Weaviate >= 1.37.0 in tokenization tests (amourao)
dff05f5 feat: add support for blobHash property type (antas-marcin)
1f256b5 Merge pull request #1986 from weaviate/add-support-for-blob-hash-prop… (dirkkul)
1b4eea1 Merge pull request #2012 from weaviate/feat/tokenizer-endpoint (dirkkul)
906b35b Add full_with_profile (g-despot)
7e5b1be Merge pull request #2011 from weaviate/query-profiling (g-despot)
66a2fb2 Refactor RBAC permissions (g-despot)
c116257 Merge branch 'dev/1.37' into mcp-rbac (g-despot)
0955364 Bump Weaviate version (g-despot)
ca1cb88 Merge pull request #2010 from weaviate/mcp-rbac (tsmith023)
3624e8b refactor: tokenization executor and models to support stopword config… (amourao)
5a12f13 fix: update Weaviate 1.37.1 version to include specific build identifier (amourao)
633af0f Merge branch 'main' of https://github.com/weaviate/weaviate-python-cl… (tsmith023)
7b0042a Merge branch 'dev/1.37' of https://github.com/weaviate/weaviate-pytho… (tsmith023)
d760577 Reduce timeouts in batch tests (tsmith023)
60887f3 fix: update Weaviate 1.37.1 version to include architecture suffix (amourao)
9fd83b8 fix: refactor tokenization tests to use parameterized cases for impro… (amourao)
e9d6812 fix: update Weaviate 1.37.1 version and enhance tokenization tests wi… (amourao)
202948a Merge branch 'dev/1.37' into fix/tokenize_simple_output (amourao)
959f554 refactor: ruff format (amourao)
0f7fe47 test: refactor output types and tests to config (amourao)
52c2c8c refactor: remove unused imports in tokenization models and format (amourao)
3de0955 Use public classes for .text endpoint (dirkkul)
55b136a Add overloads for exclusivity of stopwrods (dirkkul)
7924e45 Accept collection config classes as stopwords (dirkkul)
64bed62 Improve docstring (dirkkul)
220e839 Hook up tokenization and clean up model (dirkkul)
081aaef Move property back to tokenization (dirkkul)
cae3e33 Merge pull request #2019 from weaviate/fix/tokenize_simple_output (dirkkul)
5bc5470 Add integration tests (g-despot)
4b44a5d Merge branch 'dev/1.37' into rob/diversity (g-despot)
7ab97c4 Fix stubs and proto version (g-despot)
b68160d Add more tests (g-despot)
6017fde Rename _DiversityMMR to MMR (g-despot)
d3651a3 FIx test versions (g-despot)
ee1e781 Fix linter issue (g-despot)
5a32738 Implement feedback (g-despot)
edbcce8 Merge branch 'main' into rob/diversity (g-despot)
7f4c031 Rename to DiversitySelection (g-despot)
2067620 Fix flake8 error (g-despot)
93141ab Add diversity to hybrid (g-despot)
19cf554 Add hybrid tests (g-despot)
e88d17e Remove hybrid support (g-despot)
218ee16 Implement feedback (g-despot)
33ef2e1 Fix ruff format (g-despot)
8f21c33 Add MMR to output module (g-despot)
g-despot File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@@ -0,0 +1,117 @@

```python
import pytest

from integration.conftest import CollectionFactory
from weaviate.classes.query import Diversity
from weaviate.collections.classes.config import Configure, DataType, Property
from weaviate.collections.classes.data import DataObject


def _create_clustered_collection(collection_factory: CollectionFactory):
    """Create a collection with 3 tight clusters (a, b, c) of vectors in 3D."""
    collection = collection_factory(
        properties=[Property(name="text", data_type=DataType.TEXT)],
        vectorizer_config=Configure.Vectorizer.none(),
    )
    if collection._connection._weaviate_version.is_lower_than(1, 37, 0):
        pytest.skip("Diversity selection requires Weaviate >= 1.37.0")
    collection.data.insert_many(
        [
            DataObject(properties={"text": "a1"}, vector=[1.0, 0.0, 0.0]),
            DataObject(properties={"text": "a2"}, vector=[0.95, 0.05, 0.0]),
            DataObject(properties={"text": "a3"}, vector=[0.9, 0.1, 0.0]),
            DataObject(properties={"text": "b1"}, vector=[0.0, 1.0, 0.0]),
            DataObject(properties={"text": "b2"}, vector=[0.05, 0.95, 0.0]),
            DataObject(properties={"text": "c1"}, vector=[0.0, 0.0, 1.0]),
        ]
    )
    return collection


def test_near_vector_diversity_selection(collection_factory: CollectionFactory) -> None:
    """Verify that the client passes diversity_selection to the server correctly.

    Two orthogonal assertions; server-side logic (MMR itself) is out of scope:
    - ``balance`` reaches the server: balance=0.0 produces a different UUID ordering than balance=1.0
    - ``limit`` reaches the server: len(result) == mmr_limit
    """
    collection = _create_clustered_collection(collection_factory)
    mmr_limit = 3

    balance_0 = collection.query.near_vector(
        near_vector=[1.0, 0.0, 0.0],
        diversity_selection=Diversity.mmr(limit=mmr_limit, balance=0.0),
    ).objects
    balance_1 = collection.query.near_vector(
        near_vector=[1.0, 0.0, 0.0],
        diversity_selection=Diversity.mmr(limit=mmr_limit, balance=1.0),
    ).objects

    # mmr_limit reaches the server → result count equals it
    assert len(balance_0) == mmr_limit
    assert len(balance_1) == mmr_limit
    # balance reaches the server → different ordering
    assert [o.uuid for o in balance_0] != [o.uuid for o in balance_1]


def test_near_text_diversity_selection(collection_factory: CollectionFactory) -> None:
    """Smoke test: diversity_selection kwarg is wired through the near_text entry point."""
    collection = collection_factory(
        properties=[Property(name="name", data_type=DataType.TEXT)],
        vectorizer_config=Configure.Vectorizer.text2vec_contextionary(
            vectorize_collection_name=False
        ),
    )
    if collection._connection._weaviate_version.is_lower_than(1, 37, 0):
        pytest.skip("Diversity selection requires Weaviate >= 1.37.0")
    for name in ["banana", "apple", "orange", "car", "truck", "bike"]:
        collection.data.insert({"name": name})

    result = collection.query.near_text(
        query="fruit",
        diversity_selection=Diversity.mmr(limit=3, balance=0.5),
    )
    assert len(result.objects) == 3


def test_near_object_diversity_selection(collection_factory: CollectionFactory) -> None:
    """Smoke test: diversity_selection kwarg is wired through the near_object entry point."""
    collection = _create_clustered_collection(collection_factory)
    anchor = next(iter(collection.query.fetch_objects().objects)).uuid

    result = collection.query.near_object(
        near_object=anchor,
        diversity_selection=Diversity.mmr(limit=3, balance=0.5),
    )
    assert len(result.objects) == 3


def test_generate_diversity_selection(collection_factory: CollectionFactory) -> None:
    """Smoke test: diversity_selection kwarg is wired through the .generate namespace."""
    collection = collection_factory(
        properties=[Property(name="name", data_type=DataType.TEXT)],
        vectorizer_config=Configure.Vectorizer.text2vec_contextionary(
            vectorize_collection_name=False
        ),
        generative_config=Configure.Generative.custom("generative-dummy"),
    )
    if collection._connection._weaviate_version.is_lower_than(1, 37, 0):
        pytest.skip("Diversity selection requires Weaviate >= 1.37.0")
    for name in ["banana", "apple", "orange", "car", "truck", "bike"]:
        collection.data.insert({"name": name})

    result = collection.generate.near_text(
        query="fruit",
        single_prompt="Describe {name}",
        diversity_selection=Diversity.mmr(limit=3, balance=0.5),
    )
    assert len(result.objects) == 3


def test_diversity_selection_api_surface() -> None:
    """Test the public API surface of Diversity: factory guard + mmr factory method."""
    # Direct instantiation of the factory class fails
    with pytest.raises(TypeError):
        Diversity()

    # Diversity.mmr() produces an MMR-configured selection object
    assert Diversity.mmr(limit=3, balance=0.5).limit == 3
```
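For context on why the near_vector test expects `balance=0.0` and `balance=1.0` to produce different orderings on the clustered data, here is a minimal pure-Python sketch of textbook greedy MMR (maximal marginal relevance) run against the same six vectors. It is not Weaviate's server-side implementation, which this test file treats as out of scope. The helper names `cosine` and `mmr_select` are hypothetical, and the reading of `balance` (1.0 weights relevance only, 0.0 weights diversity only) is an assumption made for illustration; the server may define it differently.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mmr_select(query: Vector, candidates: List[Vector], limit: int, balance: float) -> List[int]:
    """Greedy textbook MMR; returns indices of the chosen candidates in pick order.

    ASSUMPTION (not confirmed by the PR): balance=1.0 scores by relevance only,
    balance=0.0 scores by dissimilarity to already-selected items only.
    """
    remaining = list(range(len(candidates)))
    selected: List[int] = []
    while remaining and len(selected) < limit:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already picked (0.0 when nothing is picked yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return balance * relevance - (1.0 - balance) * redundancy

        best = max(remaining, key=score)  # ties resolve to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected


# Same labels and vectors as _create_clustered_collection in the test file.
LABELS = ["a1", "a2", "a3", "b1", "b2", "c1"]
VECTORS = [
    [1.0, 0.0, 0.0],
    [0.95, 0.05, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.05, 0.95, 0.0],
    [0.0, 0.0, 1.0],
]
QUERY = [1.0, 0.0, 0.0]

relevant = [LABELS[i] for i in mmr_select(QUERY, VECTORS, limit=3, balance=1.0)]
diverse = [LABELS[i] for i in mmr_select(QUERY, VECTORS, limit=3, balance=0.0)]
print(relevant, diverse)
```

Under these assumptions the relevance-only run stays inside cluster a, while the diversity-only run takes one vector from each cluster, which is the kind of ordering difference the integration test checks for without asserting on the exact ordering.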
Similar to the comment above/below, do we need all 10 test cases for a 2-parameter config?