Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions docs/contributing/tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,3 +133,16 @@ def _assert_error_response_equal(py_response, php_response) -> None:
assert py_response.json()["code"] == php_response.json()["error"]["code"]

```

### Usage of the Database
You frequently need to write tests which include fetching from or writing to the database.
There is a test database that is prepopulated with data available for use as defined in our `compose.yaml` file.

The `expdb_test` and `user_test` connections automatically start a transaction during setup and perform a rollback during teardown.
This means that as long as you do not `.commit()` any changes, the data will not persist.
This is a good thing. We do not want our tests to have side effects, as it might lead to inconsistent behavior.

There is one situation where you may need to commit to the database: migration tests.
Since the PHP API communicates to the database in a separate transaction, changes made within the transaction in "Python land" are not visible to PHP.
In this case, be extremely careful! You *must* write the test so that even if things fail unexpectedly, there is no data left behind.
Generally speaking, you want to use a context manager that cleans up after you. In some cases you may need to clean up after yourself during the test.
Comment on lines +145 to +148
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add the known migration-test limitation for auth failure paths.

This section should mention that some authentication-failure scenarios cannot be asserted through migration tests when the PHP side returns XML instead of JSON, and should be covered in Python unit tests post-migration.

Suggested doc addition
 There is one situation where you may need to commit to the database: migration tests.
 Since the PHP API communicates to the database in a separate transaction, changes made within the transaction in "Python land" are not visible to PHP.
 In this case, be extremely careful! You *must* write the test so that even if things fail unexpectedly, there is no data left behind.
 Generally speaking, you want to use a context manager that cleans up after you. In some cases you may need to clean up after yourself during the test.
+
+Also note: for some legacy endpoints, authentication failure paths are not suitable for migration comparison tests if the PHP API returns XML error payloads. Cover those in Python unit tests once the dependent endpoint is migrated.

Based on learnings: Migration tests for OpenML API endpoints that rely on php_api for data staging cannot test authentication failure paths (Code 103) when the legacy PHP API returns XML instead of JSON; these paths should be tested in standard Python unit tests after dependent endpoints are migrated.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
There is one situation where you may need to commit to the database: migration tests.
Since the PHP API communicates to the database in a separate transaction, changes made within the transaction in "Python land" are not visible to PHP.
In this case, be extremely careful! You *must* write the test so that even if things fail unexpectedly, there is no data left behind.
Generally speaking, you want to use a context manager that cleans up after you. In some cases you may need to clean up after yourself during the test.
There is one situation where you may need to commit to the database: migration tests.
Since the PHP API communicates to the database in a separate transaction, changes made within the transaction in "Python land" are not visible to PHP.
In this case, be extremely careful! You *must* write the test so that even if things fail unexpectedly, there is no data left behind.
Generally speaking, you want to use a context manager that cleans up after you. In some cases you may need to clean up after yourself during the test.
Also note: for some legacy endpoints, authentication failure paths are not suitable for migration comparison tests if the PHP API returns XML error payloads. Cover those in Python unit tests once the dependent endpoint is migrated.
🧰 Tools
🪛 LanguageTool

[style] ~147-~147: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...re not visible to PHP. In this case, be extremely careful! You must write the test so that even...

(EN_WEAK_ADJECTIVE)


[style] ~148-~148: Consider using “who” when you are referring to a person instead of an object.
Context: ...king, you want to use a context manager that cleans up after you. In some cases you ...

(THAT_WHO)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/contributing/tests.md` around lines 145 - 148, Update the "migration
tests" section to document the known limitation: state that migration tests for
OpenML API endpoints that use the PHP API for staging cannot assert certain
authentication-failure paths (specifically Code 103) when the legacy PHP API
returns XML instead of JSON, and add guidance to instead cover those scenarios
in standard Python unit tests after migration; reference the terms "migration
tests", "php_api", "Code 103", and "authentication-failure" so readers can find
and implement the appropriate post-migration Python unit tests.

72 changes: 45 additions & 27 deletions docs/migration.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,30 +7,64 @@ largely functioning the same way. However, there are a few major deviations:
* Restriction or expansion of input types as appropriate.
* Standardizing authentication and access messages, and consistently execute those checks
before fetching data or providing error messages about the data.
* Errors are returned in the [RFC 9457](https://www.rfc-editor.org/rfc/rfc9457.html) standard, and HTTP status codes may be changed to be more semantically appropriate.

The list above is not exhaustive. Minor changes include, for example, bug fixes and the removal of unnecessary nesting.
There may be undocumented changes, especially in edge cases which may not have occurred in the test environment.
As the PHP API was underspecified, the re-implementation is based on a mix of reading old code and probing the API.
If there is a behavioral change which was not documented but affects you, please [open a bug report](https://github.com/openml/server-api/issues/new?assignees=&labels=bug%2C+triage&projects=&template=bug-report.md&title=).
If there is a behavioral change which was not documented but affects you, please [open a bug report](https://github.com/openml/server-api/issues/new?assignees=&labels=bug%2C+triage&projects=&template=bug-report.md&title=). Also feel free to open an issue on the issue tracker if you feel that we made a mistake with a decision on e.g., a new status code.

It is possible this migration guide is out of sync for endpoints not yet deployed to production (currently that includes them all).
Before an endpoint is deployed to production we will ensure that the documentation is up-to-date to the best of our knowledge.

## All Endpoints
The following changes affect all endpoints.
# RFC 9457 Errors
Errors will follow the RFC9457 standard. However, the original "code" is preserved through a custom field.
Take for example the "Dataset not found" response for trying to access a dataset that does not exist.

### Error on Invalid Input
When providing input of invalid types (e.g., a non-integer dataset id) the HTTP header
and JSON content will be different.
```diff title="CURL Commands"
curl -i https://www.openml.org/api/v1/json/data/1000000
- HTTP/1.1 412 Precondition Failed
- ...
- {"error":{"code":"111","message":"Unknown dataset"}}

+ HTTP/1.1 404 Not Found
+ ...
+ {"type":"https://openml.org/problems/dataset-not-found","title":"Dataset Not Found","status":404,"detail":"No dataset with id 100000 found.","code":"111"}
```

You will notice that the response still contains a "code" of "111" (though as a top level property not embedded in the "error" scope).
This field is included to support the migration of clients, but should be considered deprecated.
As per the RFC9457 standard, the "type" field now includes the unique code for the error.
Comment thread
sourcery-ai[bot] marked this conversation as resolved.
The "title" field is a human readable summary of the general issue and the "detail" field may provide additional information for the specific request.
They _will_ be resolvable URIs in the future, providing a page with more information.

In some cases the JSON endpoints previously returned XML ([example](https://github.com/openml/OpenML/issues/1200)), the new API always returns JSON.

# Appropriate HTTP Status Codes
There are several cases where the PHP server did not provide semantically correct status codes.
The Python server aims to correct that.
The errors that changed which are most likely to occur are probably errors when there is no result, or when the input is incorrect.


For not being able to resolve an identifier ("dataset not found"):

```diff title="HTTP Header"
- 412 Precondition Failed
+ 422 Unprocessable Entity
+ 404 Not Found
```

When authentication is required but not provided or not valid:

```diff title="HTTP Header"
- 412 Precondition Failed
+ 401 Unauthorized
```

```diff title="JSON Content"
- {"error":{"code":"100","message":"Function not valid"}}
+ {"detail":[{"loc":["query","_dataset_id"],"msg":"value is not a valid integer","type":"type_error.integer"}]}
For incorrect input (e.g., providing a string instead of an integer identifier):

```diff title="HTTP Header"
- 412 Precondition Failed
+ 422 Unprocessable Entity
```

!!! warning "Input validation has been added to many end points"
Expand All @@ -39,23 +73,7 @@ and JSON content will be different.
These endpoints now do enforce stricter input constraints.
Constraints for each endpoint parameter are documented in the API docs.

### Other Errors
For any other error messages, the response is identical except that outer field will be `"detail"` instead of `"error"`:

```diff title="JSON Content"
- {"error":{"code":"112","message":"No access granted"}}
+ {"detail":{"code":"112","message":"No access granted"}}
```

In some cases the JSON endpoints previously returned XML ([example](https://github.com/openml/OpenML/issues/1200)), the new API always returns JSON.

```diff title="XML replaced by JSON"
- <oml:error xmlns:oml="http://openml.org/openml">
- <oml:code>103</oml:code>
- <oml:message>Authentication failed</oml:message>
- </oml:error>
+ {"detail": {"code":"103", "message": "Authentication failed" } }
```
# Endpoint Specific Notes

## Datasets

Expand Down
Loading