feat(experimentation): experiment results model, task and endpoints#7796
feat(experimentation): experiment results model, task and endpoints#7796gagantrivedi wants to merge 12 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## feat/experiment-results-query #7796 +/- ##
===============================================================
Coverage 98.57% 98.58%
===============================================================
Files 1462 1463 +1
Lines 56762 57084 +322
===============================================================
+ Hits 55955 56277 +322
Misses 807 807 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Persist Bayesian results in an ExperimentResults row (the ExperimentExposures pattern: OneToOne, as_of/payload/last_error_at/refresh_requested_at, is_final freezing completed experiments before the warehouse TTL). compute_results_summary orchestrates the warehouse aggregation: it derives the metric specs from the attached metrics and the expected variant split from the environment's multivariate allocations (control takes the unallocated remainder), then runs the kernel. compute_experiment_results runs it off the refresh endpoint and records the row, preserving the last good payload on warehouse failure. GET/POST .../results/ and .../results/refresh/ clone the exposures pair; the shared guard ladder (pre-start, finality, throttle) moves into _refresh_panel.
acb3e36 to
d94bd85
Compare
ExperimentExposures and ExperimentResults had identical fields and lifecycle. Hoist them into a generic abstract ExperimentComputation[SummaryT]: the subclass binds which summary record_refresh stores, so it stays type-safe per panel. Each concrete model keeps only its experiment OneToOneField (and so its related_name); no schema change.
- Rename the view-layer PanelT/_refresh_panel to ComputationT/
_refresh_computation so the shared abstraction has one name everywhere
(ExperimentComputation), rather than reintroducing UI "panel" vocabulary.
- _refresh_computation takes the finished 400 messages instead of a noun,
dropping the unwritten assumption that the noun is a plural subject.
- _expected_variant_shares selects the current feature state by highest id
(matching Environment's Max("id") convention) instead of relying on the
default ascending ordering, which picked the oldest version; note the
coupling to features' multivariate representation.
- Declare experiment on ExperimentComputation under TYPE_CHECKING so is_final
is type-checked rather than suppressed with a type: ignore.
compute_experiment_exposures and compute_experiment_results had identical bodies. Hoist the lifecycle (load experiment, skip if not started or final, compute over the full window, preserve the last good payload on failure) into _refresh_computation_row; each task becomes a thin adapter supplying its model, summary function and failure logger. The helper is generic over SummaryT so record_refresh stays checked against the payload type. The failure log stays in the per-task adapter, not the helper, so its event name remains a literal the docs generator can discover.
85c5982 to
7db02c3
Compare
4f4876e to
94671bf
Compare
Inline the two refresh endpoints but extract the precondition checks into _validate_refresh_request, which raises ValidationError (not started / final) or Throttled (cooling down) and lets DRF render the response — including the Retry-After header. Keeps the drift-prone validation in one place while each endpoint's happy path stays inline, with no callbacks or generic TypeVar.
94671bf to
694bf2c
Compare
Emit srm.unkeyed_variant before bailing out of _expected_variant_shares so the skipped check is visible to operators. Trim the docstring to the useful part and note the TODO to source the split from the percentage-split segment override feature state once that exists.
Add is_final to ExperimentExposuresSerializer and ExperimentResultsSerializer so the UI can decide whether to show the refresh button without re-deriving it from the experiment's ended_at.
A misconfigured feature whose multivariate options over-allocate would give control a negative expected share. Bail out of _expected_variant_shares with an srm.overallocated log instead of relying on the downstream zero-share guard.
Docker builds report
|
Playwright Test Results (oss - depot-ubuntu-latest-16)Details
Playwright Test Results (oss - depot-ubuntu-latest-arm-16)Details
Playwright Test Results (private-cloud - depot-ubuntu-latest-16)Details
Failed testsfirefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise Details
Failed testsfirefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise Details
Failed testsfirefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise Details
Failed testsfirefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise Details
Failed testsfirefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise |
Visual Regression19 screenshots compared. See report for details. |
Zaimwa9
left a comment
There was a problem hiding this comment.
Looking good! Thanks. Wish we had the control as a MV option.
If there are some details to finetune i'll take it up from here while implementing the frontend
Thanks for submitting a PR! Please check the boxes below:
docs/if required so people know about the feature.Changes
Contributes to the experiment results stats layer (stacked on #7781; merge after it).
So the experiment detail page can show per-metric results — lift, chance-to-win, and a sample-ratio-mismatch check — this computes them from the warehouse on demand and stores one row per experiment, updated in place (mirroring the exposures panel).
ExperimentResultsmodel (migration0008), on a new abstractExperimentComputationbase now shared withExperimentExposures:as_of/payload/last_error_at/refresh_requested_at,is_final, andrecord_refresh/record_failure/record_refresh_request.compute_results_summary(services.py): derives the metric specs and the expected SRM split from the environment's multivariate allocations (control= unallocated remainder), then runs the feat(experimentation): results aggregation query and payload builder #7781 aggregation and feat(experimentation): Bayesian stats kernel #7769 kernel. SRM is skipped (andsrm.unkeyed_variantlogged) when an option has no variant key to attribute its share to.compute_experiment_resultstask: recomputes the full window; on warehouse failure keeps the last good payload and logsresults.compute_failed.GET …/results/andPOST …/results/refresh/: read the row, or enqueue a refresh (202)._validate_refresh_requestraisesValidationErrorbefore start / once final (400) andThrottledwithin the refresh interval (429+Retry-After).How did you test this code?
make testfor the experimentation app — unit tests for the model, task,compute_results_summary/_experiment_metric_specs/_expected_variant_shares, and both endpoints, at 100% diff coverage.mypyandruffclean.