
feat(experimentation): experiment results model, task and endpoints #7796

Open
gagantrivedi wants to merge 12 commits into feat/experiment-results-query from feat/experiment-results-model

Conversation

@gagantrivedi

@gagantrivedi gagantrivedi commented Jun 16, 2026

Member

Thanks for submitting a PR! Please check the boxes below:

  • I have read the Contributing Guide.
  • I have added information to docs/ if required so people know about the feature.
  • I have filled in the "Changes" section below.
  • I have filled in the "How did you test this code" section below.

Changes

Contributes to the experiment results stats layer (stacked on #7781; merge after it).

To let the experiment detail page show per-metric results (lift, chance-to-win, and a sample-ratio-mismatch check), this PR computes them from the warehouse on demand and stores one row per experiment, updated in place (mirroring the exposures panel).

  • ExperimentResults model (migration 0008), on a new abstract ExperimentComputation base now shared with ExperimentExposures: as_of / payload / last_error_at / refresh_requested_at, is_final, and record_refresh / record_failure / record_refresh_request.
  • compute_results_summary (services.py): derives the metric specs and the expected SRM split from the environment's multivariate allocations (control = unallocated remainder), then runs the #7781 aggregation ("feat(experimentation): results aggregation query and payload builder") and the #7769 kernel ("feat(experimentation): Bayesian stats kernel"). SRM is skipped (and srm.unkeyed_variant logged) when an option has no variant key to attribute its share to.
  • compute_experiment_results task: recomputes the full window; on warehouse failure keeps the last good payload and logs results.compute_failed.
  • GET …/results/ and POST …/results/refresh/: read the row, or enqueue a refresh (202). _validate_refresh_request raises ValidationError before start / once final (400) and Throttled within the refresh interval (429 + Retry-After).

How did you test this code?

make test for the experimentation app — unit tests for the model, task, compute_results_summary / _experiment_metric_specs / _expected_variant_shares, and both endpoints, at 100% diff coverage. mypy and ruff clean.


@codecov

codecov Bot commented Jun 16, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.58%. Comparing base (1cdb491) to head (17e9032).

Additional details and impacted files
@@                       Coverage Diff                       @@
##           feat/experiment-results-query    #7796    +/-   ##
===============================================================
  Coverage                          98.57%   98.58%            
===============================================================
  Files                               1462     1463     +1     
  Lines                              56762    57084   +322     
===============================================================
+ Hits                               55955    56277   +322     
  Misses                               807      807            


Persist Bayesian results in an ExperimentResults row (the ExperimentExposures
pattern: OneToOne, as_of/payload/last_error_at/refresh_requested_at, is_final
freezing completed experiments before the warehouse TTL).

compute_results_summary orchestrates the warehouse aggregation: it derives the
metric specs from the attached metrics and the expected variant split from the
environment's multivariate allocations (control takes the unallocated
remainder), then runs the kernel. compute_experiment_results runs it off the
refresh endpoint and records the row, preserving the last good payload on
warehouse failure.

GET/POST .../results/ and .../results/refresh/ clone the exposures pair; the
shared guard ladder (pre-start, finality, throttle) moves into _refresh_panel.
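
The "control takes the unallocated remainder" rule can be sketched in a few lines. This is only an illustration under assumptions: the function name, the `CONTROL_KEY` label, and the dict-of-percentages input are hypothetical and are not taken from the PR's `_expected_variant_shares`.

```python
from typing import Dict

CONTROL_KEY = "control"  # hypothetical label for the control arm


def expected_variant_shares(allocations: Dict[str, float]) -> Dict[str, float]:
    """Map each variant key to its expected traffic share in the range 0..1.

    `allocations` holds the percentage allocated to each multivariate
    option; control receives whatever percentage is left unallocated.
    """
    allocated = sum(allocations.values())
    shares = {key: pct / 100.0 for key, pct in allocations.items()}
    shares[CONTROL_KEY] = (100.0 - allocated) / 100.0
    return shares


shares = expected_variant_shares({"variant_a": 30.0, "variant_b": 20.0})
```

With 30% and 20% allocated to the two options, control is expected to see the remaining 50% of traffic, which is the split an SRM check would compare observed counts against.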
@gagantrivedi gagantrivedi force-pushed the feat/experiment-results-model branch from acb3e36 to d94bd85 Compare June 16, 2026 10:57
ExperimentExposures and ExperimentResults had identical fields and lifecycle.
Hoist them into a generic abstract ExperimentComputation[SummaryT]: the
subclass binds which summary record_refresh stores, so it stays type-safe per
panel. Each concrete model keeps only its experiment OneToOneField (and so its
related_name); no schema change.
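
A rough, framework-free sketch of the generic base described in this commit. It uses a plain dataclass instead of a Django abstract model so that it runs standalone; the summary shapes (`ResultsSummary`, `ExposuresSummary`) and the class names are assumptions made for illustration, not the PR's definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Generic, Optional, TypedDict, TypeVar

SummaryT = TypeVar("SummaryT")


class ResultsSummary(TypedDict):
    # Hypothetical payload shape; the real summary types live in the PR.
    metrics: list


class ExposuresSummary(TypedDict):
    # Hypothetical payload shape.
    exposures: int


@dataclass
class Computation(Generic[SummaryT]):
    """Plain-Python stand-in for the abstract ExperimentComputation."""

    payload: Optional[SummaryT] = None
    as_of: Optional[datetime] = None
    last_error_at: Optional[datetime] = None

    def record_refresh(self, summary: SummaryT, as_of: datetime) -> None:
        # Store the freshly computed summary and when it was computed.
        self.payload = summary
        self.as_of = as_of

    def record_failure(self, at: datetime) -> None:
        # Only the error timestamp changes; the last good payload is kept.
        self.last_error_at = at


class Results(Computation[ResultsSummary]):
    """Binds SummaryT so a type checker rejects an exposures payload here."""


class Exposures(Computation[ExposuresSummary]):
    """Binds SummaryT to the exposures summary."""


now = datetime(2026, 6, 16, tzinfo=timezone.utc)
row = Results()
row.record_refresh({"metrics": []}, as_of=now)
row.record_failure(at=now)
```

Because each subclass fixes `SummaryT`, a type checker such as mypy would flag `Results().record_refresh({"exposures": 1}, ...)`, which is the per-panel type safety the commit message refers to.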
- Rename the view-layer PanelT/_refresh_panel to ComputationT/
  _refresh_computation so the shared abstraction has one name everywhere
  (ExperimentComputation), rather than reintroducing UI "panel" vocabulary.
- _refresh_computation takes the finished 400 messages instead of a noun,
  dropping the unwritten assumption that the noun is a plural subject.
- _expected_variant_shares selects the current feature state by highest id
  (matching Environment's Max("id") convention) instead of relying on the
  default ascending ordering, which picked the oldest version; note the
  coupling to features' multivariate representation.
- Declare experiment on ExperimentComputation under TYPE_CHECKING so is_final
  is type-checked rather than suppressed with a type: ignore.
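
To illustrate the ordering pitfall in the third item, here is a small sketch with plain objects rather than ORM rows. `FeatureStateVersion` and `current_feature_state` are hypothetical names; in Django the analogous query would be along the lines of `.order_by("-id").first()`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FeatureStateVersion:
    # Hypothetical stand-in for a feature state row.
    id: int
    label: str


def current_feature_state(
    versions: Iterable[FeatureStateVersion],
) -> Optional[FeatureStateVersion]:
    """Return the version with the highest id, or None if there are none."""
    return max(versions, key=lambda version: version.id, default=None)


versions = [
    FeatureStateVersion(id=3, label="oldest"),
    FeatureStateVersion(id=7, label="newest"),
    FeatureStateVersion(id=5, label="middle"),
]

first_by_ascending_id = sorted(versions, key=lambda version: version.id)[0]
selected = current_feature_state(versions)
```

Taking the first row under ascending id ordering yields the oldest version, while selecting by maximum id yields the newest, which is the behaviour the fix aims for.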
compute_experiment_exposures and compute_experiment_results had identical
bodies. Hoist the lifecycle (load experiment, skip if not started or final,
compute over the full window, preserve the last good payload on failure) into
_refresh_computation_row; each task becomes a thin adapter supplying its model,
summary function and failure logger. The helper is generic over SummaryT so
record_refresh stays checked against the payload type.

The failure log stays in the per-task adapter, not the helper, so its event
name remains a literal the docs generator can discover.
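
The shape of that refactor might look roughly like the sketch below. It is not the PR's code: `WarehouseError`, `Row`, the `started`/`final` flags, and the in-memory `events` list are stand-ins chosen so the example runs without Django or a task queue.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Optional, TypeVar

SummaryT = TypeVar("SummaryT")


class WarehouseError(Exception):
    """Hypothetical stand-in for a warehouse query failure."""


@dataclass
class Row(Generic[SummaryT]):
    """Minimal stand-in for a computation row."""

    payload: Optional[SummaryT] = None
    failures: int = 0

    def record_refresh(self, summary: SummaryT) -> None:
        self.payload = summary

    def record_failure(self) -> None:
        self.failures += 1  # payload is deliberately left untouched


def refresh_computation_row(
    row: Row[SummaryT],
    compute: Callable[[], SummaryT],
    log_failure: Callable[[Exception], None],
    *,
    started: bool = True,
    final: bool = False,
) -> None:
    """Shared lifecycle: skip, compute, then record success or failure."""
    if not started or final:
        return
    try:
        summary = compute()
    except WarehouseError as exc:
        row.record_failure()
        log_failure(exc)  # the adapter decides what gets logged
        return
    row.record_refresh(summary)


events: List[str] = []  # stands in for a structured logger


def compute_experiment_results(row: Row[dict], compute: Callable[[], dict]) -> None:
    """Thin adapter: the event name stays a literal at this call site."""

    def log_failure(exc: Exception) -> None:
        events.append("results.compute_failed")

    refresh_computation_row(row, compute, log_failure)


def failing_compute() -> dict:
    raise WarehouseError("warehouse timed out")


row: Row[dict] = Row(payload={"lift": 0.1})
compute_experiment_results(row, failing_compute)
```

After the failed run the row still holds its previous payload, and the failure event was emitted from the adapter rather than from the shared helper.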
@gagantrivedi gagantrivedi force-pushed the feat/experiment-results-model branch from 85c5982 to 7db02c3 Compare June 17, 2026 06:42
@gagantrivedi gagantrivedi force-pushed the feat/experiment-results-model branch from 4f4876e to 94671bf Compare June 17, 2026 09:48
Inline the two refresh endpoints but extract the precondition checks into
_validate_refresh_request, which raises ValidationError (not started / final)
or Throttled (cooling down) and lets DRF render the response — including the
Retry-After header. Keeps the drift-prone validation in one place while each
endpoint's happy path stays inline, with no callbacks or generic TypeVar.
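
A hedged sketch of what such a precondition check could look like. To keep it runnable without Django REST framework, the two exception classes below are local stand-ins for `rest_framework.exceptions.ValidationError` and `Throttled`; the `REFRESH_INTERVAL` value, the error messages, and the function signature are assumptions.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Optional


class ValidationError(Exception):
    """Stand-in for DRF's ValidationError (rendered as a 400)."""


class Throttled(Exception):
    """Stand-in for DRF's Throttled (rendered as a 429)."""

    def __init__(self, wait: int) -> None:
        super().__init__(f"Retry in {wait} seconds.")
        self.wait = wait


REFRESH_INTERVAL = timedelta(minutes=5)  # hypothetical cooldown


def validate_refresh_request(
    *,
    now: datetime,
    started_at: Optional[datetime],
    is_final: bool,
    refresh_requested_at: Optional[datetime],
) -> None:
    """Raise if a refresh must not be enqueued; return None otherwise."""
    if started_at is None or now < started_at:
        raise ValidationError("Experiment has not started.")
    if is_final:
        raise ValidationError("Results are final.")
    if refresh_requested_at is not None:
        remaining = refresh_requested_at + REFRESH_INTERVAL - now
        if remaining > timedelta(0):
            # Round up so the client never retries too early.
            raise Throttled(wait=math.ceil(remaining.total_seconds()))


now = datetime(2026, 6, 17, 10, 0, tzinfo=timezone.utc)
```

With the real DRF `Throttled` exception, the default exception handler turns a non-empty `wait` into a `Retry-After` header, so the view itself does not need to build the 429 response.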
@gagantrivedi gagantrivedi force-pushed the feat/experiment-results-model branch from 94671bf to 694bf2c Compare June 17, 2026 09:57
Emit srm.unkeyed_variant before bailing out of _expected_variant_shares so the
skipped check is visible to operators. Trim the docstring to the useful part
and note the TODO to source the split from the percentage-split segment
override feature state once that exists.
Add is_final to ExperimentExposuresSerializer and ExperimentResultsSerializer
so the UI can decide whether to show the refresh button without re-deriving it
from the experiment's ended_at.

A misconfigured feature whose multivariate options over-allocate would give
control a negative expected share. Bail out of _expected_variant_shares with an
srm.overallocated log instead of relying on the downstream zero-share guard.
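
The two bail-out conditions (an option without a variant key, and options that over-allocate) can be sketched as a single guard. The function name, the `(key, percentage)` input shape, and the logger name are hypothetical; only the two event names come from the commit messages.

```python
import logging
from typing import List, Optional, Tuple

logger = logging.getLogger("experimentation")  # hypothetical logger name


def srm_skip_reason(options: List[Tuple[Optional[str], float]]) -> Optional[str]:
    """Return the event explaining why SRM is skipped, or None if it can run.

    `options` pairs each multivariate option's variant key (possibly
    missing) with its percentage allocation.
    """
    for key, _percentage in options:
        if not key:
            # No key means the option's share cannot be attributed.
            logger.warning("srm.unkeyed_variant")
            return "srm.unkeyed_variant"
    allocated = sum(percentage for _key, percentage in options)
    if allocated > 100:
        # Control would otherwise receive a negative expected share.
        logger.warning("srm.overallocated")
        return "srm.overallocated"
    return None


overallocated = srm_skip_reason([("variant_a", 60.0), ("variant_b", 50.0)])
unkeyed = srm_skip_reason([(None, 20.0)])
healthy = srm_skip_reason([("variant_a", 30.0)])
```

Checking the total in percentage units, before dividing, avoids relying on a downstream guard to catch a negative control share.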
@github-actions

github-actions Bot commented Jun 17, 2026

Contributor

Docker builds report

| Image | Build Status | Security report |
| --- | --- | --- |
| ghcr.io/flagsmith/flagsmith-e2e:pr-7796 | Finished ✅ | Skipped |
| ghcr.io/flagsmith/flagsmith-api-test:pr-7796 | Finished ✅ | Skipped |
| ghcr.io/flagsmith/flagsmith:pr-7796 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith-private-cloud:pr-7796 | Finished ✅ | Results |
| ghcr.io/flagsmith/flagsmith-api:pr-7796 | Finished ✅ | Results |

@github-actions

github-actions Bot commented Jun 17, 2026

Contributor

Playwright Test Results (oss - depot-ubuntu-latest-16)

passed  1 passed

Details

stats  1 test across 1 suite
duration  41.5 seconds
commit  17e9032
info  🔄 Run: #17579 (attempt 1)

Playwright Test Results (oss - depot-ubuntu-latest-arm-16)

passed  1 passed

Details

stats  1 test across 1 suite
duration  37 seconds
commit  17e9032
info  🔄 Run: #17579 (attempt 1)

Playwright Test Results (private-cloud - depot-ubuntu-latest-16)

failed  1 failed on each of 5 attempts

Details

stats  1 test across 1 suite
duration  45 seconds (attempt 1), 45.8 seconds (attempt 2), 45.9 seconds (attempt 3), 45.2 seconds (attempt 4), 45.1 seconds (attempt 5)
commit  17e9032
info  🔄 Run: #17579 (attempts 1 to 5)

Failed tests (same test on every attempt)

firefox › tests/project-permission-test.pw.ts › Project Permission Tests › Project-level permissions control access to features, environments, audit logs, and segments @enterprise

@github-actions

Contributor

Visual Regression

19 screenshots compared. See report for details.

@Zaimwa9 Zaimwa9 left a comment

Contributor


Looking good! Thanks. Wish we had the control as an MV option.
If there are details to fine-tune, I'll take it from here while implementing the frontend.


Labels

api (Issue related to the REST API), feature (New feature or request)
