-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: allow import/reimport to wait for deduplication to complete #15007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Changes from all commits
3b802f8
b609de1
33b688d
0088fcd
bd46a79
78bfc9b
3df4698
ad29422
a0f4f14
5158ee2
8cb7b38
506fbbc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| from django.db import migrations, models | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
|
|
||
| dependencies = [ | ||
| ('dojo', '0268_release_authorization_to_pro'), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.AddField( | ||
| model_name='usercontactinfo', | ||
| name='deduplication_execution_mode', | ||
| field=models.CharField(blank=True, choices=[('async', 'Async (do not wait)'), ('async_wait', 'Async, wait for deduplication'), ('sync', 'Synchronous (block)')], help_text="Controls how import/reimport deduplication post-processing is executed. 'Async' dispatches it to the background and returns immediately (default). 'Async, wait for deduplication' dispatches to the background but waits for deduplication to finish before responding, so notifications and statistics reflect the deduplicated state. 'Synchronous' runs the import deduplication inline. Can be overridden per request. Independent of block_execution, which forces all async tasks (notifications, jira, ...) to the foreground.", max_length=20, null=True), | ||
| ), | ||
| ] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| from django.db import migrations | ||
|
|
||
|
|
||
| def seed_deduplication_execution_mode(apps, schema_editor): | ||
| """ | ||
| Seed the new import deduplication execution mode from the legacy block_execution flag. | ||
|
|
||
| block_execution remains the global "run all async tasks in the foreground" switch; | ||
| users who had it enabled get the synchronous deduplication mode so import behavior is | ||
| unchanged for them. | ||
| """ | ||
| UserContactInfo = apps.get_model("dojo", "UserContactInfo") | ||
| UserContactInfo.objects.filter(block_execution=True).update(deduplication_execution_mode="sync") | ||
|
|
||
|
|
||
| def unseed_deduplication_execution_mode(apps, schema_editor): | ||
| """Reverse: clear the seeded synchronous mode.""" | ||
| UserContactInfo = apps.get_model("dojo", "UserContactInfo") | ||
| UserContactInfo.objects.filter(deduplication_execution_mode="sync").update(deduplication_execution_mode=None) | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
|
|
||
| dependencies = [ | ||
| ('dojo', '0269_usercontactinfo_deduplication_execution_mode'), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.RunPython(seed_deduplication_execution_mode, unseed_deduplication_execution_mode), | ||
| ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -456,7 +456,13 @@ def post_process_finding_save_internal(finding, dedupe_option=True, rules_option | |
| jira_services.push(finding.finding_group) | ||
|
|
||
|
|
||
| @app.task | ||
| # ignore_result=False so the 'async_wait' import execution mode can join on the | ||
| # dispatched batch via AsyncResult.get() even when CELERY_TASK_IGNORE_RESULT=True. | ||
| # NOTE: this override may not be strictly necessary — in the past .get()/await | ||
| # appears to have worked with the global CELERY_TASK_IGNORE_RESULT=True and a | ||
| # Redis broker. Needs verification against a real broker/worker setup; if join | ||
| # works without it, this override can be removed to avoid storing extra results. | ||
| @app.task(ignore_result=False) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Non-blocking: this stores a result row for every batch of every import, in all modes. The result is only consumed in Re: the NOTE's open question — I verified live that the join does work with this override against a real worker + Mitigation: leave the task default alone and request the result only when you'll await it, scoping it to the dispatch in |
||
| def post_process_findings_batch( | ||
| finding_ids, | ||
| *args, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking: migration number collision.
devalready ships0269_normalize_blank_finding_components, which also declaresdependencies = [('dojo', '0268_release_authorization_to_pro')]. With this PR there are two0269migrations on the same parent → Django reports "Conflicting migrations detected; multiple leaf nodes in the migration graph" onmigrate/makemigrations --check, and the merge fails (this is theconflicts-detectedlabel).Fix: renumber both files and re-chain them after the current leaf:
0269_usercontactinfo_deduplication_execution_mode→0270_..., withdependencies = [('dojo', '0269_normalize_blank_finding_components')]0270_seed_deduplication_execution_mode→0271_..., withdependencies = [('dojo', '0270_usercontactinfo_deduplication_execution_mode')]The seed migration itself is fine and idempotent.