
refactor(seaweedfs): split seaweedfs-system into seaweedfs-db + seaweedfs-system#2601

Draft
myasnikovdaniil wants to merge 1 commit into main from split-seaweedfs-db

Conversation

Contributor

@myasnikovdaniil myasnikovdaniil commented May 10, 2026

What this PR does

⚠️ Requires Flux 2.8.x (helm-controller v1.5.0+) to function correctly.
Cozystack currently pins Flux to 2.7.x, which bundles helm-controller v1.4.0
and does not support HelmRelease.spec.healthCheckExprs or
spec.waitStrategy.name: poller. Without those, the new seaweedfs-db HR
will flip Ready=True as soon as helm install applies the Cluster CR
(before postgres is actually serving), and the race this PR is designed to
fix will reproduce. A separate PR will bump cozystack's pinned Flux to
2.8.x — merge that one first
(or land both in the same release cut).

Splits the seaweedfs-system HelmRelease in two so the CNPG Cluster/seaweedfs-db
lives in its own HR (seaweedfs-db), and the application HR (seaweedfs-system)
dependsOn it. The new HR uses Flux v2 HelmRelease.spec.healthCheckExprs with a
CEL expression on Cluster.status.conditions[type=Ready] plus
waitStrategy.name: poller, so its Ready=True flips only after the postgres
primary is actually serving connections, not merely after helm install has
applied the Cluster CR.
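
The dependency side of the split can be sketched as a Flux HelmRelease fragment. This is a hypothetical illustration assembled from the description above, not copied from the actual templates:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: seaweedfs-system
spec:
  # Flux will not install or upgrade this release until the db HR is Ready,
  # and with healthCheckExprs on the db HR, Ready now means "postgres serving".
  dependsOn:
    - name: seaweedfs-db
  interval: 30s # lowered from 5m so the dependency check fires promptly
```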

This eliminates the seaweedfs-filer CrashLoopBackOff race on a fresh tenant
install. With Cilium kubeProxyReplacement: true, socket-LB returns EPERM
from connect(2) to ClusterIPs that have no Ready endpoints. Pre-split, the filer
StatefulSet scheduled concurrently with the CNPG bootstrap (~55–70 s of unavailable
postgres): each connect() failed with EPERM, kubelet's exponential restart backoff
pushed past the e2e bats kubectl wait hr/seaweedfs-system --timeout=2m window,
and the "Configure Tenant and wait for applications" test failed.
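
The timeout arithmetic works out as follows, assuming kubelet's default crash backoff (10 s initial delay, doubling per restart, capped at 300 s; the defaults are a Kubernetes detail, not stated in the PR):

```shell
# Cumulative kubelet restart backoff: 10s, then 20s, 40s, 80s, ... (cap 300s).
# Count how many restarts fit before the cumulative delay exceeds the
# 120s `kubectl wait --timeout=2m` window used by the e2e test.
d=10; t=0; n=0
while [ "$t" -lt 120 ]; do
  t=$((t + d))        # time of the n-th restart attempt
  n=$((n + 1))
  d=$((d * 2))        # backoff doubles after each crash
  if [ "$d" -gt 300 ]; then d=300; fi
done
echo "restart #$n lands at ${t}s, past the 120s window"
# → restart #4 lands at 150s, past the 120s window
```

So even a handful of EPERM crashloops is enough to push the next restart beyond the 2 m wait window.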

Changes

  • New packages/system/seaweedfs-db/ — chart with only the Cluster/seaweedfs-db template.
  • packages/extra/seaweedfs/templates/seaweedfs-db.yaml (new) — emits the HR with
    waitStrategy: poller + healthCheckExprs for postgresql.cnpg.io/v1 Cluster.
  • packages/extra/seaweedfs/templates/seaweedfs.yaml — added dependsOn: [seaweedfs-db],
    lowered interval: 5m → 30s so the dependency check fires promptly, dropped the
    now-unused db.* pass-through.
  • packages/system/seaweedfs/templates/database.yaml — moved into seaweedfs-db.
  • packages/system/seaweedfs/values.yaml — dropped db: block (no longer consumed here).
  • packages/core/platform/sources/seaweedfs-application.yaml — registered the new component.
  • packages/core/platform/images/migrations/migrations/39 — adopts existing
    Cluster/seaweedfs-db resources into the new release on upgrade by rewriting
    meta.helm.sh/release-name and stamping helm.sh/resource-policy: keep.
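
The adoption mechanics in migration 39 amount to rewriting Helm's ownership metadata on the live Cluster object so the next seaweedfs-db install adopts it instead of failing with an ownership conflict. A sketch of the resulting metadata; note that Helm's adoption check also involves the release-namespace annotation and managed-by label, which are standard Helm behavior rather than something quoted from the script:

```yaml
metadata:
  annotations:
    meta.helm.sh/release-name: seaweedfs-db       # rewritten from seaweedfs-system
    meta.helm.sh/release-namespace: tenant-foo    # placeholder tenant namespace
    helm.sh/resource-policy: keep                 # survives uninstall of either release
  labels:
    app.kubernetes.io/managed-by: Helm
```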

Same pattern for monitoring-system (grafana-db, alerta-db) is filed as a follow-up.

Release note

refactor(seaweedfs): split the per-tenant `seaweedfs-system` HelmRelease into a `seaweedfs-db` (CNPG Cluster) HR and the application HR, gated by Flux `healthCheckExprs` on the Cluster's `Ready` condition. Fixes a fresh-install race where seaweedfs-filer pods crashlooped against the still-bootstrapping postgres primary. Requires Flux 2.8.x (helm-controller v1.5.0+).

@github-actions github-actions Bot added area/storage Issues or PRs related to storage (linstor, seaweedfs, bucket, velero, harbor) kind/cleanup Categorizes issue or PR as related to cleanup of code, process, or technical debt size/XXL This PR changes 1000+ lines, ignoring generated files labels May 10, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a race condition during fresh tenant installations where seaweedfs-filer pods would crashloop due to the database not being ready. By decoupling the database infrastructure from the application components and utilizing Flux's dependency management and health check expressions, we ensure a reliable deployment order. This change requires Flux 2.8.x and includes a migration path for existing deployments.

Highlights

  • Architecture Refactor: Split the seaweedfs-system HelmRelease into two separate releases: seaweedfs-db (for the CNPG Cluster) and seaweedfs-system (for the application).
  • Race Condition Fix: Implemented Flux HelmRelease healthCheckExprs and waitStrategy: poller to ensure the application only deploys after the database is fully ready, preventing EPERM crashloops.
  • Migration Strategy: Added a migration script (migration 39) to re-annotate existing CNPG Cluster resources, allowing the new seaweedfs-db release to adopt them without disruption.


@coderabbitai
Contributor

coderabbitai Bot commented May 10, 2026

Important

Review skipped

Draft detected. To trigger a single review, invoke the @coderabbitai review command.
📝 Walkthrough


This pull request splits the SeaweedFS system into two coordinated HelmReleases to eliminate a startup race condition. The database release (seaweedfs-db) is now provisioned first with CNPG readiness gating via healthCheckExprs, and the workload release (seaweedfs-system) depends on successful database deployment. A migration script handles existing deployments.

Changes

SeaweedFS System Split

Layer / File(s) Summary
Design & Planning
docs/superpowers/plans/2026-05-10-split-seaweedfs-system.md, docs/superpowers/specs/2026-05-10-split-seaweedfs-system-design.md
Comprehensive plan and design spec detailing the EPERM crash problem, architecture with Flux dependsOn and CNPG healthCheck gating, step-by-step implementation tasks, migration strategy, error handling, and verification checkpoints.
New Chart Definition
packages/system/seaweedfs-db/Chart.yaml, packages/system/seaweedfs-db/Makefile, packages/system/seaweedfs-db/README.md, packages/system/seaweedfs-db/values.schema.json, packages/system/seaweedfs-db/values.yaml
New seaweedfs-db Helm chart with API v2 metadata, build scaffolding, database-specific values schema (replicas, size, storageClass), and configuration documentation.
Source Registration
packages/core/platform/sources/seaweedfs-application.yaml
Registers seaweedfs-db component pointing to system/seaweedfs-db in the PackageSource variants, enabling chart discovery and artifact packaging.
HelmRelease Wiring
packages/extra/seaweedfs/templates/seaweedfs-db.yaml, packages/extra/seaweedfs/templates/seaweedfs.yaml, packages/system/seaweedfs/values.yaml
New seaweedfs-db.yaml template with poller waitStrategy and CEL healthCheckExprs gating on CNPG Ready condition; updates seaweedfs.yaml with dependsOn on seaweedfs-db, reduces reconciliation interval to 30s, removes db values pass-through; deletes db config from seaweedfs values.yaml.
Migration for Adoption
packages/core/platform/images/migrations/migrations/39
Shell migration script that iterates namespaces, re-annotates existing Cluster/seaweedfs-db from seaweedfs-system to seaweedfs-db ownership, sets helm.sh/resource-policy=keep, and updates cozystack-version ConfigMap to v39.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A database splits from the system so fine,
With readiness gates in a poller design,
Flux waits for the CNPG to say it's all set,
Before filer pods start—no EPERM regret!
Two HelmReleases dance in dependency's line.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'refactor(seaweedfs): split seaweedfs-system into seaweedfs-db + seaweedfs-system' accurately and concisely summarizes the primary change—splitting a single HelmRelease into two separate releases—which is the core objective of the PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.


@dosubot dosubot Bot added the area/platform Issues or PRs related to platform infrastructure (bundle, flux, talos, installer) label May 10, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request splits the seaweedfs-system HelmRelease into seaweedfs-db and seaweedfs-system to ensure the database is ready before the application starts, resolving an EPERM crashloop issue. The changes introduce a new system chart, a migration script for existing clusters, and utilize FluxCD's healthCheckExprs for readiness gating. Feedback identifies a shell compatibility issue in the migration script, missing resource fields in the database values schema, and the presence of local absolute paths in the implementation plan.

@@ -0,0 +1,32 @@
#!/bin/sh

medium

The script uses set -o pipefail (on line 13), which is a bash-specific option and not supported by POSIX-compliant shells like dash (the default /bin/sh on many systems). To ensure the script executes correctly with this option, the shebang should be changed to #!/bin/bash.

Suggested change:

```diff
-#!/bin/sh
+#!/bin/bash
```

Comment on lines +9 to +12
## @typedef {struct} DB - Database configuration.
## @field {int} [replicas] - Number of database replicas.
## @field {quantity} [size] - Persistent Volume size.
## @field {string} [storageClass] - StorageClass used to store the data.

medium

The DB typedef is missing the resources and resourcesPreset fields. These are being passed by the wrapper template in packages/extra/seaweedfs/templates/seaweedfs-db.yaml. Adding them here ensures they are included in the generated JSON schema and documentation. Note that the database.yaml template in this chart also needs to be updated to use these values instead of hardcoded ones.

## @typedef {struct} DB - Database configuration.
## @field {int} [replicas] - Number of database replicas.
## @field {quantity} [size] - Persistent Volume size.
## @field {string} [storageClass] - StorageClass used to store the data.
## @field {string} [resourcesPreset] - Resource preset.
## @field {Resources} [resources] - Resource configuration.

Comment on lines +15 to +18
db:
replicas: 2
size: 10Gi
storageClass: ""

medium

Include default values for resourcesPreset and resources in the db block to align with the updated typedef and provide defaults for the schema.

```yaml
db:
  replicas: 2
  size: 10Gi
  storageClass: ""
  resourcesPreset: "nanosmall"
  resources: {}
```


- [ ] **Step 3: Verify directory structure**

Run: `ls /home/daniil/aenix/cozystack-split-seaweedfs/packages/system/seaweedfs-db/`

low

This document contains absolute paths specific to the author's local environment (e.g., /home/daniil/aenix/cozystack-split-seaweedfs/). These should be replaced with relative paths or placeholders to ensure the documentation is portable and clean. This applies to multiple locations throughout the file.

Suggested change:

```diff
-Run: `ls /home/daniil/aenix/cozystack-split-seaweedfs/packages/system/seaweedfs-db/`
+Run: `ls packages/system/seaweedfs-db/`
```


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
packages/system/seaweedfs-db/Chart.yaml (1)

1-3: 💤 Low value

Consider adding a description field.

While not strictly required, adding a description field to the Chart.yaml improves discoverability and follows Helm best practices. The sibling charts in this repository may provide examples.

📝 Suggested addition
```diff
 apiVersion: v2
 name: cozy-seaweedfs-db
+description: SeaweedFS Database (CNPG Cluster) for Cozystack
 version: 0.0.0 # Placeholder, the actual version will be automatically set during the build process
```
docs/superpowers/specs/2026-05-10-split-seaweedfs-system-design.md (1)

7-9: 💤 Low value

Add language specifier to fenced code block.

The code block showing the error message should specify a language for proper syntax highlighting and to satisfy the markdown linter.

📝 Suggested fix: add `text` as the language tag on the opening fence:

```text
dial tcp <ClusterIP>:5432 (seaweedfs-db-rw): connect: operation not permitted
```

docs/superpowers/plans/2026-05-10-split-seaweedfs-system.md (1)

169-175: 💤 Low value

Add language specifier to fenced code block.

For consistency with the rest of the plan and to satisfy the markdown linter, add a language specifier to this code block.

📝 Suggested fix: add `yaml` as the language tag on the opening fence:

```yaml
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
```
packages/system/seaweedfs-db/values.yaml (1)

5-7: 💤 Low value

Unused typedef: Resources is documented but not referenced.

The Resources typedef defines cpu and memory fields, but these are not used anywhere in the db configuration below. This may be a copy-paste artifact from another values file.

♻️ Suggested cleanup

If the Resources typedef is not needed:

```diff
 ##
 ## `@section` Database parameters
 ##

-## `@typedef` {struct} Resources - Resource configuration.
-## `@field` {quantity} [cpu] - Number of CPU cores allocated.
-## `@field` {quantity} [memory] - Amount of memory allocated.
-
 ## `@typedef` {struct} DB - Database configuration.
 ## `@field` {int} [replicas] - Number of database replicas.
 ## `@field` {quantity} [size] - Persistent Volume size.
 ## `@field` {string} [storageClass] - StorageClass used to store the data.
```

However, if resources will be added to the db configuration in the future, keeping the typedef is fine.

packages/extra/seaweedfs/templates/seaweedfs.yaml (1)

111-113: ⚡ Quick win

Document the rationale for the 10× faster reconciliation interval.

The interval was reduced from 5m to 30s, which increases reconciliation frequency by 10×. This could impact cluster load, especially with many tenants. Please clarify whether this aggressive interval is necessary for quick recovery after the DB becomes ready, or if a more moderate value (e.g., 1m or 2m) would suffice.



📥 Commits

Reviewing files that changed from the base of the PR and between 99e7070 and 28b02d5.

📒 Files selected for processing (13)
  • docs/superpowers/plans/2026-05-10-split-seaweedfs-system.md
  • docs/superpowers/specs/2026-05-10-split-seaweedfs-system-design.md
  • packages/core/platform/images/migrations/migrations/39
  • packages/core/platform/sources/seaweedfs-application.yaml
  • packages/extra/seaweedfs/templates/seaweedfs-db.yaml
  • packages/extra/seaweedfs/templates/seaweedfs.yaml
  • packages/system/seaweedfs-db/Chart.yaml
  • packages/system/seaweedfs-db/Makefile
  • packages/system/seaweedfs-db/README.md
  • packages/system/seaweedfs-db/templates/database.yaml
  • packages/system/seaweedfs-db/values.schema.json
  • packages/system/seaweedfs-db/values.yaml
  • packages/system/seaweedfs/values.yaml
💤 Files with no reviewable changes (1)
  • packages/system/seaweedfs/values.yaml

Comment on lines +68 to +82
```yaml
spec:
  waitStrategy:
    name: poller # required for healthCheckExprs to be evaluated
  healthCheckExprs:
    - apiVersion: postgresql.cnpg.io/v1
      kind: Cluster
      current: has(status.conditions) && status.conditions.exists(e, e.type == 'Ready' && e.status == 'True')
      failed: has(status.conditions) && status.conditions.exists(e, e.type == 'Ready' && e.status == 'False')
```

`exists` requires at least one matching condition element, so a missing `status.conditions` evaluates to `false` and the wait keeps polling — important early in the lifecycle when the CR has been applied but CNPG hasn't yet stamped any conditions.

The expression matches every `postgresql.cnpg.io/v1 Cluster` resource managed by the release (only `seaweedfs-db` here). The HR's `wait` phase blocks until `current` returns `true`, which only happens once CNPG sets `Ready=True` on the Cluster. Helm's existing `timeout: 10m` is the outer ceiling — if CNPG never reaches Ready, the HR fails with a clear message instead of silently letting the dependent HR proceed.


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current Flux version in the repository
echo "Searching for Flux version references..."
rg -n "flux.*version|fluxcd.*v2\." --type=yaml -C2 -g '!docs/**'

Repository: cozystack/cozystack

Length of output: 523


🏁 Script executed:

# Search for Flux version pins more broadly
find . -type f \( -name "*.yaml" -o -name "*.yml" -o -name "*.md" -o -name "*.json" \) ! -path "*/.*" | head -50 | xargs grep -l "flux.*2\." 2>/dev/null | head -20

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

# Look for version information in common locations
echo "=== Checking for FluxCD version in various config files ==="
for file in fluxcd.yaml flux-release.yaml Chart.yaml kustomization.yaml values.yaml release.yaml; do
  find . -name "$file" -type f ! -path "*/.*" -exec grep -H "version" {} \; 2>/dev/null
done

Repository: cozystack/cozystack

Length of output: 50375


🏁 Script executed:

# Search the docs directory for healthCheckExprs references
echo "=== Searching for healthCheckExprs feature documentation ==="
rg -n "healthCheckExprs" --type=yaml --type=markdown

Repository: cozystack/cozystack

Length of output: 6096


🏁 Script executed:

# Check for Flux version in the main README or installation files
echo "=== Checking README and main docs ==="
rg -n "flux.*2\.[0-9]" --type=markdown -C 2 docs/ 2>/dev/null | head -40

Repository: cozystack/cozystack

Length of output: 701


🏁 Script executed:

# Look at the actual seaweedfs-db template to see current implementation
head -50 ./packages/extra/seaweedfs/templates/seaweedfs-db.yaml

Repository: cozystack/cozystack

Length of output: 1451


🏁 Script executed:

# Search for any version requirements or comments about healthCheckExprs
rg -i "healthcheckexprs|helm.*controller|v1.5.0" --type=yaml --type=markdown -B 2 -A 2

Repository: cozystack/cozystack

Length of output: 50377


🏁 Script executed:

# Check Flux release notes or documentation in the repo
find . -type f -name "*.md" ! -path "*/.*" -exec rg -l "helm.controller|healthCheckExprs.*version|2\.8" {} \;

Repository: cozystack/cozystack

Length of output: 407


🌐 Web query:

Flux 2.8.0 helm-controller v1.5.0 healthCheckExprs feature release

💡 Result:

Flux 2.8.0 (feature release) shipped with helm-controller v1.5.0 on 2026-02-24, and the helm-controller v1.5.0 release introduced the HelmRelease.spec.healthCheckExprs feature (custom health checks via CEL expressions) [1][2]. What changed (healthCheckExprs / CEL health checks): - helm-controller v1.5.0 added support for custom health checks on HelmRelease using CEL expressions via the.spec.healthCheckExprs field [2][3]. - The helm-controller API docs specify that.spec.healthCheckExprs evaluates CEL expressions for custom resources (with per-entry apiVersion/kind, and optional inProgress/failed and required current) and that evaluation order is inProgress → failed → current; the first expression that returns true determines the custom resource’s health status [4]. - The feature is tied to the Helm v4 “kstatus” based health checking flow: the GitHub issue notes it is only available for kstatus health checks implemented in Helm 4 (i.e., the waiter WaitStrategy), and the legacy WaitStrategy won’t be able to use it [5]. Release timing / versions: - Flux v2.8.0 release tag is dated 2026-02-24 and explicitly lists helm-controller v1.5.0 as part of the Flux 2.8.0 component versions [1]. - helm-controller v1.5.0 changelog shows release date 2026-02-20 and documents the defaults/behavior changes plus the addition of.spec.healthCheckExprs support [2].



🏁 Script executed:

# Check for any recent Flux version changes or upgrade PRs
rg -n "flux.*2\.8|helm.*controller.*v1\.5" --type=markdown --type=yaml -C 3 | head -60

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

# Look for any issue or PR references about Flux upgrades
find . -type f \( -name "*.md" -o -name "*.yaml" \) ! -path "*/.*" -exec grep -l "flux.*upgrade\|2\.8\.0\|upgrade.*flux" {} \; 2>/dev/null

Repository: cozystack/cozystack

Length of output: 232


🏁 Script executed:

# Check the git history for recent Flux-related changes
git log --oneline --all -30 2>/dev/null | grep -i "flux\|helm.controller"

Repository: cozystack/cozystack

Length of output: 45


🏁 Script executed:

# Check the v1.2.0 changelog for Flux upgrade information
grep -A 5 -B 5 "flux\|2\.8" docs/changelogs/v1.2.0.md | head -40

Repository: cozystack/cozystack

Length of output: 3265


🏁 Script executed:

# Check if there are any recent version bumps in values
rg "version.*2\.[78]" packages/system/fluxcd/ packages/core/flux-aio/ -A 2 -B 2

Repository: cozystack/cozystack

Length of output: 343


🏁 Script executed:

# Look at the fluxcd values to see what version is specified
cat packages/system/fluxcd/values.yaml | grep -A 10 -B 10 "version"

Repository: cozystack/cozystack

Length of output: 522


Block merge until Flux is upgraded to 2.8.x.

The PR uses HelmRelease.spec.healthCheckExprs (in packages/extra/seaweedfs/templates/seaweedfs-db.yaml, lines 27–30) which requires helm-controller v1.5.0+, shipped with Flux 2.8.0. The repository is currently pinned to Flux 2.7.x with helm-controller v1.4.3, which does not support this feature. Verify that a Flux upgrade PR exists and will merge first, or gate this PR on the Flux upgrade completing.
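Such CI gating could be sketched as a small semver comparison (hypothetical script; a real pipeline would read the pinned version from the fluxcd package values rather than hard-coding it):

```shell
#!/bin/sh
# Hypothetical CI gate: flag the merge while the pinned helm-controller is
# older than the v1.5.0 minimum that healthCheckExprs requires.
pinned="v1.4.3"   # assumption: a real check would extract this from packages/system/fluxcd
minimum="v1.5.0"

# sort -V orders version strings semantically; if the minimum is not the
# lowest of the pair, the pinned version is older than required.
lowest=$(printf '%s\n%s\n' "$pinned" "$minimum" | sort -V | head -n1)
if [ "$lowest" != "$minimum" ]; then
  verdict="BLOCK: helm-controller $pinned < $minimum (healthCheckExprs unsupported)"
else
  verdict="OK: helm-controller $pinned supports healthCheckExprs"
fi
echo "$verdict"
```

With the current pin (v1.4.3) this prints the BLOCK verdict; after the Flux 2.8.x bump lands it flips to OK.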

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-05-10-split-seaweedfs-system-design.md` around
lines 68 - 82: this PR adds HelmRelease.spec.healthCheckExprs (see
packages/extra/seaweedfs/templates/seaweedfs-db.yaml), which requires
helm-controller v1.5.0+ / Flux 2.8.x. Because the repo is pinned to Flux 2.7.x
(helm-controller v1.4.3), do not merge this change until the Flux upgrade
lands. Either (a) remove or revert the healthCheckExprs usage and use a
supported wait strategy/timeout fallback, or (b) gate this PR on the Flux
upgrade: confirm the Flux 2.8.x upgrade PR exists and will merge first, add a
note in this PR linking that upgrade, and/or add CI gating that blocks merge
until the upgrade is merged.

Comment on lines +27 to +31
healthCheckExprs:
  - apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    current: has(status.conditions) && status.conditions.exists(e, e.type == 'Ready' && e.status == 'True')
    failed: has(status.conditions) && status.conditions.exists(e, e.type == 'Ready' && e.status == 'False')
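In context, this block would sit in an HR spec shaped roughly as follows (a sketch assembled from the PR description, not the literal file contents; metadata names and the dependsOn wiring are assumptions):

```yaml
# Sketch of the seaweedfs-db HelmRelease wiring described in this PR.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: seaweedfs-db
spec:
  # The poller wait strategy is required for healthCheckExprs to be evaluated;
  # with the legacy strategy the expressions are never consulted.
  waitStrategy:
    name: poller
  healthCheckExprs:
    - apiVersion: postgresql.cnpg.io/v1
      kind: Cluster
      current: has(status.conditions) && status.conditions.exists(e, e.type == 'Ready' && e.status == 'True')
      failed: has(status.conditions) && status.conditions.exists(e, e.type == 'Ready' && e.status == 'False')
```

The application HR (seaweedfs-system) then lists seaweedfs-db in its dependsOn, so it is not reconciled until the Cluster's Ready condition is actually True.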

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check deployed Flux version for helm-controller

kubectl get deployment -n flux-system helm-controller -o jsonpath='{.spec.template.spec.containers[0].image}' 2>/dev/null || echo "Flux not found"

Repository: cozystack/cozystack

Length of output: 78


🏁 Script executed:

# First, let's examine the seaweedfs-db.yaml file
cat -n packages/extra/seaweedfs/templates/seaweedfs-db.yaml

Repository: cozystack/cozystack

Length of output: 1752


🏁 Script executed:

# Search for any existing documentation or validation about Flux version in the seaweedfs package
rg -i "flux|helm-controller|version|2\.8" packages/extra/seaweedfs/ -A 2 -B 2

Repository: cozystack/cozystack

Length of output: 38057


🏁 Script executed:

# Check if there are any migration or bootstrap scripts that validate Flux version
find packages/extra/seaweedfs -type f \( -name "*.yaml" -o -name "*.sh" -o -name "*.md" \) -exec grep -l "flux\|Flux\|validation\|preflight" {} \;

Repository: cozystack/cozystack

Length of output: 166


🏁 Script executed:

# Verify the file structure to ensure umbrella pattern compliance
tree -L 3 packages/extra/seaweedfs/ 2>/dev/null || find packages/extra/seaweedfs -type f -o -type d | head -20

Repository: cozystack/cozystack

Length of output: 695


🌐 Web query:

Flux helm-controller healthCheckExprs version requirement 2.8.0

💡 Result:

Flux’s HelmRelease .spec.healthCheckExprs (CEL-based custom health checks) requires helm-controller v1.5.0+; Flux 2.8.0 ships helm-controller v1.5.0 and introduces this support, so it is available starting with Flux 2.8.0 and any helm-controller >= 1.5.0 [1][2].

What you asked ("Flux helm-controller healthCheckExprs version requirement 2.8.0"): if your Flux version is 2.8.0 (released with helm-controller v1.5.0), you can use .spec.healthCheckExprs on HelmRelease resources [1][3].

Notes/constraints from the official docs:
- .spec.healthCheckExprs is evaluated only when Helm wait is enabled (i.e., .spec.disableWait is false) and when the poller wait strategy is used (.spec.waitStrategy.name is poller) [3].
- It is implemented via CEL expressions with current (required) and optional inProgress/failed [3].
- The implementation is only available for the kstatus-based health checks in Helm 4 (i.e., the poller wait strategy); users of the legacy wait strategy can't use the feature [4].

Examples and field semantics are documented in the HelmRelease "Health check expressions" section [3].



🌐 Web query:

Flux HelmRelease healthCheckExprs supported version helm-controller

💡 Result:

Flux HelmRelease supports custom health check expressions via the .spec.healthCheckExprs field (CEL expressions), and this feature is tied to the Helm 4 + waitStrategy: poller flow using kstatus readers [1][2].

Supported helm-controller version (where this feature landed): fluxcd/helm-controller v1.5.0 (released 2026-02-20), whose changelog includes "Add support for custom health checks via CEL expressions" and documents .spec.healthCheckExprs [2].

Implementation/usage notes (to ensure the expressions actually evaluate):
- .spec.healthCheckExprs is evaluated only when Helm action wait is enabled (.spec.disableWait is false) and when the poller wait strategy is used (.spec.waitStrategy.name is poller) [1].
- The controller evaluates inProgress first, then failed, then current; the first expression that evaluates to true determines health status [1].



Document the minimum Flux version requirement for healthCheckExprs.

The healthCheckExprs field on line 27 requires Flux 2.8.x (helm-controller v1.5.0+). If Flux is still on 2.7.x, the HelmRelease will either fail to parse or silently ignore the healthCheckExprs field, defeating the purpose of this readiness check.

Add a comment documenting this dependency:

Proposed documentation addition
+  # REQUIRES: Flux 2.8.x+ (helm-controller v1.5.0+) for healthCheckExprs support
   # `poller` waitStrategy is required for healthCheckExprs to be evaluated.
   # Without it, the HR flips Ready as soon as helm install applies the Cluster CR,
   # before CNPG has bootstrapped postgres.
   waitStrategy:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/extra/seaweedfs/templates/seaweedfs-db.yaml` around lines 27 - 31:
add a comment immediately above the healthCheckExprs block documenting that this
field requires Flux v2.8.x (helm-controller v1.5.0+) and that on older Flux
(e.g., v2.7.x) the HelmRelease will either fail to parse or silently ignore
healthCheckExprs, removing the readiness check. Mention the minimum Flux and
helm-controller versions and the resulting behavior so maintainers know to
upgrade Flux before relying on healthCheckExprs in this HelmRelease.

@myasnikovdaniil myasnikovdaniil marked this pull request as draft May 12, 2026 03:45
myasnikovdaniil added a commit that referenced this pull request May 12, 2026
…collision

Main has a conditional dependsOn:[ingress-nginx-system] in the application
HR (added after #2601 branched). PR #2601 added an unconditional
dependsOn:[seaweedfs-db]. The merge produced two `dependsOn:` keys in the
same HelmRelease spec, which the Helm post-render rejected:

  Helm install failed for release tenant-root/seaweedfs ...:
  error while running post render on files: map[string]interface {}(nil):
  yaml: unmarshal errors

Collapse into one list: the db dependency is unconditional; the ingress
dependency stays inside the existing `if eq $ingress .Release.Namespace`
guard so sub-tenants that inherit ingress from a parent namespace don't
deadlock on a non-existent local HR.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>
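
The collapsed dependsOn described in this commit message might look like the following Helm-template sketch (names taken from the message; the surrounding template and the `$ingress` variable are assumed):

```yaml
# Sketch, not the literal template: a single dependsOn key, with the
# ingress entry kept inside the existing namespace guard so sub-tenants
# that inherit ingress from a parent namespace don't wait on a local HR.
dependsOn:
  - name: seaweedfs-db
{{- if eq $ingress .Release.Namespace }}
  - name: ingress-nginx-system
{{- end }}
```

Rendering one list this way avoids the duplicate-key YAML that the Helm post-renderer rejected.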
…edfs-system

Splits the seaweedfs-system HelmRelease in two so the CNPG
Cluster/seaweedfs-db lives in its own HR (seaweedfs-db), and the
application HR (seaweedfs-system) dependsOn it. The new HR uses Flux v2
HelmRelease.spec.healthCheckExprs with a CEL expression on
Cluster.status.conditions[type=Ready] plus waitStrategy.name: poller, so
its Ready=True only flips after the postgres primary is actually serving
connections — not just after helm install applied the Cluster CR.

This eliminates the seaweedfs-filer CrashLoopBackOff race on a fresh
tenant install. With Cilium kubeProxyReplacement: true, socket-LB
returns EPERM from connect(2) to ClusterIPs with no Ready endpoints.
Pre-split, the filer StatefulSet scheduled concurrently with the CNPG
bootstrap (~55–70s of unavailable postgres), each connect() failed
EPERM, kubelet exponential restart backoff pushed past the e2e bats
'kubectl wait hr/seaweedfs-system --timeout=2m' window, and the
'Configure Tenant and wait for applications' test failed.

Migration 39 adopts existing Cluster/seaweedfs-db resources into the new
release on upgrade by rewriting meta.helm.sh/release-name and stamping
helm.sh/resource-policy: keep.

Requires Flux 2.8.x (helm-controller v1.5.0+). Cozystack currently pins
Flux to 2.7.x; a separate PR bumps the pin.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>
@github-actions github-actions Bot added size/L This PR changes 100-499 lines, ignoring generated files and removed size/XXL This PR changes 1000+ lines, ignoring generated files labels May 13, 2026
myasnikovdaniil added a commit that referenced this pull request May 13, 2026
(same commit message as above)

Labels

area/platform Issues or PRs related to platform infrastructure (bundle, flux, talos, installer) area/storage Issues or PRs related to storage (linstor, seaweedfs, bucket, velero, harbor) kind/cleanup Categorizes issue or PR as related to cleanup of code, process, or technical debt size/L This PR changes 100-499 lines, ignoring generated files
