fix: Manage OIDC admin password secret via cluster_resources #919

Open

dervoeti wants to merge 3 commits into main from fix/test-wait-for-restarter-rollout

fix: Manage OIDC admin password secret via cluster_resources#919
dervoeti wants to merge 3 commits intomainfrom
fix/test-wait-for-restarter-rollout

Conversation

@dervoeti
Member

Description

The oidc-opa kuttl test fails consistently for all NiFi 2.x variants because the NiFi pod gets restarted shortly after becoming ready.

The OIDC admin password secret was created directly via client.create(). This caused a problem:
the commons-op restarter mutating webhook could not see the secret when it first computed annotations for the StatefulSet, producing incomplete restarter annotations. The restart controller then detected the missing annotation and patched the StatefulSet, triggering an unnecessary pod restart. The test proceeded because the replica was briefly ready, but the pod was then restarted by the restart controller. In slow CI environments (AKS), the restarted pod took over 5 minutes to come back, exceeding the test's 300s timeout.

We now build the OIDC admin password secret with proper labels and owner references, and apply it through cluster_resources.add() like other managed resources, which solves the problem by preventing the unnecessary restart.
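As a rough illustration of the pattern described above (a sketch only: the names `build_recommended_labels`, `ApplyOidcAdminPasswordSecretSnafu`, and the exact builder signatures are assumptions based on common stackable-operator conventions, not copied from the actual diff):

```rust
// Sketch only: identifiers are illustrative, not the actual PR diff.
//
// Previously, the Secret bypassed resource tracking entirely:
//
//     client.create(&oidc_admin_password_secret).await?;
//
// The fix builds it with recommended labels and an owner reference, then
// hands it to ClusterResources, which applies it via server-side apply
// like every other managed resource:

let secret = Secret {
    metadata: ObjectMetaBuilder::new()
        .name_and_namespace(nifi)
        .ownerreference_from_resource(nifi, None, Some(true))
        .context(ObjectMissingMetadataForOwnerRefSnafu)?
        .with_recommended_labels(labels) // labels built elsewhere
        .build(),
    // secret data (the generated admin password) omitted here
    ..Secret::default()
};

cluster_resources
    .add(client, secret)
    .await
    .context(ApplyOidcAdminPasswordSecretSnafu)?;
```

Because the secret now carries the recommended labels and owner reference before the StatefulSet is reconciled, the restarter webhook can resolve it when computing annotations, and the restart controller has nothing to patch afterwards.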

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and the deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

@dervoeti dervoeti self-assigned this Apr 10, 2026
@dervoeti dervoeti force-pushed the fix/test-wait-for-restarter-rollout branch from ff14bbd to 8a68412 on April 10, 2026 at 17:48
@dervoeti dervoeti moved this to Development: Waiting for Review in Stackable Engineering Apr 10, 2026
@razvan razvan self-requested a review April 17, 2026 12:32
@razvan razvan moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Apr 17, 2026
@sbernauer sbernauer self-requested a review April 17, 2026 13:49
@sbernauer
Member

Thanks for the detailed report!
But I'm a bit confused, as the client.create() is before the cluster_resources.apply(), isn't it?
So aren't we now creating it even later with this PR?

The new code always reads in the secret to copy it and to write it out again, which looks a bit silly and actually causes many, many more Secret generations.
So actually doesn't the new code produce more generations in comparison to an ideal code, which only creates the Secret once (with generation 0)?

BTW, we added a shared function to op-rs in stackabletech/operator-rs#1187, which we could use for all these sorts of use-cases. Either that is broken (then we should fix it) or it is already there.
@razvan I'm glad we added exactly that :)

I'm only on a train right now, but I would be interested in understanding more deeply what exactly the problem is, as I fail to see how

  1. the Secret is created after the StatefulSet
  2. the Secret ends up with a generation > 1

@razvan
Member

razvan commented Apr 17, 2026

Without having seen the comment from @sbernauer, I allowed myself a little refactoring: 66a8d96

@dervoeti
Member Author

> Thanks for the detailed report! But I'm a bit confused, as the client.create() is before the cluster_resources.apply(), isn't it? So aren't we now creating it even later with this PR?
>
> The new code always reads in the secret to copy it and to write it out again, which looks a bit silly and actually causes many, many more Secret generations. So actually doesn't the new code produce more generations in comparison to an ideal code, which only creates the Secret once (with generation 0)?
>
> BTW, we added a shared function to op-rs in stackabletech/operator-rs#1187, which we could use for all these sorts of use-cases. Either that is broken (then we should fix it) or it is already there. @razvan I'm glad we added exactly that :)
>
> I'm only on a train right now, but I would be interested in understanding more deeply what exactly the problem is, as I fail to see how
>
>   1. the Secret is created after the StatefulSet
>   2. the Secret ends up with a generation > 1

I'm leaving for vacation soon so didn't have time to dig into this deeper, just a few notes:

  • Server-side apply on secrets should be idempotent (no new generation if nothing changed), but not ideal, yes
  • I'm not 100% sure about the details of the race condition tbh. I had the suspicion that commons-op was missing the secret when it was created via client.create(), and that adding it to cluster_resources would be the proper way. I tested it a couple of times and at least in my tests it fixed the problem. This might need further debugging.
  • I only skimmed operator-rs#1187 (feat: Add helper function to create random Secrets), but it sounds very useful for this case; in general feel free to refactor / take over this PR
  • I believe the main cause of the test failure (the restart controller kicking in unnecessarily, and the pod restart taking long in CI) is real, but there might be a better way to fix it properly

