Helm Operator fails when two different types of Custom Resources have same name #3357

Description

@pre

Bug Report

The Helm Operator stores Helm-specific metadata about each deployed Custom Resource (an instance of a Chart) in a Kubernetes Secret.
The name of that Secret contains the name of the Custom Resource but not the name of the Custom Resource Definition. As a result, an instance of CRD_A and an instance of CRD_B will corrupt each other's Secret if both instances have the same name in the same namespace.

For example:

  • Cluster has two Custom Resource Definitions (CRD): Lolcat and Doge.
  • Each CRD has a Helm Operator watching it.
  • When a Lolcat named control is deployed, its Helm Operator creates a Kubernetes Secret named sh.helm.release.v1.control.v1
  • When a Doge named control is deployed, its Helm Operator creates a Kubernetes Secret with the same name, sh.helm.release.v1.control.v1

This secret contains Helm's own internal metadata. Since the metadata in sh.helm.release.v1.control.v1 was about Lolcat, the change set for Doge applied later will cause Helm Operator to fail.
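The collision can be illustrated with a minimal sketch. It assumes, as the logs in this report suggest ("name":"control","release":"control"), that the operator uses the Custom Resource's name as the Helm release name; the function name and the dictionaries are hypothetical and only model the naming scheme, not the operator's code.

```python
# Prefix that appears in the Secret names quoted in this report.
HELM_STORAGE_PREFIX = "sh.helm.release.v1"


def release_secret_name(release_name: str, revision: int) -> str:
    """Build the storage Secret name from the release name and revision only.

    Note that neither the kind nor the CRD of the Custom Resource is an input.
    """
    return f"{HELM_STORAGE_PREFIX}.{release_name}.v{revision}"


# Hypothetical Custom Resources: different kinds, same name, same namespace.
lolcat = {"kind": "Lolcat", "namespace": "dev", "name": "control"}
doge = {"kind": "Doge", "namespace": "dev", "name": "control"}

# Assumption: the release name is taken directly from the CR's name.
lolcat_secret = release_secret_name(lolcat["name"], 1)
doge_secret = release_secret_name(doge["name"], 1)

print(lolcat_secret)
print(doge_secret)
print("collision:", lolcat_secret == doge_secret)
```

Because the kind never enters the name, both operators read and write the same Secret in the dev namespace.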

As a corollary, the name of any Custom Resource backed by a Helm Operator must be unique in a given namespace.

While it is possible to have two independent and decoupled Custom Resource Definitions (Lolcat and Doge), any instance of a Custom Resource backed by the Helm Operator must have a unique name in that namespace. This is surprising and hard to debug when you see the error message for the first time.

Even if this cannot be fixed because of Helm internals, a sensible error message from the Helm Operator would save a significant amount of cumulative debugging time. The current error message refers to Helm metadata that belongs to the wrong Custom Resource instance.

In the snippet below, an instance of Doge named control fails because an instance of Lolcat named control was deployed earlier. As you can see, the Secret is named sh.helm.release.v1.control.v6 because the Lolcat release was already at v5. It should have been v1, since this was the first deployment of Doge.

{"level":"error","ts":1594219438.7016752,"logger":"helm.controller","msg":"Release failed","namespace":"dev","name":"control","apiVersion":"rdx.net/v1alpha1","kind":"AccessConfig","release":"control","error":"failed update (update: failed to update: secrets \"sh.helm.release.v1.control.v6\" not found) and failed rollback: release: not found","stacktrace":[..] }

{"level":"error","ts":1594219438.7058475,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"accessconfig-controller","request":"dev/control","error":"failed update (update: failed to update: secrets \"sh.helm.release.v1.control.v6\" not found) and failed rollback: release: not found","stacktrace": [..] }

After this first message, the later error messages come from the operator incorrectly trying to adopt existing resources that belong to the other CRD.
Doge has now corrupted the Secret that described Lolcat. As a result, Lolcat's Helm Operator has the wrong metadata and starts failing with the error below. The problem is that meta.helm.sh/release-name now refers to Doge even though this was Lolcat's Helm metadata.

{"level":"error","ts":1594220624.2125375,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"lolcat-controller","request":"dev/control","error":"failed to get candidate release: rendered manifests contain a resource that already exists. Unable to continue with update: Deployment \"a-deployment\" in namespace \"dev\" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key \"meta.helm.sh/release-name\": must be set to \"control\"; annotation validation error: missing key \"meta.helm.sh/release-namespace\": must be set to \"dev\""
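To make the error above easier to read, here is a rough sketch of the kind of ownership check it describes: an existing resource is only adopted into a release if its annotations name that release and namespace. This is an illustration, not Helm's implementation. The function name is hypothetical, the "missing key" wording is modeled on the log line above, and the wording for a mismatched value is my own.

```python
def ownership_errors(annotations: dict, release_name: str, release_namespace: str) -> list:
    """Return the reasons an existing resource cannot be adopted by a release.

    An empty list means the annotations match the release.
    """
    expected = {
        "meta.helm.sh/release-name": release_name,
        "meta.helm.sh/release-namespace": release_namespace,
    }
    errors = []
    for key, want in expected.items():
        have = annotations.get(key)
        if have is None:
            # Wording modeled on the log line in this report.
            errors.append(f'missing key "{key}": must be set to "{want}"')
        elif have != want:
            # Hypothetical wording for a value that names another release.
            errors.append(f'key "{key}" is "{have}" but must be "{want}"')
    return errors


# A Deployment with no Helm ownership annotations, as in the log above.
for problem in ownership_errors({}, "control", "dev"):
    print(problem)
```

A check like this only compares release name and namespace, so it cannot tell a Lolcat release named control from a Doge release named control.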

Environment

  • operator-sdk version: v0.18.2

  • Kubernetes cluster kind: Minikube

  • Are you writing your operator in ansible, helm, or go?
    Helm

Possible Solution

Possible fix: Include the name of the Custom Resource Definition in the Helm Secret's name?

Remedy: Provide a sensible error message.
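Both ideas can be sketched as follows. This is only an illustration under stated assumptions: the function names, the exception class, and the owners mapping (release name to the kind that created it) are all hypothetical, and nothing here reflects how the operator tracks releases today. The 53-character limit is Helm's maximum release name length.

```python
MAX_RELEASE_NAME_LENGTH = 53  # Helm's limit on release names


class ReleaseNameConflict(Exception):
    """Raised when a release name is already used by another kind."""


def qualified_release_name(kind: str, cr_name: str) -> str:
    """Possible fix: derive the release name from the kind and the CR name."""
    name = f"{kind.lower()}-{cr_name}"
    if len(name) > MAX_RELEASE_NAME_LENGTH:
        raise ValueError(f"release name {name!r} exceeds {MAX_RELEASE_NAME_LENGTH} characters")
    return name


def check_release_owner(release_name: str, requesting_kind: str, owners: dict) -> None:
    """Remedy: fail early with a clear message instead of a storage error.

    `owners` is a hypothetical record of which kind created each release.
    """
    owner = owners.get(release_name)
    if owner is not None and owner != requesting_kind:
        raise ReleaseNameConflict(
            f"release {release_name!r} in this namespace already belongs to a "
            f"{owner} resource; a {requesting_kind} resource cannot reuse the name"
        )


# With qualified names the two releases no longer share storage.
print(qualified_release_name("Lolcat", "control"))
print(qualified_release_name("Doge", "control"))

# Without qualified names, the conflict is at least reported clearly.
try:
    check_release_owner("control", "Doge", {"control": "Lolcat"})
except ReleaseNameConflict as err:
    print(err)
```

Qualifying the name changes the release names of existing deployments, so a real fix would also need a migration path; the early check avoids that cost but leaves the uniqueness restriction in place.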

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • language/helm: Issue is related to a Helm operator project.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
