Sveltos - Kubernetes Add-on Controller | Manage Kubernetes Add-ons with Ease

ClusterProfiles

ClusterProfile is the Custom Resource Definition (CRD) used to instruct Sveltos which add-ons to deploy on a set of clusters.

The ClusterProfile is a cluster-wide resource: it can match clusters and reference resources in any namespace.

Pause Annotation

Pausing a ClusterProfile with the profile.projectsveltos.io/paused annotation prevents Sveltos from performing any reconciliation. This effectively freezes the ClusterProfile in its current state, ensuring that no changes are applied to the clusters it manages.
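As a minimal sketch, a paused ClusterProfile could look like the following. Only the annotation key comes from the text above; the profile name and the annotation value "true" are assumptions made for illustration.

```yaml
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: kyverno  # hypothetical profile name
  annotations:
    # Key taken from the text above; the value "true" is an assumption.
    profile.projectsveltos.io/paused: "true"
spec:
  clusterSelector:
    matchLabels:
      env: prod
```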

Spec.ClusterSelector

The clusterSelector field specifies which managed clusters receive the add-ons and applications defined in the configuration.

This field employs a Kubernetes label selector, allowing us to target clusters based on specific labels.

clusterSelector:
  matchLabels:
    env: prod

By leveraging matchExpressions, we can create more complex and flexible cluster selection criteria.

clusterSelector:
  matchExpressions:
  - {key: env, operator: In, values: [staging, production]}

Spec.HelmCharts

The helmCharts field is a list of Helm charts to be deployed to the clusters matching clusterSelector.

Example - spec.helmCharts

helmCharts:
- repositoryURL:    https://kyverno.github.io/kyverno/
  repositoryName:   kyverno
  chartName:        kyverno/kyverno
  chartVersion:     v3.3.3
  releaseName:      kyverno-latest
  releaseNamespace: kyverno
  helmChartAction:  Install

Helm chart values can be dynamically retrieved from ConfigMaps or Secrets for flexible configuration. Customize Helm behavior with various options, and deploy charts from private container registries. For a complete list of features, refer to the Helm chart section.
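The sketch below shows how chart values might be supplied both inline and from a ConfigMap. It assumes the field names values and valuesFrom; the ConfigMap name and namespace and the replica setting are hypothetical. The Helm chart section remains the authoritative reference for the supported options.

```yaml
helmCharts:
- repositoryURL:    https://kyverno.github.io/kyverno/
  repositoryName:   kyverno
  chartName:        kyverno/kyverno
  chartVersion:     v3.3.3
  releaseName:      kyverno-latest
  releaseNamespace: kyverno
  helmChartAction:  Install
  # Inline values (assumed field name: values).
  values: |
    admissionController:
      replicas: 3
  # Values read from a ConfigMap (assumed field name: valuesFrom).
  valuesFrom:
  - kind: ConfigMap
    name: kyverno-values   # hypothetical
    namespace: default     # hypothetical
```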

Spec.PolicyRefs

The policyRefs field references a list of ConfigMaps/Secrets, each containing Kubernetes resources to be deployed in the clusters matching clusterSelector.

This field is a slice of PolicyRef structs. Each PolicyRef has the following fields:

| Field | Type | Description | Optional | Default |
|-------|------|-------------|----------|---------|
| Kind | string | The kind of the referenced resource. Supported kinds: Secret and ConfigMap. | No | |
| Namespace | string | The namespace of the referenced resource. If empty, the namespace will be set to the matching cluster's namespace. | Yes | |
| Name | string | The name of the referenced resource. Must be at least one character long. | No | |
| DeploymentType | string | Indicates whether the resource should be deployed to the management cluster (Local) or the managed cluster (Remote). | Yes | Remote |

Example - spec.policyRefs

policyRefs:
- kind: Secret
  name: my-secret-1
  namespace: my-namespace-1
  deploymentType: Local
- kind: ConfigMap
  name: my-configmap-1
  namespace: my-namespace-1
  deploymentType: Remote
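For illustration, a ConfigMap that a policyRefs entry such as my-configmap-1 points to might look like the sketch below. The data key name and the embedded Namespace manifest are hypothetical; the assumption is that each data value holds one or more Kubernetes manifests in YAML.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-configmap-1
  namespace: my-namespace-1
data:
  # Hypothetical key; the value is a plain Kubernetes manifest.
  namespace.yaml: |
    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a   # hypothetical resource to deploy
```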

Spec.KustomizationRefs

The kustomizationRefs field is a list of sources containing kustomization files. Resources are deployed to the clusters matching the specified clusterSelector.

This field is a slice of KustomizationRef structs. Each KustomizationRef has the following fields:

| Field | Type | Description | Optional | Default |
|-------|------|-------------|----------|---------|
| Kind | string | The kind of the referenced resource. Supported kinds: Flux GitRepository, OCIRepository, Bucket (resources that store Kustomization manifests), and ConfigMap, Secret (resources that contain Kustomization manifests or overlays). | No | |
| Namespace | string | The namespace of the referenced resource. If empty, the namespace will be set to the cluster's namespace. | Yes | |
| Name | string | The name of the referenced resource. Must be at least one character long. | No | |
| Path | string | The path to the directory containing the kustomization.yaml file, or the set of plain YAMLs for which a kustomization.yaml should be generated. Defaults to the root path of the SourceRef. | Yes | None |
| TargetNamespace | string | The target namespace for the Kustomization deployment. Can be used to override the namespace specified in the kustomization.yaml file. | Yes | |
| DeploymentType | string | Indicates whether the Kustomization deployment should be deployed to the management cluster (Local) or the managed cluster (Remote). | Yes | Remote |

For a complete list of features, refer to the Kustomize section.
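A minimal sketch of a kustomizationRefs entry built from the fields listed above. The GitRepository name and namespace, the path, and the target namespace are hypothetical values, and the lowercase key spelling is assumed to follow the policyRefs example.

```yaml
kustomizationRefs:
- kind: GitRepository
  namespace: flux2          # hypothetical namespace of the Flux source
  name: flux2               # hypothetical name of the Flux source
  path: ./helloWorld/       # hypothetical directory holding kustomization.yaml
  targetNamespace: eng      # hypothetical; overrides the kustomization namespace
  deploymentType: Remote    # default: deploy to the managed cluster
```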

Spec.SyncMode

This field can be set to:

  • OneTime
  • Continuous
  • ContinuousWithDriftDetection
  • DryRun

OneTime: Once we deploy a ClusterProfile with a OneTime configuration, Sveltos will check the matching clusters using the clusterSelector. Any matching cluster will have the resources specified in the ClusterProfile deployed. However, if we make changes to the ClusterProfile later on, those changes will not be automatically deployed to already-matching clusters.

Continuous: The option to choose for real-time deployment and updates. Any change made to the ClusterProfile is immediately reconciled into matching clusters. We can add new features, update existing ones, and remove them as necessary; Sveltos deploys, updates, or removes resources in matching clusters as needed.

ContinuousWithDriftDetection: Instructs Sveltos to monitor the state of managed clusters and detect a configuration drift for any of the resources deployed because of that ClusterProfile. When Sveltos detects a configuration drift, it automatically re-syncs the cluster state back to the state described in the management cluster. To know more about configuration drift detection, refer to this section.

DryRun: Use this option to avoid the risk of deploying changes that could cause unwanted side effects. Deploying a ClusterProfile with this configuration launches a simulation of all the operations that would normally be executed in a live run. No actual changes are made to the matching clusters during the dry run. To know more about dry run, refer to this section.
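As a sketch, the sync mode is selected with a single field under spec. The profile name below is hypothetical and the key spelling syncMode is assumed from the section title; the remaining fields are reused from the examples on this page.

```yaml
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: kyverno-drift   # hypothetical profile name
spec:
  # One of: OneTime, Continuous, ContinuousWithDriftDetection, DryRun
  syncMode: ContinuousWithDriftDetection
  clusterSelector:
    matchLabels:
      env: prod
  helmCharts:
  - repositoryURL:    https://kyverno.github.io/kyverno/
    repositoryName:   kyverno
    chartName:        kyverno/kyverno
    chartVersion:     v3.3.3
    releaseName:      kyverno-latest
    releaseNamespace: kyverno
    helmChartAction:  Install
```

Switching the value to DryRun on the same profile would simulate the deployment without changing the matching clusters.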

Spec.StopMatchingBehavior

The stopMatchingBehavior field specifies the behavior when a cluster no longer matches a ClusterProfile. By default (WithdrawPolicies), all Kubernetes resources and Helm charts deployed to the cluster will be removed. However, if stopMatchingBehavior is set to LeavePolicies, any policies deployed by the ClusterProfile will remain in the cluster.

Example - spec.stopMatchingBehavior

---
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: kyverno
spec:
  stopMatchingBehavior: WithdrawPolicies
  clusterSelector:
    matchLabels:
      env: prod
  helmCharts:
  - repositoryURL:    https://kyverno.github.io/kyverno/
    repositoryName:   kyverno
    chartName:        kyverno/kyverno
    chartVersion:     v3.3.3
    releaseName:      kyverno-latest
    releaseNamespace: kyverno
    helmChartAction:  Install

When a cluster matches the ClusterProfile, the Kyverno Helm chart is deployed to that cluster. If the cluster's labels are subsequently modified and the cluster no longer matches the ClusterProfile, the Kyverno Helm chart will be uninstalled. However, if the stopMatchingBehavior property is set to LeavePolicies, Sveltos will retain the Kyverno Helm chart in the cluster.

Spec.Reloader

The reloader property determines whether rolling upgrades should be triggered for Deployment, StatefulSet, or DaemonSet instances managed by Sveltos and associated with this ClusterProfile when changes are made to mounted ConfigMaps or Secrets. When set to true, Sveltos automatically initiates rolling upgrades for affected Deployment, StatefulSet, or DaemonSet instances whenever any mounted ConfigMap or Secret is modified. This ensures that the latest configuration updates are applied to the respective workloads.

Refer to the dedicated section for more information.
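A minimal sketch of a ClusterProfile with the reloader enabled. The profile name and the referenced ConfigMap are hypothetical; the assumption is that the ConfigMap contains a Deployment which mounts another ConfigMap or Secret.

```yaml
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: app-with-reloader   # hypothetical profile name
spec:
  reloader: true            # trigger rolling upgrades when mounted ConfigMaps/Secrets change
  clusterSelector:
    matchLabels:
      env: prod
  policyRefs:
  - kind: ConfigMap
    name: my-app-manifests  # hypothetical; holds the Deployment manifest
    namespace: default      # hypothetical
```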

Spec.MaxUpdate

A ClusterProfile might match more than one cluster. When a change is made to a ClusterProfile, by default all matching clusters are updated concurrently. The maxUpdate field specifies the maximum number of clusters that can be updated concurrently during an update operation triggered by changes to the ClusterProfile's add-ons or applications. The specified value can be an absolute number (e.g., 5) or a percentage of the desired cluster count (e.g., 10%). The default value is 100%, allowing all matching clusters to be updated simultaneously.

For instance, if set to 30%, when modifications are made to the ClusterProfile's add-ons or applications, only 30% of matching Clusters will be updated concurrently. Updates to the remaining matching Clusters will only commence upon successful completion of updates in the initially targeted Clusters. This approach ensures a controlled and manageable update process, minimizing potential disruptions to the overall cluster environment. Please refer to this section for more information.
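The 30% case can be sketched as follows. The profile name is hypothetical and the other fields are reused from the examples on this page. With 10 matching clusters, for instance, 30% corresponds to 3 clusters being updated at a time.

```yaml
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: kyverno-rolling   # hypothetical profile name
spec:
  maxUpdate: 30%          # an absolute number such as 5 is also accepted
  clusterSelector:
    matchLabels:
      env: prod
  helmCharts:
  - repositoryURL:    https://kyverno.github.io/kyverno/
    repositoryName:   kyverno
    chartName:        kyverno/kyverno
    chartVersion:     v3.3.3
    releaseName:      kyverno-latest
    releaseNamespace: kyverno
    helmChartAction:  Install
```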

Spec.ValidateHealths

The validateHealths property defines a set of checks that Sveltos executes against the managed cluster to assess the health of add-ons and applications in the ClusterProfile. Sveltos holds the ClusterProfile in a non-provisioned state until all checks pass.

Each check can perform the following:

  • Inspect Kubernetes resources: Fetch objects by Group/Version/Kind and evaluate them with a Lua script or CEL rules.
  • Query a Prometheus-compatible metrics endpoint: Run PromQL instant queries and evaluate the results with a Lua script.
  • Combine both: Inspect a resource and metric values in the same script.

Refer to the rolling update strategy for how validateHealths integrates with maxUpdate to roll changes across clusters safely.

Resource-based checks

Set group, version, and kind to identify the resource type. Sveltos fetches all matching objects and runs the script once per object, passing it as obj.

Example - spec.validateHealths (Lua)

validateHealths:
- name: deployment-health
  featureID: Helm
  group: "apps"
  version: "v1"
  kind: "Deployment"
  namespace: kyverno
  script: |
    function evaluate(obj)
      if obj.status ~= nil and obj.status.availableReplicas ~= nil
          and obj.status.availableReplicas == obj.spec.replicas then
        return {healthy=true, message=""}
      end
      return {healthy=false, message="available replicas not matching requested replicas"}
    end

CEL is also supported:

Example - spec.validateHealths (CEL)

validateHealths:
- name: deployment-health
  featureID: Helm
  group: "apps"
  version: "v1"
  kind: "Deployment"
  namespace: kyverno
  evaluateCEL:
  - name: replicas_match
    rule: resource.status.availableReplicas == resource.spec.replicas

Metric-based checks

Set metricSource to identify the Prometheus endpoint and metricQueries to list the PromQL instant queries to run. Each query result is a scalar float available in the script as metrics["<name>"]. When group, version, and kind are omitted, the script is called once with no resource argument.

Connectivity: in push mode the addon-controller running on the management cluster makes a direct HTTP request to the URL, so the metric service must be reachable from the management cluster (e.g. via a NodePort or LoadBalancer). In pull mode the sveltos-applier agent running inside the managed cluster makes the request, so an in-cluster DNS name such as http://prometheus-server.monitoring.svc:9090 works without any external exposure.

Credentials: if the endpoint requires authentication, create a Secret with either a token key (bearer token) or username and password keys (basic auth). In push mode the Secret must exist in the management cluster in the same namespace as the ClusterProfile; in pull mode it must exist on the managed cluster.

Example - error rate check, no credentials

validateHealths:
- name: error-rate-low
  featureID: Helm
  metricSource:
    url: http://prometheus-server.monitoring.svc:9090
  metricQueries:
  - name: errorRate
    query: >-
      sum(rate(http_requests_errors_total{namespace="my-app"}[5m]))
      /
      sum(rate(http_requests_total{namespace="my-app"}[5m]))
  script: |
    function evaluate()
      if metrics["errorRate"] > 0.05 then
        return {healthy=false, message="error rate above 5%: " .. metrics["errorRate"]}
      end
      return {healthy=true, message=""}
    end

When the Prometheus endpoint requires a bearer token, first create the Secret:

$ kubectl create secret generic prometheus-token \
  --from-literal=token=<bearer-token> \
  -n <secret-namespace>

Then reference it via metricSource.secretRef:

Example - error rate check, bearer token

validateHealths:
- name: error-rate-low
  featureID: Helm
  metricSource:
    url: http://prometheus-server.monitoring.svc:9090
    secretRef:
      namespace: monitoring
      name: prometheus-token
  metricQueries:
  - name: errorRate
    query: >-
      sum(rate(http_requests_errors_total{namespace="my-app"}[5m]))
      /
      sum(rate(http_requests_total{namespace="my-app"}[5m]))
  script: |
    function evaluate()
      if metrics["errorRate"] > 0.05 then
        return {healthy=false, message="error rate above 5%: " .. metrics["errorRate"]}
      end
      return {healthy=true, message=""}
    end

For basic auth create the Secret with username and password keys instead:

$ kubectl create secret generic prometheus-basic-auth \
  --from-literal=username=admin \
  --from-literal=password=<password> \
  -n <secret-namespace>

Combined resource and metric check

A single check can inspect a Kubernetes resource and evaluate metric values in the same script. Both obj and the metrics map are in scope.

Example - deployment ready and webhook latency within bound

validateHealths:
- name: kyverno-healthy
  featureID: Helm
  group: "apps"
  version: "v1"
  kind: "Deployment"
  namespace: kyverno
  metricSource:
    url: http://prometheus-server.monitoring.svc:9090
  metricQueries:
  - name: webhookP99Ms
    query: >-
      histogram_quantile(0.99,
        rate(kyverno_admission_review_duration_seconds_bucket[5m])
      ) * 1000
  script: |
    function evaluate(obj)
      if obj.status == nil or obj.status.availableReplicas == nil
          or obj.status.availableReplicas == 0 then
        return {healthy=false, message="no available replicas"}
      end
      if metrics["webhookP99Ms"] > 500 then
        return {healthy=false,
                message="webhook p99 above 500 ms: " .. metrics["webhookP99Ms"]}
      end
      return {healthy=true, message=""}
    end

Spec.TemplateResourceRefs

The templateResourceRefs property specifies a collection of resources to be gathered from the management cluster. The values extracted from these resources are used to instantiate templates embedded within referenced PolicyRefs and Helm charts. Refer to the template section for more information and examples.
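As a rough sketch, an entry might pair a resource reference with an identifier that templates use to look the resource up. The field names resource and identifier, as well as the Secret name, namespace, and identifier value, are assumptions made for illustration; the template section is the authoritative reference.

```yaml
templateResourceRefs:
- resource:                # assumed field: reference to an object in the management cluster
    apiVersion: v1
    kind: Secret
    namespace: default     # hypothetical
    name: autoscaler       # hypothetical
  identifier: AutoscalerSecret   # hypothetical handle used inside templates
```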

Spec.DependsOn

The dependsOn property specifies a list of other ClusterProfiles that this instance relies on. In any managed cluster that matches this ClusterProfile, the add-ons and applications defined in this instance will only be deployed after all add-ons and applications in the designated dependency ClusterProfiles have been successfully deployed.

For example, clusterprofile-a can depend on another clusterprofile-b. This implies that any Helm charts or raw YAML files associated with ClusterProfile A will not be deployed until all add-ons and applications specified in ClusterProfile B have been successfully provisioned.

Example - spec.dependsOn

---
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: clusterprofile-a
spec:
  dependsOn:
  - clusterprofile-b

Sveltos automatically resolves and deploys the prerequisite profiles specified in the DependsOn field. Sveltos will analyze the dependency graph, identify the required prerequisite profiles, and ensure they are deployed to the same clusters.

Spec.ContinueOnError

ContinueOnError configures Sveltos' error handling. When true, errors are logged, but deployment continues. When false (default), Sveltos stops at the first error and retries the failing resource. For instance, if deploying three Helm charts, a failure during the second chart's deployment will halt the process, and Sveltos will retry the second chart. Only if ContinueOnError is true will Sveltos proceed to deploy the third chart before retrying the second chart.

Spec.ContinueOnConflict

ContinueOnConflict configures Sveltos' conflict resolution behavior. When true, Sveltos logs the conflicts but continues deploying the remaining resources. When false (default), Sveltos halts deployment at the first detected conflict. This can happen when another profile has already deployed the same resource.
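Both continueOnError and continueOnConflict are boolean fields, assumed here to sit directly under spec with lowercase-first key names like the other fields on this page. The sketch below enables both; the profile name and the referenced ConfigMap are hypothetical.

```yaml
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: best-effort-addons   # hypothetical profile name
spec:
  continueOnError: true      # log errors and keep deploying the remaining add-ons
  continueOnConflict: true   # log conflicts and keep deploying the remaining resources
  clusterSelector:
    matchLabels:
      env: prod
  helmCharts:
  - repositoryURL:    https://kyverno.github.io/kyverno/
    repositoryName:   kyverno
    chartName:        kyverno/kyverno
    chartVersion:     v3.3.3
    releaseName:      kyverno-latest
    releaseNamespace: kyverno
    helmChartAction:  Install
  policyRefs:
  - kind: ConfigMap
    name: my-configmap-1     # hypothetical
    namespace: my-namespace-1
```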

Spec.PreDeployChecks

The preDeployChecks field defines a set of checks that must pass before Sveltos deploys resources for a given feature. If any check fails, deployment is held and retried; no resources are applied until all checks pass.

Common use cases include:

  1. Readiness Gates: Ensuring a required StorageClass, CustomResourceDefinition, or operator is present before deploying an application that depends on it.
  2. Capacity Validation: Verifying that sufficient quota or node capacity exists in the target cluster before rolling out a workload.

Spec.PreDeleteChecks

The preDeleteChecks field defines a set of safety checks that Sveltos must perform before it begins removing resources from a managed cluster. If any of these checks fail (i.e., the Lua script returns healthy = false), Sveltos will halt the deletion process and retry later.

This is particularly useful for ensuring:

  1. Data Integrity: Verifying that a backup job has completed before deleting a database.
  2. Dependency Logic: Ensuring a "consumer" application is removed before the "provider" service it relies on.
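The backup use case could be sketched as below, assuming preDeleteChecks entries take the same shape as validateHealths entries (name, featureID, group/version/kind, and a Lua script returning healthy and message). That shape, the check name, and the namespace are assumptions, not confirmed by the text above.

```yaml
preDeleteChecks:
- name: backup-completed     # hypothetical check name
  featureID: Resources
  group: "batch"
  version: "v1"
  kind: "Job"
  namespace: database        # hypothetical namespace of the backup Job
  script: |
    function evaluate(obj)
      -- A Job reports the number of successfully finished pods in status.succeeded.
      if obj.status ~= nil and obj.status.succeeded ~= nil
          and obj.status.succeeded > 0 then
        return {healthy=true, message=""}
      end
      return {healthy=false, message="backup job has not completed"}
    end
```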

Spec.PostDeleteChecks

The postDeleteChecks field defines checks that Sveltos executes after the deletion commands have been issued. These checks verify that the cluster has reached a truly "clean" state.

Common use cases include:

  1. Orphaned Resource Detection: Verifying that cloud-provider resources (like LoadBalancers or PVCs) were actually released.
  2. Namespace Cleanup: Ensuring a namespace is fully terminated and not stuck in a "Terminating" state due to finalizers.

Spec.PatchesFrom

The patchesFrom field allows you to decouple patch definitions from the ClusterProfile itself. By referencing ConfigMaps or Secrets, you can inject environment-specific configurations into your resources at runtime.

This is particularly effective for:

  1. Security: Storing sensitive patches in Secrets rather than plain-text ClusterProfiles.
  2. Scalability: Using a single ClusterProfile for hundreds of clusters while tailoring replica counts, node selectors, or resource limits for each one.
  3. Dynamic Customization: Leveraging Go templating in the name and namespace fields to automatically fetch patches based on the target cluster's metadata.
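A minimal sketch of a templated reference follows. The entry shape (kind, namespace, name) is assumed to mirror policyRefs, and the template expression for the cluster name is likewise an assumption; the namespace and the naming scheme are hypothetical.

```yaml
patchesFrom:
- kind: ConfigMap
  namespace: patches   # hypothetical namespace holding per-cluster patches
  # Assumed template syntax: resolves to a ConfigMap named after the target cluster.
  name: "{{ .Cluster.metadata.name }}-patches"
```

With this naming scheme, a cluster called prod-eu-1 would pick up the ConfigMap prod-eu-1-patches, so one ClusterProfile can serve many clusters with different patches.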