Setup

Prerequisites

  • Prometheus Operator or kube-prometheus-stack installed in the cluster (provides the ServiceMonitor and PrometheusRule CRDs)
  • Grafana with the sidecar enabled for dashboards and datasources
  • The Grafana datasource sidecar must watch Secrets (or ConfigMaps) labeled with grafana_datasource

The built-in Grafana resources are designed for the common Kubernetes pattern used by kube-prometheus-stack and the Grafana Helm chart: Grafana runs in the cluster and discovers dashboards and datasources via labeled ConfigMaps and Secrets. If you use external Grafana, Grafana Cloud, or Grafana Operator, keep the Prometheus resources enabled but plan to manage the dashboard and datasource through your existing Grafana workflow instead of relying on sidecar discovery.

Enabling Monitoring

Set monitoring.enabled to true in your Helm values:

monitoring:
  enabled: true

All sub-resources (ServiceMonitors, PrometheusRules, Grafana dashboard, Grafana datasource) are enabled by default once the top-level flag is set. You can selectively disable any of them:

monitoring:
  enabled: true
  serviceMonitors:
    enabled: true # ServiceMonitors for Prometheus scraping
  prometheusRules:
    enabled: true # Built-in alert rules
  grafanaDashboard:
    enabled: true # Grafana dashboard ConfigMap
  grafanaDatasource:
    enabled: true # Grafana PostgreSQL datasource Secret

If you run only one of Kargo or Argo CD, disable the alerts for the other to avoid false positives:

monitoring:
  enabled: true
  alerts:
    kargo:
      enabled: false # disable if not using Kargo

If your Grafana deployment does not use sidecar discovery, disable the chart-managed Grafana resources and import or provision them separately:

monitoring:
  enabled: true
  grafanaDashboard:
    enabled: false
  grafanaDatasource:
    enabled: false

For the full list of monitoring.* parameters with defaults and descriptions, see the Monitoring Parameters section of the Helm values reference. That page is auto-generated from the chart's values.yaml, which also documents every option inline.

Prometheus Selector Labels

Many Prometheus Operator installations use label selectors to filter which ServiceMonitors, PodMonitors, and PrometheusRules to discover. The chart defaults additionalLabels to release: kube-prometheus-stack on all three resource types, which matches the default selector used by kube-prometheus-stack.

If your kube-prometheus-stack Helm release has a different name, override the label to match:

monitoring:
  enabled: true
  serviceMonitors:
    additionalLabels:
      release: my-prometheus # change to match your Helm release name
  podMonitors:
    additionalLabels:
      release: my-prometheus
  prometheusRules:
    additionalLabels:
      release: my-prometheus
tip

If your ServiceMonitors or PodMonitors are being created but Prometheus is not scraping them, a label mismatch is almost always the cause. Check your Prometheus custom resource for serviceMonitorSelector, podMonitorSelector, and ruleSelector to see what labels are required.
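
One way to inspect those selectors, assuming your Prometheus custom resource lives in the monitoring namespace (adjust the namespace to your installation):

kubectl -n monitoring get prometheus -o yaml \
  | grep -A 2 -E 'serviceMonitorSelector|podMonitorSelector|ruleSelector'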

Shared Monitoring Namespace

If your Prometheus, Grafana, and Alertmanager run in a dedicated monitoring namespace, you can place all monitoring resources there instead of the Akuity Platform release namespace. This keeps everything co-located and avoids broadening Grafana sidecar permissions:

monitoring:
  enabled: true
  serviceMonitors:
    namespace: monitoring
  podMonitors:
    namespace: monitoring
  prometheusRules:
    namespace: monitoring
  grafanaDashboard:
    namespace: monitoring
  grafanaDatasource:
    namespace: monitoring

The release: kube-prometheus-stack label is included by default, so no additionalLabels override is needed unless your Helm release name differs (see Prometheus Selector Labels).

tip

This is a common approach for kube-prometheus-stack installations because it keeps dashboards, rules, and scrape configuration alongside the monitoring stack. If your Grafana sidecar is scoped to specific namespaces instead of ALL, it also avoids extra cross-namespace configuration.
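
If you prefer to widen the sidecar's scope instead, the upstream Grafana Helm chart (including the grafana sub-chart of kube-prometheus-stack) exposes a searchNamespace setting per sidecar. A minimal sketch; the monitoring namespace below is illustrative:

sidecar:
  dashboards:
    searchNamespace: monitoring # or ALL to watch every namespace
  datasources:
    searchNamespace: monitoring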

Grafana Datasource Provisioning

By default, the chart provisions a Grafana PostgreSQL datasource Secret for the dashboard panels that query the portal database directly. The provisioned datasource:

  • Uses database.readOnlyHost when set, otherwise database.host
  • Uses database.port, database.dbname, database.user, database.password, and database.sslmode
  • Is created in monitoring.grafanaDatasource.namespace, defaulting to monitoring.grafanaDashboard.namespace and then the Helm release namespace
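
For reference, a minimal sketch of the database values the datasource is derived from; every value below is a placeholder:

database:
  host: postgres.example.com
  readOnlyHost: postgres-replica.example.com # preferred by the datasource when set
  port: 5432
  dbname: akuity
  user: portal
  password: <from-a-secret>
  sslmode: require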

If you already manage a Grafana datasource outside the chart, disable the built-in one:

monitoring:
  enabled: true
  grafanaDatasource:
    enabled: false

If you use a non-default schema or need custom datasource options beyond the chart defaults, manage the datasource separately in Grafana and keep monitoring.grafanaDatasource.enabled: false.

This is also the recommended approach if you use external Grafana, Grafana Cloud, or Grafana Operator instead of a sidecar-based in-cluster Grafana deployment.

Grafana Dashboard Folder

To organize the dashboard into a specific Grafana folder, set monitoring.grafanaDashboard.folder. The chart writes a grafana_folder annotation on the dashboard ConfigMap, which the Grafana sidecar uses to place the dashboard in the named folder.

note

This requires your Grafana installation to have sidecar.dashboards.folderAnnotation set to grafana_folder. The upstream Grafana Helm chart does not set this by default. If you use kube-prometheus-stack or the standalone Grafana chart, add the following to your Grafana values:

sidecar:
  dashboards:
    folderAnnotation: grafana_folder

Without this, the annotation is ignored and the dashboard lands in the General folder.

monitoring:
  enabled: true
  grafanaDashboard:
    folder: "Akuity Platform"

Verifying Your Setup

After enabling monitoring and deploying, verify each component is working. Monitoring resource names are prefixed with the Helm release name (e.g., <release>-platform-controller). The examples below assume the default release name akuity-platform.

1. Check monitoring resources are created

All chart-managed monitoring resources carry the label app.kubernetes.io/part-of: akuity-platform, so you can list them in one command:

kubectl get servicemonitor,podmonitor,prometheusrule,configmap,secret \
  -l app.kubernetes.io/part-of=akuity-platform -n <your-namespace>

You should see ServiceMonitors for each enabled platform component (e.g. <release>-platform-controller, <release>-portal-server), PodMonitors for repo-server-delegate and repo-server-proxy, a PrometheusRule, a Grafana dashboard ConfigMap, and a Grafana datasource Secret.

2. Verify Prometheus is scraping targets

Open the Prometheus UI (typically at http://<prometheus-host>:9090) and navigate to Status > Targets. Look for targets matching the Akuity Platform ServiceMonitors and PodMonitors; all of them should be in the UP state.

ServiceMonitor targets appear as serviceMonitor/<namespace>/<name> and PodMonitor targets appear as podMonitor/<namespace>/<name>. The PodMonitor targets (<release>-repo-server-proxy, <release>-repo-server-delegate) may show zero active targets if no Argo CD instances are using the Repo Server Delegate feature; this is expected.

New targets may briefly appear as UNKNOWN immediately after the monitoring resources are created. This is expected until the first scrape completes. With the default monitoring.serviceMonitors.interval: 60s, allow up to one minute before treating this as a failure.

If targets are missing, check that your ServiceMonitors and PodMonitors have the correct additionalLabels to match your Prometheus selectors (see Prometheus Selector Labels above).
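
You can also query the Prometheus HTTP API to check target health from the CLI. A sketch, assuming the API is reachable and the scrape pool names contain your release name (adjust the jq filter accordingly):

curl -s 'http://<prometheus-host>:9090/api/v1/targets' \
  | jq '.data.activeTargets[] | select(.scrapePool | test("akuity")) | {scrapePool, health}'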

3. Verify alerts are loaded

kubectl get prometheusrule -n <your-namespace>

You should see akuity-platform-rules (or <release>-rules if you used a custom release name). To verify Prometheus has loaded the rules, navigate to Status > Rules in the Prometheus UI and search for Akuity.
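
The same check can be scripted against the Prometheus HTTP API; the case-insensitive name filter below is an assumption, so adjust it to match your rule group names:

curl -s 'http://<prometheus-host>:9090/api/v1/rules' \
  | jq '.data.groups[] | select(.name | test("akuity"; "i")) | .name'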

4. Find the Grafana dashboard

Open Grafana and search for "Akuity Platform" in the dashboard search. If the dashboard does not appear, verify:

  • The Grafana sidecar is enabled and configured to watch ConfigMaps with the grafana_dashboard label
  • The dashboard ConfigMap is in a namespace the sidecar watches (see Shared Monitoring Namespace for the recommended setup)
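
A quick CLI check for both conditions; the label-exists selector assumes the chart labels the ConfigMap with grafana_dashboard, as described above:

kubectl get configmap -A -l grafana_dashboard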

If you intentionally disabled monitoring.grafanaDashboard.enabled, import the bundled dashboard JSON into Grafana using your normal workflow instead.

5. Verify the Grafana datasource exists

Open Grafana and navigate to Connections > Data sources. You should see a PostgreSQL datasource named <release> Portal DB unless you overrode monitoring.grafanaDatasource.datasourceName.

If it is missing, verify:

  • The Grafana datasource sidecar is enabled
  • The sidecar watches Secrets labeled with grafana_datasource
  • The datasource Secret is in a namespace the sidecar watches
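
A matching CLI check, assuming the chart labels the Secret with grafana_datasource:

kubectl get secret -A -l grafana_datasource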

If you intentionally disabled monitoring.grafanaDatasource.enabled, provision an equivalent PostgreSQL datasource in Grafana yourself and ensure its UID matches the one referenced by the dashboard, or update the dashboard to point at your datasource.

Scraped Components

ServiceMonitors

ServiceMonitors are created for each platform component that exposes a metrics endpoint. Components gated by an enabled flag receive a ServiceMonitor only when the component itself is enabled.

Component                 Metrics Port   Condition
platform-controller       9500           Always
portal-server             9501           Always
notification-controller   9505           Only when notificationController.enabled: true
addon-controller          9506           Only when addonController.enabled: true
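
To spot-check an endpoint directly, you can port-forward to a component and fetch its metrics. A sketch assuming the default release name and that metrics are served at /metrics; verify the actual Service name with kubectl get svc first:

kubectl -n <your-namespace> port-forward svc/akuity-platform-platform-controller 9500:9500
# in a second terminal:
curl -s http://localhost:9500/metrics | head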

PodMonitors

PodMonitors scrape metrics from pods in Argo CD instance namespaces (argocd-*). Unlike ServiceMonitors, they use namespaceSelector.any: true to discover pods across all namespaces.

PodMonitor            Selector                                      Port            Notes
repo-server-delegate  akuity.io/repo-server-delegate label exists   metrics         Only produces targets when instances use the Repo Server Delegate feature
repo-server-proxy     akuity.io/repo-server-proxy: "true"           akuity-metrics  Drops the high-cardinality repo_server_proxy_method_duration_seconds_bucket metric via metricRelabeling to control storage costs

Both PodMonitors are enabled by default when monitoring.podMonitors.enabled is true. It is safe to leave them enabled even when no instances use the Repo Server Delegate feature: the PodMonitors simply match zero pods.
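
For reference, a rough sketch of what the chart-managed repo-server-proxy PodMonitor looks like, reconstructed from the table above; render the chart with helm template for the authoritative manifest:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: akuity-platform-repo-server-proxy # follows the <release>- prefix convention
  labels:
    release: kube-prometheus-stack # default additionalLabels
spec:
  namespaceSelector:
    any: true # discover pods across all namespaces
  selector:
    matchLabels:
      akuity.io/repo-server-proxy: "true"
  podMetricsEndpoints:
    - port: akuity-metrics
      metricRelabelings:
        - action: drop # drop the high-cardinality histogram
          sourceLabels: [__name__]
          regex: repo_server_proxy_method_duration_seconds_bucket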

Grafana Dashboard

The bundled Grafana dashboard provides visibility into:

  • Argo CD Instances: health distribution, reconciliation status, instance counts, tables of unhealthy/unreconciled instances
  • Argo CD Clusters: connection status, reconciliation, health breakdown
  • Kargo Instances: health, reconciliation, instance counts
  • Kargo Agents: connection status, reconciliation, health breakdown
  • Control Plane Operations: controller workqueue depth and duration, OOM-killed containers, database connection pool stats, persistent volume usage, CPU throttling
  • Argo CD Repo Server Delegate (Optional): reverse-proxy latency, request rate, and pending requests for instances using the Repo Server Delegate feature

The dashboard includes configurable template variables:

  • DS_PROMETHEUS: Prometheus datasource for metrics panels (health gauges, time series, alert-derived stats)
  • DS_PORTAL_DB: PostgreSQL datasource for table panels that query the portal database directly (instance lists, cluster details, org breakdowns). The chart provisions this datasource by default through monitoring.grafanaDatasource.*. If you disable that datasource or manage your own, those panels will fail until DS_PORTAL_DB resolves to a working PostgreSQL datasource.
  • namespace: filters metrics to the selected Kubernetes namespace
  • ThrottlingRatio: threshold used by the CPU-throttled containers panel

The Argo CD Repo Server Delegate (Optional) row may be empty. It only shows data when both of the following are true:

  • the Argo CD instance is configured to use repoServerDelegate in either controlPlane or managedCluster mode
  • instance-level Prometheus metrics are enabled on the platform controller (for self-hosted installs this is typically done by setting platformController.env.ENABLE_INSTANCE_PROMETHEUS_MONITORING: "true")
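
Expressed as Helm values, that setting is a one-liner (the values path comes directly from the option named above):

platformController:
  env:
    ENABLE_INSTANCE_PROMETHEUS_MONITORING: "true"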

The bundled chart includes PodMonitors (monitoring.podMonitors.enabled) that scrape repo-server-delegate and repo-server-proxy metrics across instance namespaces (argocd-*). If an instance uses the default "all managed clusters" manifest generation layout, this row will remain empty because there is no delegated repo server reverse-proxy traffic to display.