## Setup

### Prerequisites
- Prometheus Operator or kube-prometheus-stack installed in the cluster (provides the `ServiceMonitor` and `PrometheusRule` CRDs)
- Grafana with the sidecar enabled for dashboards and datasources
- The Grafana datasource sidecar must watch Secrets (or ConfigMaps) labeled with `grafana_datasource`
The built-in Grafana resources are designed for the common Kubernetes pattern
used by kube-prometheus-stack and the Grafana Helm chart: Grafana runs in the
cluster and discovers dashboards and datasources via labeled ConfigMaps and
Secrets. If you use external Grafana, Grafana Cloud, or Grafana Operator, keep
the Prometheus resources enabled but plan to manage the dashboard and datasource
through your existing Grafana workflow instead of relying on sidecar discovery.
### Enabling Monitoring

Set `monitoring.enabled` to `true` in your Helm values:
```yaml
monitoring:
  enabled: true
```
All sub-resources (ServiceMonitors, PrometheusRules, Grafana dashboard, Grafana datasource) are enabled by default once the top-level flag is set. You can selectively disable any of them:
```yaml
monitoring:
  enabled: true
  serviceMonitors:
    enabled: true      # ServiceMonitors for Prometheus scraping
  prometheusRules:
    enabled: true      # Built-in alert rules
  grafanaDashboard:
    enabled: true      # Grafana dashboard ConfigMap
  grafanaDatasource:
    enabled: true      # Grafana PostgreSQL datasource Secret
```
If you use only one of Argo CD and Kargo, disable the alerts for the product you are not using to avoid false-positive alerts:
```yaml
monitoring:
  enabled: true
  alerts:
    kargo:
      enabled: false   # disable if not using Kargo
```
If your Grafana deployment does not use sidecar discovery, disable the chart-managed Grafana resources and import or provision them separately:
```yaml
monitoring:
  enabled: true
  grafanaDashboard:
    enabled: false
  grafanaDatasource:
    enabled: false
```
For the full list of monitoring.* parameters with defaults and descriptions,
see the
Monitoring Parameters
section of the Helm values reference. That page is auto-generated from the
chart's
values.yaml,
which also documents every option inline.
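
If you want to browse those defaults locally, `helm show values` prints the chart's `values.yaml` with its inline comments. The repository and chart names below are placeholders; substitute whatever you installed the chart from:

```shell
# Print the chart's default values (including the monitoring.* block) with inline docs.
# <repo>/<chart> is a placeholder for your chart repository and chart name.
helm show values <repo>/<chart>
```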
### Prometheus Selector Labels
Many Prometheus Operator installations use label selectors to filter which
ServiceMonitors, PodMonitors, and PrometheusRules to discover. The chart
defaults additionalLabels to release: kube-prometheus-stack on all three
resource types, which matches the default selector used by
kube-prometheus-stack.
If your kube-prometheus-stack Helm release has a different name, override the label to match:
```yaml
monitoring:
  enabled: true
  serviceMonitors:
    additionalLabels:
      release: my-prometheus   # change to match your Helm release name
  podMonitors:
    additionalLabels:
      release: my-prometheus
  prometheusRules:
    additionalLabels:
      release: my-prometheus
```
If your ServiceMonitors or PodMonitors are being created but Prometheus is not
scraping them, a label mismatch is almost always the cause. Check your
Prometheus custom resource for serviceMonitorSelector, podMonitorSelector,
and ruleSelector to see what labels are required.
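
As a quick check, you can read those selectors straight off the Prometheus custom resource. The namespace below is an example; point the command at wherever your Prometheus is deployed:

```shell
# Show which labels Prometheus uses to select ServiceMonitors, PodMonitors, and rules.
# "monitoring" is an example namespace; adjust it to your installation.
kubectl -n monitoring get prometheus -o yaml | grep -A2 -E 'serviceMonitorSelector|podMonitorSelector|ruleSelector'
```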
### Shared Monitoring Namespace
If your Prometheus, Grafana, and Alertmanager run in a dedicated monitoring
namespace, you can place all monitoring resources there instead of the Akuity
Platform release namespace. This keeps everything co-located and avoids
broadening Grafana sidecar permissions:
```yaml
monitoring:
  enabled: true
  serviceMonitors:
    namespace: monitoring
  podMonitors:
    namespace: monitoring
  prometheusRules:
    namespace: monitoring
  grafanaDashboard:
    namespace: monitoring
  grafanaDatasource:
    namespace: monitoring
```
The release: kube-prometheus-stack label is included by default, so no
additionalLabels override is needed unless your Helm release name differs
(see Prometheus Selector Labels).
This is a common approach for
kube-prometheus-stack
installations because it keeps dashboards, rules, and scrape configuration
alongside the monitoring stack. If your Grafana sidecar is scoped to specific
namespaces instead of ALL, it also avoids extra cross-namespace configuration.
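
For context, namespace scoping in the upstream Grafana chart (and the `grafana:` block of kube-prometheus-stack) is typically controlled by the sidecar `searchNamespace` settings. The sketch below assumes those upstream value names; verify them against your Grafana chart version:

```yaml
# Grafana chart values (sketch): let the dashboard and datasource sidecars
# watch the dedicated monitoring namespace instead of ALL namespaces.
sidecar:
  dashboards:
    searchNamespace: monitoring
  datasources:
    searchNamespace: monitoring
```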
### Grafana Datasource Provisioning
By default, the chart provisions a Grafana PostgreSQL datasource Secret for the dashboard panels that query the portal database directly. The provisioned datasource:
- Uses `database.readOnlyHost` when set, otherwise `database.host`
- Uses `database.port`, `database.dbname`, `database.user`, `database.password`, and `database.sslmode`
- Is created in `monitoring.grafanaDatasource.namespace`, defaulting to `monitoring.grafanaDashboard.namespace` and then the Helm release namespace
If you already manage a Grafana datasource outside the chart, disable the built-in one:
```yaml
monitoring:
  enabled: true
  grafanaDatasource:
    enabled: false
```
If you use a non-default schema or need custom datasource options beyond the
chart defaults, manage the datasource separately in Grafana and keep
monitoring.grafanaDatasource.enabled: false.
This is also the recommended approach if you use external Grafana, Grafana Cloud, or Grafana Operator instead of a sidecar-based in-cluster Grafana deployment.
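
If you go that route, a file-provisioned PostgreSQL datasource in Grafana is one way to replace the chart-managed Secret. The snippet below is a sketch in Grafana's standard datasource provisioning format; the connection values are placeholders that you should align with your `database.*` settings and with the datasource the dashboard expects:

```yaml
# Grafana datasource provisioning (sketch) -- connection values are placeholders.
apiVersion: 1
datasources:
  - name: Akuity Portal DB        # display name; match what your dashboard references
    type: postgres
    url: <database-host>:<port>   # prefer the read-only host if you have one
    user: <database-user>
    secureJsonData:
      password: <database-password>
    jsonData:
      database: <database-name>
      sslmode: require            # match your database.sslmode
```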
### Grafana Dashboard Folder
To organize the dashboard into a specific Grafana folder, set
monitoring.grafanaDashboard.folder. The chart writes a grafana_folder
annotation on the dashboard ConfigMap, which the Grafana sidecar uses to
place the dashboard in the named folder.
This requires your Grafana installation to have
sidecar.dashboards.folderAnnotation set to grafana_folder. The upstream
Grafana Helm chart does not set this by default. If you use
kube-prometheus-stack
or the standalone
Grafana chart,
add the following to your Grafana values:
```yaml
sidecar:
  dashboards:
    folderAnnotation: grafana_folder
```
Without this, the annotation is ignored and the dashboard lands in the General folder.
Then set the folder name in your chart values:

```yaml
monitoring:
  enabled: true
  grafanaDashboard:
    folder: "Akuity Platform"
```
### Verifying Your Setup
After enabling monitoring and deploying, verify each component is working.
Monitoring resource names are prefixed with the Helm release name (e.g.,
<release>-platform-controller). The examples below assume the default release
name akuity-platform.
#### 1. Check monitoring resources are created
All chart-managed monitoring resources carry the label
app.kubernetes.io/part-of: akuity-platform, so you can list them in one
command:
```shell
kubectl get servicemonitor,podmonitor,prometheusrule,configmap,secret \
  -l app.kubernetes.io/part-of=akuity-platform -n <your-namespace>
```
You should see ServiceMonitors for each enabled platform component (e.g.
<release>-platform-controller, <release>-portal-server), PodMonitors for
repo-server-delegate and repo-server-proxy, a PrometheusRule, a Grafana
dashboard ConfigMap, and a Grafana datasource Secret.
#### 2. Verify Prometheus is scraping targets
Open the Prometheus UI (typically at http://<prometheus-host>:9090) and
navigate to Status > Targets. Look for targets matching the Akuity Platform
ServiceMonitors and PodMonitors. All targets should show the UP state.
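
If Prometheus is not exposed outside the cluster, a port-forward is enough for this check. The namespace and Service name below assume a standard Prometheus Operator install (`prometheus-operated` is the headless Service the operator creates); adjust as needed:

```shell
# Forward the Prometheus UI to localhost, then open http://localhost:9090/targets
kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090
```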
ServiceMonitor targets appear as serviceMonitor/<namespace>/<name> and
PodMonitor targets appear as podMonitor/<namespace>/<name>. The PodMonitor
targets (<release>-repo-server-proxy, <release>-repo-server-delegate) may
show zero active targets if no Argo CD instances are using the Repo Server
Delegate feature: this is expected.
New targets may briefly appear as UNKNOWN immediately after the monitoring
resources are created. This is expected until the first scrape completes.
With the default monitoring.serviceMonitors.interval: 60s, allow up to one
minute before treating this as a failure.
If targets are missing, check that your ServiceMonitors and PodMonitors have
the correct additionalLabels to match your Prometheus selectors (see
Prometheus Selector Labels above).
#### 3. Verify alerts are loaded
```shell
kubectl get prometheusrule -n <your-namespace>
```
You should see akuity-platform-rules (or <release>-rules if you used a
custom release name). To verify Prometheus has loaded the rules, navigate to
Status > Rules in the Prometheus UI and search for Akuity.
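
You can also confirm this from the command line through the same port-forward used in step 2; the `/api/v1/rules` endpoint is part of Prometheus's standard HTTP API, and `jq` here is only for readability:

```shell
# List loaded rule groups and filter for the Akuity Platform rules.
curl -s http://localhost:9090/api/v1/rules | jq -r '.data.groups[].name' | grep -i akuity
```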
#### 4. Find the Grafana dashboard
Open Grafana and search for "Akuity Platform" in the dashboard search. If the dashboard does not appear, verify:
- The Grafana sidecar is enabled and configured to watch ConfigMaps with the `grafana_dashboard` label
- The dashboard ConfigMap is in a namespace the sidecar watches (see Shared Monitoring Namespace for the recommended setup)
If you intentionally disabled monitoring.grafanaDashboard.enabled, import
the bundled dashboard JSON into Grafana using your normal workflow instead.
#### 5. Verify the Grafana datasource exists
Open Grafana and navigate to Connections > Data sources. You should see a
PostgreSQL datasource named <release> Portal DB unless you overrode
monitoring.grafanaDatasource.datasourceName.
If it is missing, verify:
- The Grafana datasource sidecar is enabled
- The sidecar watches Secrets labeled with `grafana_datasource`
- The datasource Secret is in a namespace the sidecar watches
If you intentionally disabled monitoring.grafanaDatasource.enabled, provision
an equivalent PostgreSQL datasource in Grafana yourself and ensure its UID
matches the one referenced by the dashboard, or update the dashboard to point
at your datasource.
## Scraped Components

### ServiceMonitors
ServiceMonitors are created for each platform component that exposes a metrics
endpoint. Components gated by an enabled flag only get a ServiceMonitor when
that component is also enabled.
| Component | Metrics Port | Condition |
|---|---|---|
| platform-controller | 9500 | Always |
| portal-server | 9501 | Always |
| notification-controller | 9505 | Only when notificationController.enabled: true |
| addon-controller | 9506 | Only when addonController.enabled: true |
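
To spot-check an endpoint directly, you can port-forward a component's metrics port and curl it. The Service name below assumes the `<release>-platform-controller` naming described in Verifying Your Setup with the default `akuity-platform` release name; substitute your own release name and namespace:

```shell
# Port-forward the platform-controller metrics port (9500, per the table above)
# and confirm it serves Prometheus metrics.
kubectl -n <your-namespace> port-forward svc/akuity-platform-platform-controller 9500:9500 &
sleep 2 && curl -s http://localhost:9500/metrics | head
```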
### PodMonitors
PodMonitors scrape metrics from pods in Argo CD instance namespaces
(argocd-*). Unlike ServiceMonitors, they use namespaceSelector.any: true
to discover pods across all namespaces.
| PodMonitor | Selector | Port | Notes |
|---|---|---|---|
| repo-server-delegate | akuity.io/repo-server-delegate label exists | metrics | Only produces targets when instances use the Repo Server Delegate feature |
| repo-server-proxy | akuity.io/repo-server-proxy: "true" | akuity-metrics | Drops the high-cardinality repo_server_proxy_method_duration_seconds_bucket metric via metricRelabeling to control storage costs |
Both PodMonitors are enabled by default when monitoring.podMonitors.enabled
is true. It is safe to leave them enabled even when no instances use the
Repo Server Delegate feature: the PodMonitors simply match zero pods.
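
A quick way to see whether the PodMonitors currently have anything to select is to list pods carrying the labels from the table above; an empty result just means no instance is using the Repo Server Delegate feature yet:

```shell
# Pods matched by the repo-server-delegate PodMonitor (label-exists selector)
kubectl get pods -A -l akuity.io/repo-server-delegate
# Pods matched by the repo-server-proxy PodMonitor
kubectl get pods -A -l akuity.io/repo-server-proxy=true
```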
## Grafana Dashboard
The bundled Grafana dashboard provides visibility into:
- Argo CD Instances: health distribution, reconciliation status, instance counts, tables of unhealthy/unreconciled instances
- Argo CD Clusters: connection status, reconciliation, health breakdown
- Kargo Instances: health, reconciliation, instance counts
- Kargo Agents: connection status, reconciliation, health breakdown
- Control Plane Operations: controller workqueue depth and duration, OOM-killed containers, database connection pool stats, persistent volume usage, CPU throttling
- Argo CD Repo Server Delegate (Optional): reverse-proxy latency, request rate, and pending requests for instances using the Repo Server Delegate feature
The dashboard includes configurable template variables:
- `DS_PROMETHEUS`: Prometheus datasource for metrics panels (health gauges, time series, alert-derived stats)
- `DS_PORTAL_DB`: PostgreSQL datasource for table panels that query the portal database directly (instance lists, cluster details, org breakdowns). The chart provisions this datasource by default through `monitoring.grafanaDatasource.*`. If you disable that datasource or manage your own, those panels will fail until `DS_PORTAL_DB` resolves to a working PostgreSQL datasource.
- `namespace`: filters metrics to the selected Kubernetes namespace
- `ThrottlingRatio`: threshold used by the CPU-throttled containers panel
The Argo CD Repo Server Delegate (Optional) row may be empty. It only shows data when both of the following are true:
- the Argo CD instance is configured to use `repoServerDelegate` in either `controlPlane` or `managedCluster` mode
- instance-level Prometheus metrics are enabled on the platform controller (for self-hosted installs this is typically done by setting `platformController.env.ENABLE_INSTANCE_PROMETHEUS_MONITORING: "true"`; see the sketch below)
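
A minimal values sketch for that second requirement, assuming `platformController.env` is a plain map of environment variables in your chart version (check `values.yaml` to confirm):

```yaml
# Values sketch: enable instance-level Prometheus metrics on the platform controller.
# Assumes platformController.env is passed through as container environment variables.
platformController:
  env:
    ENABLE_INSTANCE_PROMETHEUS_MONITORING: "true"
```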
The bundled chart includes PodMonitors (monitoring.podMonitors.enabled)
that scrape repo-server-delegate and repo-server-proxy metrics across
instance namespaces (argocd-*). If an instance uses the default
"all managed clusters" manifest generation layout, this row will remain
empty because there is no delegated repo server reverse-proxy traffic to
display.