Grafana High Availability
Grafana can be configured for high availability (HA) so that the application can remain operational if a pod, node, or backing database goes down.
Turning on high availability changes how Grafana operates by turning on unified alerting mode, in which the Grafana pods talk to each other. Unified alerting attempts to remove duplicate alerts as much as possible, but some duplicates will still get through so that alerting continues to work if one or more pods fail. Maintainers should be proactive and set up their alerting tools to deduplicate when Grafana HA is on.
Requirements
- Enable the Grafana IaC module, which creates an RDS instance
- Optional: turn on Grafana RDS clustering for RDS HA across availability zones (AZs)
Grafana IaC
To enable HA, turn on the grafana module, either by turning on the module directly or by setting grafana.inputs.high_availability to true.
To enable HA for just Grafana, set high_availability to true in grafana_inputs.
Optional: you can also specify multiple rds_instances entries here for database HA.
locals {
  ...
  grafana_inputs = {
    # If enabled, this sets high availability for Grafana only; the higher-level high_availability input takes priority
    high_availability = true
    rds_instances = {
      primary = {} # AZ can be set per instance
      # secondary = {}
      # additional replicas can be added (third = {})
    }
  }
  ...
}
To turn on the module directly, which also passes the HA settings to Grafana in the cluster:
modules {
  ...
  grafana = true
  ...
}
Grafana HA can be enabled on existing clusters. In addition to giving Grafana a database backend, it adds settings to the Helm chart that maintainers should be aware of so that they merge in without issue:
grafana:
  values:
    headlessService: true
    autoscaling:
      enabled: true
    podDisruptionBudget:
      apiVersion: "policy/v1"
      minAvailable: 1
    grafana.ini:
      database:
        type: postgres
        host: "${db_host}:${db_port}"
        name: grafana
        user: "${db_name}"
      unified_alerting:
        enabled: true
      alerting:
        enabled: false
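For unified alerting to deduplicate notifications, the Grafana pods need to reach each other, which is what the headless service above is for. The fragment below is a minimal sketch of how that peer discovery could be expressed in the same values structure; it is not necessarily what the module renders. The ha_listen_address and ha_peers keys are standard Grafana unified_alerting options, while the service name grafana-headless, the namespace monitoring, and port 9094 are assumptions used for illustration only.

```yaml
grafana:
  values:
    grafana.ini:
      unified_alerting:
        enabled: true
        # Address each pod listens on for alerting cluster (gossip) traffic.
        ha_listen_address: "0.0.0.0:9094"
        # Hypothetical headless service DNS name and namespace; replace with
        # the service the chart actually creates in your cluster.
        ha_peers: "grafana-headless.monitoring.svc.cluster.local:9094"
```

If maintainers already override grafana.ini in their own values, checking that these keys do not conflict with the module's settings is a reasonable first step before enabling HA on an existing cluster.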