Metric Alerts
Metric alerts can be configured to automatically create and update topics with Diffusion™ Cloud metrics.
Metric alerts allow users to react to changes in system metrics, such as memory usage, by defining conditions under which notifications are triggered.
Metric alerts are defined using a domain-specific language (DSL) that specifies the conditions for triggering the alert. Users can create alerts based on various metrics available in Diffusion, such as JVM memory usage, and define thresholds for these metrics. When a metric crosses the specified threshold, the alert is triggered and the corresponding Diffusion topic is created or updated. The topic contains a JSON representation of the metric data that caused the alert to trigger.
If a topic already exists at the path given in the alert specification, the alert publishes to it only if that topic was also created by a metric alert. Otherwise the alert is not allowed to publish to it, and a warning is logged. Conversely, if another topic (including a reference topic) is created at a path where a metric alert topic already exists, the alert topic is replaced and a warning is logged.
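The ownership rules above can be summarised as a small state model. The sketch below is a hypothetical, in-memory illustration only: the class `AlertTopicRules` and its methods are invented for this example and are not part of the Diffusion API, and the warning strings stand in for the server's log entries rather than reproducing them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the ownership rules for alert topics.
 * Not the server's implementation; warnings are collected in a list
 * only to show where the server would log one.
 */
final class AlertTopicRules {

    /** For each existing topic path: true if a metric alert created it. */
    private final Map<String, Boolean> createdByAlert = new HashMap<>();
    private final List<String> warnings = new ArrayList<>();

    /** An alert publishes only to a free path or to a topic an alert created. */
    boolean alertPublish(String path) {
        Boolean byAlert = createdByAlert.get(path);
        if (byAlert != null && !byAlert) {
            // Existing topic was not created by an alert: refuse and warn.
            warnings.add("alert refused: " + path);
            return false;
        }
        createdByAlert.put(path, true);
        return true;
    }

    /** Any other topic (including a reference topic) replaces an alert topic. */
    void createOtherTopic(String path) {
        if (Boolean.TRUE.equals(createdByAlert.get(path))) {
            warnings.add("alert topic replaced: " + path);
        }
        createdByAlert.put(path, false);
    }

    List<String> warnings() {
        return warnings;
    }
}
```

Note that once another topic has replaced an alert topic, later publications from the alert are refused, because the topic at that path is no longer alert-owned.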
For example, the following metric alert specifications produce the corresponding topic values:
-
select java_version into topic foo
{ "metric_name": "java_version", "alert_name": "alertName", "server": "SERVER-0", "timestamp": null, "unit": "", "dimensions": { "value": "1.8.0_412" }, "alert_type": "sample_matches", "metric_type": "INFO", "value": 1 }
-
select diffusion_sessions_outbound_bytes_total into topic foo
{ "metric_name": "diffusion_sessions_outbound_bytes_total", "alert_name": "alertName", "server": "SERVER-0", "timestamp": null, "unit": "bytes", "dimensions": {}, "alert_type": "sample_matches", "metric_type": "COUNTER", "value": 2928 }
-
select diffusion_connector_total_number_of_connections_total into topic foo
{ "metric_name": "diffusion_connector_total_number_of_connections_total", "alert_name": "alertName", "server": "SERVER-0", "timestamp": null, "unit": "", "dimensions": { "name": "DEFAULT-CONNECTOR" }, "alert_type": "sample_matches", "metric_type": "COUNTER", "value": 16 }
-
select os_system_cpu_load into topic foo
{ "metric_name": "os_system_cpu_load", "alert_name": "alertName", "server": "SERVER-0", "timestamp": null, "unit": "", "dimensions": {}, "alert_type": "sample_matches", "metric_type": "GAUGE", "value": 0.032598 }
Users interact with metric alerts through API methods in the Metrics feature, which allow adding, removing, and listing alerts. Listing alerts requires the VIEW_SERVER global permission, while adding or removing alerts requires the CONTROL_SERVER global permission.
Alert topics are created using the principal of the session that created the alert, and therefore have that session's permissions.
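The permission requirements can be summarised as a simple lookup. `AlertPermissions` and its enums are hypothetical names used only for this sketch; they model just the two global permissions named above, not the full Diffusion permission set.

```java
import java.util.Set;

/** Hypothetical summary of the global permission each alert operation needs. */
final class AlertPermissions {

    /** Only the two global permissions named in this section. */
    enum GlobalPermission { VIEW_SERVER, CONTROL_SERVER }

    /** Listing, adding and removing alerts. */
    enum AlertOperation { LIST, ADD, REMOVE }

    /** Listing needs VIEW_SERVER; adding and removing need CONTROL_SERVER. */
    static GlobalPermission required(AlertOperation op) {
        return op == AlertOperation.LIST
            ? GlobalPermission.VIEW_SERVER
            : GlobalPermission.CONTROL_SERVER;
    }

    /** True if the granted global permissions cover the operation. */
    static boolean permitted(AlertOperation op, Set<GlobalPermission> granted) {
        return granted.contains(required(op));
    }
}
```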
Metric Alert Specifications
The DSL for metric alerts specifies the metric from which the alert is created and the topic to which the alert is published. It may also contain conditions that control when the alert is published, properties for the alert topic, and the server for which the alert applies. For available metrics, see Metrics.
Here is a typical example of an alert specification:
select os_system_load_average into topic metrics/</server>/os_system_load_average where value > 5
This statement creates an alert on the 'os_system_load_average' metric, triggering when its value exceeds 5. The path of the notification topic depends on the name of the server which triggered the alert; a value in a topic path surrounded by angle brackets indicates a JSON pointer to a value in the alert data. If the server name is 'diffusion_0', the topic path will be 'metrics/diffusion_0/os_system_load_average'.
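To illustrate how a path such as metrics/</server>/os_system_load_average might be expanded, the following sketch resolves each angle-bracketed JSON pointer (RFC 6901) against the alert data. It assumes the alert data is available as nested Maps and Lists rather than parsed JSON, and `TopicPathTemplate` is a hypothetical helper, not a Diffusion class; the server performs this expansion itself.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of expanding JSON pointers in a topic path template. */
final class TopicPathTemplate {

    // Matches "<", a JSON pointer beginning with "/", then ">".
    private static final Pattern POINTER = Pattern.compile("<(/[^>]*)>");

    /** Replaces every <pointer> in the template with the value it points to. */
    static String expand(String template, Map<String, Object> alertData) {
        Matcher m = POINTER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = resolve(alertData, m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Resolves an RFC 6901 JSON pointer against nested Maps and Lists. */
    static Object resolve(Object node, String pointer) {
        if (pointer.isEmpty()) {
            return node;
        }
        for (String raw : pointer.substring(1).split("/", -1)) {
            // Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
            String token = raw.replace("~1", "/").replace("~0", "~");
            if (node instanceof Map) {
                node = ((Map<?, ?>) node).get(token);
            } else if (node instanceof List) {
                node = ((List<?>) node).get(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("Cannot resolve " + pointer);
            }
            if (node == null) {
                // Missing or null values cannot form part of a topic path.
                throw new IllegalArgumentException("No value at " + pointer);
            }
        }
        return node;
    }
}
```

With alert data containing "server": "diffusion_0", calling expand("metrics/</server>/os_system_load_average", data) yields metrics/diffusion_0/os_system_load_average, matching the example above. A pointer such as </dimensions/name> would reach into the dimensions object in the same way.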
The full syntax of the specification language is as follows:
SELECT metric_name [ FROM_SERVER server_name ] INTO_TOPIC topic_path [ WITH_PROPERTIES { property_name: property_value [, ...] } ] [ WHERE condition ] [ DISABLE_UNTIL condition ]
-
metric_name
A string representing the name of the metric to be monitored.
-
FROM_SERVER server_name (optional)
A string identifying the server on which to monitor the metric. If omitted, the metric is monitored on all servers in the cluster.
-
INTO_TOPIC topic_path
The topic path where alert notifications are sent. The path may include values from the event JSON using JSON pointers enclosed in angle brackets.
-
WITH_PROPERTIES { property_name: property_value [, ...] } (optional)
A set of key-value pairs defining additional properties for the alert topic, specified using the standard Diffusion topic property names. See also Properties of topics.
For example, you may want to use the REMOVAL property to remove the alert topic automatically if it has not been updated for some time:
select diffusion_topics_value_count into topic foo with properties {REMOVAL: 'when no updates for 10m'} where value > 500
Or use Time Series properties to keep a history of alerts:
select diffusion_topics_value_count into topic foo with properties {TIME_SERIES_RETAINED_RANGE: 'limit 100'}
-
WHERE condition (optional)
A logical expression that defines the triggering condition for the alert. Dimensional data may be compared using the dimensions object, for example, where dimensions = {name: 'foo'}.
See the table below for the supported operators.
-
DISABLE_UNTIL condition (optional)
A logical expression that controls when a triggered alert is re-enabled. After the alert triggers, it is disabled until this condition is met, at which point it is re-enabled.
See the table below for the supported operators.
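The interaction between WHERE and DISABLE_UNTIL can be pictured as a two-state gate. The sketch below is hypothetical (`AlertGate` is not a Diffusion class) and rests on two assumptions not stated in this documentation: an alert without DISABLE_UNTIL is never disabled, and the sample that re-enables the alert is not itself published.

```java
import java.util.function.DoublePredicate;

/**
 * Hypothetical model of WHERE plus DISABLE_UNTIL. Assumptions: a missing
 * DISABLE_UNTIL means the alert is never disabled, and the sample that
 * re-enables the alert does not itself publish.
 */
final class AlertGate {
    private final DoublePredicate where;
    private final DoublePredicate disableUntil; // null if no DISABLE_UNTIL
    private boolean enabled = true;

    AlertGate(DoublePredicate where, DoublePredicate disableUntil) {
        this.where = where;
        this.disableUntil = disableUntil;
    }

    /** Returns true if this metric sample would cause the alert to publish. */
    boolean onSample(double value) {
        if (!enabled) {
            // Disabled after a trigger: wait for the DISABLE_UNTIL condition.
            if (disableUntil.test(value)) {
                enabled = true;
            }
            return false;
        }
        if (where.test(value)) {
            // Triggered: stay enabled only if there is no DISABLE_UNTIL.
            enabled = (disableUntil == null);
            return true;
        }
        return false;
    }
}
```

For instance, with a WHERE condition of value > 5 and a DISABLE_UNTIL condition of value < 2, the samples 6, 7, 1, 8 publish on 6, are suppressed on 7, re-enable the alert on 1, and publish again on 8. This keeps a metric hovering above its threshold from publishing on every sample.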
Conditions support the following operators and comparators:
Type | Operators
logical operators | AND, OR, NOT (case insensitive)
comparisons | >, <, >=, <=
equality | =, !=, <>
These can be combined to create complex conditions. For example:
select diffusion_topics_value_count into topic foo where value > 100
select diffusion_connector_total_number_of_connections_total into topic foo where value > 100 and dimensions = {'name': 'DEFAULT-CONNECTOR'}
select diffusion_topics_value_count into topic foo where dimensions = {'name': 'DEFAULT-CONNECTOR'} or (value > 100 and not value = 200)
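When reasoning about how the operators combine, it can help to write the example conditions as composed Java predicates, with AND, OR and NOT corresponding to `Predicate.and`, `Predicate.or` and `Predicate.negate`. This is a hypothetical sketch, not how the server evaluates conditions; `AlertConditions` and `Sample` are invented names, and it assumes that dimensions = {...} tests for an exactly equal dimensions map and that NOT applies only to the comparison that follows it.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: the example conditions as composed predicates. */
final class AlertConditions {

    /** Hypothetical stand-in for one metric sample from the alert data. */
    static final class Sample {
        final double value;
        final Map<String, String> dimensions;

        Sample(double value, Map<String, String> dimensions) {
            this.value = value;
            this.dimensions = dimensions;
        }
    }

    static Predicate<Sample> valueGreaterThan(double threshold) {
        return s -> s.value > threshold;
    }

    static Predicate<Sample> valueEquals(double target) {
        return s -> s.value == target;
    }

    // Assumes "dimensions = {...}" means exact equality of the dimensions map.
    static Predicate<Sample> dimensionsEqual(Map<String, String> expected) {
        return s -> s.dimensions.equals(expected);
    }

    // where value > 100 and dimensions = {'name': 'DEFAULT-CONNECTOR'}
    static final Predicate<Sample> SECOND_EXAMPLE =
        valueGreaterThan(100).and(dimensionsEqual(Map.of("name", "DEFAULT-CONNECTOR")));

    // where dimensions = {'name': 'DEFAULT-CONNECTOR'} or (value > 100 and not value = 200)
    static final Predicate<Sample> THIRD_EXAMPLE =
        dimensionsEqual(Map.of("name", "DEFAULT-CONNECTOR"))
            .or(valueGreaterThan(100).and(valueEquals(200).negate()));
}
```

Under these assumptions, the third condition matches a sample with value 150 and no dimensions, rejects one with value exactly 200 and no dimensions, and matches any sample whose dimensions are exactly {name: DEFAULT-CONNECTOR}, regardless of its value.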
Using the feature
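import java.util.List;

import com.pushtechnology.diffusion.client.features.control.Metrics;

// session is an existing Session whose principal has the CONTROL_SERVER
// and VIEW_SERVER global permissions.
final Metrics metrics = session.feature(Metrics.class);

// Add the alert; join() waits for the server to complete the request,
// so the alert is in place before it is listed below.
metrics.setMetricAlert("myAlert", "select os_system_cpu_load into topic foo").join();

// List the alerts currently defined.
final List<Metrics.MetricAlert> alerts = metrics.listMetricAlerts().join();

// Remove the alert and wait for the removal to complete.
metrics.removeMetricAlert("myAlert").join();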