8.0
The contents of this document are subject to revision without further notice due to continued progress in methodology, design, and
manufacturing.
Digital Route AB shall have no liability for any errors or damage of any kind resulting from the use of this document.
DigitalRoute® and MediationZone® are registered trademarks of Digital Route AB. All other trade names and marks mentioned
herein are the property of their respective holders.
Table of Contents
1. System Insight Overview
2. Preparation of System Insight
2.1 Installing System Insight using Scripts
2.1.1 Install System Insight with InfluxDB using Scripts
2.1.2 Install System Insight with Cloudwatch using Scripts
2.1.3 Access Grafana via Desktop or Web UI
2.2 Install System Insight Manually
2.2.1 Configure System Insight Services
2.2.2 Configure System Insight without InfluxDB Instances
3. Configuring System Insight
3.1 System Insight Metrics Compaction
3.2 Setting Retention Policies
3.3 Configuring System Insight with Multiple InfluxDB Instances
3.4 System Insight Properties
4. Managing System Insight
4.1 Displaying Metrics using System Insight
4.1.1 Managing System Insight Filters
4.1.2 Metrics Naming Conventions
4.2 Grafana Dashboards
4.3 Using System Insight for Batch Workflows
4.4 REST APIs for System Insight
5. System Insight Example
6. System Insight Backup and Maintenance
System Insight
This document describes how to configure and use System Insight to store and/or visualize system data, and data processed in
MediationZone workflows. This allows you to view how your system is performing, how your data flow is progressing, how capacity
trends look, as well as to correlate events for troubleshooting or to monitor a service chain.
MediationZone contains a rich set of system metrics, in the form of MIM parameters, and System Insight provides a means to
graphically display this data by gathering numeric MIMs from workflows, and JVM metrics.
1. System Insight Overview
System Insight provides a standardized means to visualize complex processing and can be used with visualization and analytics
tools which already exist, e g InfluxDB and Grafana.
A probe is a point in MediationZone or a workflow where metrics are created by sampling or aggregating events. Using internal
probes, data flow probes and customized probes, System Insight generates metrics from MediationZone with searchable tags. The
metrics are then sent to the System Insight service, from which they are stored in a time-series database, based on the
open-source InfluxDB project. These metrics can then be displayed in a web-based dashboard builder, based on the open-source
Grafana project, which shows real-time as well as historic metrics. You can customize the dashboards as required, based on the
stored data.
You can use this functionality to visualize MediationZone, the metrics for data flow through MediationZone and the metrics for
assets managed by MediationZone.
System Insight can collect metrics from external probes in order to visualize them, and can also forward metrics to an external
analytics or visualization tool, using MZ workflows. You can create custom metrics using the System Insight forwarding agent in a
workflow. The agent sends metrics samples from the workflow to the System Insight service for visualization. Equally, you can use
the System Insight collection agent to collect data from the system insight service that is running, and send the data to any protocol
supported in MediationZone.
System Insight is an Akka-based service which runs on one or several SCs. For further information on Akka clusters, see 1.7 Akka
Cluster in the System Administrator's Guide.
2. Preparation of System Insight
For the purpose of demonstrating how you can use System Insight to visualize metrics, the configurations and examples provided
in the documentation show System Insight using InfluxDB 1.2 and Grafana 4.3.1, unless stated otherwise, to store and visualize
data from MediationZone.
You can run predefined scripts to configure MediationZone, and install the embedded versions of InfluxDB and Grafana provided.
Using Cloudwatch as storage is also supported. If you choose to use Cloudwatch, there is also a predefined script provided which
you can use to configure System Insight with Cloudwatch.
Alternatively, you can manually configure MediationZone, and manually install InfluxDB, Cloudwatch or another database, and
Grafana or another visualization tool.
If you use the script provided for installing InfluxDB, there is also the option to access Grafana via the Desktop or Web GUI.
Note!
The minimum requirement of 10 GB is based on a basic setup for system statistics with the default retention policy of
one week for short-term storage, and one year for downsampled long-term storage.
To get System Insight up and running for test purposes, there are a number of scripts and configuration files available:
Note!
The scripts to set up InfluxDB and Grafana are supported on Ubuntu/Debian and Centos/Redhat only.
If you need to configure a System Insight setup using InfluxDB and Grafana offline, copy the grafana<version>.deb and
influxdb.conf files and the scripts from $MZ_HOME/scripts/str-templates/system-insight, save them in the
same directory, and then follow the instructions provided below starting from step 2:
Note!
Due to the service manager used in Ubuntu 14.04, if you are using this version of Ubuntu, before you run the script to set
up InfluxDB as instructed in step 2, you must rename the systemctl command as follows:
mv systemctl systemctl-bak
mv systemctl-bak systemctl
$ sudo ./si_influxdb_setup.sh
If required, you can modify the default username and password, and you can also change the database name before
running the script. The variables to change in the script are INFLUX_ADMIN_USR, INFLUX_ADMIN_PWD and
INFLUX_SCHEMA.
3. To ensure that the InfluxDB instance works as it should, use the following influx command:
name: databases
name
----
mz
_internal
{"results":[{"statement_id":0,"series":[{"name":"databases","columns":["name"],"values":[["mz"],["_internal"]]}]}]}
4.
$ sudo ./si_grafana_setup.sh
If required, you can modify the default username and password before running the script. The variables to change in the
script are GRAFANA_USR and GRAFANA_PWD.
5. If you want to add sample dashboards, run the script again with the flag add-dashboards:
Note!
By default, Grafana is installed using http. If you want to use Grafana over https, see the section, Grafana Over
https, in 2.1.3 Access Grafana via Desktop or Web UI.
6. Run the following script on the Platform instance, with the si-topo flag to run the topo commands required to set up
MediationZone with System Insight:
$ ./si_basic_setup.sh si-topo
If you want to access Grafana from the Desktop via Tools, or via the MediationZone Web UI at
http://<platform host>:<web interface port>/mz/, modify the property GRAFANA_URL as follows before running the script:
GRAFANA_URL='http://<host name>:3000'
For further information on this method of accessing Grafana from MediationZone, see 2.1.3 Access Grafana via Desktop
or Web UI.
7. You are then prompted to restart the Platform and picos, and start up the services:
8. To set up the filters, run the script with the si-basic-filters flag and your credentials. This step is not obligatory but
provides a setup in which system metrics are produced for InfluxDB:
Steps 6 - 8 are required because the system insight service must be up and running before you can create profiles and filters.
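The check in step 3 can also be scripted. The following is a minimal sketch in Python, assuming a response in the JSON format shown in step 3. The function name list_databases and the sample string are illustrative only and are not part of MediationZone or InfluxDB.

```python
import json

# Sample response in the JSON format shown in step 3 (illustrative data only).
SAMPLE_RESPONSE = (
    '{"results":[{"statement_id":0,"series":[{"name":"databases",'
    '"columns":["name"],"values":[["mz"],["_internal"]]}]}]}'
)


def list_databases(response_text):
    """Return the database names found in a database listing response."""
    payload = json.loads(response_text)
    names = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            if series.get("name") == "databases":
                # Each row holds a single column: the database name.
                names.extend(row[0] for row in series.get("values", []))
    return names


if __name__ == "__main__":
    databases = list_databases(SAMPLE_RESPONSE)
    print(databases)
    print("mz database present:", "mz" in databases)
```

If the database name was changed through INFLUX_SCHEMA before running the script, check for that name instead of mz.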
To get System Insight up and running for test purposes, there are a number of scripts and configuration files available:
A sample script to set up System Insight on SCs
Note!
The script to set up Cloudwatch is supported on Ubuntu/Debian and Centos/Redhat only.
The steps required to install System Insight with Cloudwatch are as follows:
$ ./si_basic_setup.sh si-topo-cloudwatch
3. You are then prompted to restart the Platform and picos, and start up the services:
4. To set up the filters, run the script with the si-basic-filters flag and your credentials. This step is not obligatory but
provides a setup in which system metrics are produced:
Steps 2 - 4 are required because the system insight service must be up and running before you can create profiles and filters.
1. You can modify the property GRAFANA_URL in the si_setup_script before running the script, as described in step 6
in 2.1.1 Install System Insight with InfluxDB using Scripts.
OR
2. At any time after installing System Insight, you can use the mzsh topo command as shown below:
To be able to access Grafana from the Desktop, the System Insight service must be running and the property grafana-url
mentioned above must be set when starting the Desktop.
In Desktop, go to Tools, and select System Insight. You are directed to the Grafana login page.
For System Insight to be visible from the MediationZone Web UI, the property grafana_url mentioned above must be set.
Go to the MediationZone Web UI, located at http://<platform host>:<web interface port>/mz/, and select System
Insight from the Dashboard.
If you require extra security, the option to use Grafana over https is available. However, you require a certificate which is not
provided. When you have your certificate in place, take the following steps:
;protocol = http
to the following:
protocol = https
to the following:
3.
4. If you choose to use System Insight with the System Insight collection or forwarding agent, see 9.67 System Insight
Agents, or save the data to file. See 2.2.2 Configure System Insight without InfluxDB Instances.
1. Use the mzsh topo command to add the akka service to the custom.conf for services. You must specify a name for
the akka service, e g si. The startup-natures must be si.
See the example below, where the akka service is named si.
2. Use the mzsh topo command to create an SC/SCs on which to run System Insight.
See the example below, where 3 SCs are created with a respective port range:
Example - Adding 3 SCs to run System Insight
Note!
If you require high volumes of System Insight metrics (> 10'), add the following parameters to the relevant SC
configuration(s) to ensure that there is enough memory to handle the inflow of metrics. For further information on
how to add these jdkarg values to the relevant SC conf file, refer to 2.4 Managing Pico Configurations.
<jdkarg value="-Xmx2G"/>
<jdkarg value="-server" vendor="sun,hp"/>
<jdkarg value="-Xms2G"/>
<jdkarg value="-XX:MaxMetaspaceSize=196M"/>
<jdkarg value="-XX:NewSize=1G"/>
3. Use the mzsh topo command to add the system insight service to the custom.conf for services as shown below.
You must use the same akka service name that you enter for the akka configuration in step 1, which you must also enter
as the value for the akka-cluster, shown below as <akka service name>.
InfluxDB
If you are using InfluxDB as storage, ensure that you complete the relevant username, password and http url for the
InfluxDB instance that you are using. See the Cloudwatch section below if you are using Cloudwatch as storage.
$ mzsh topo set topo://services:custom/obj:system-insight '{
    si-instance {
        template: "1/standard/basic"
        start-after=["akka/si"]
        config {
            storage-backend=influxdb
            akka-cluster: "<akka service name>"
            influxdb {
                url="<http url>"
                user="<influxdb username>"
                password="<influxdb password>"
                database="<database name>"
            }
        }
    }
}'
Example - Adding the System Insight service to the custom.conf when using InfluxDB
If you want to be able to access Grafana via Desktop from Tools or via the MediationZone Web UI (http://<platform
host>:<web interface port>/mz/), you can use the mzsh topo command as shown below:
For further information on this method of accessing Grafana from MediationZone, see the section below, Access Grafana
via Desktop or Web UI.
Cloudwatch
If you are using Cloudwatch as storage, ensure that you complete the relevant AWS user access key, AWS user access
secret, AWS region and a namespace prefix. See the table below for information on how to configure the System Insight
service when using Cloudwatch.
Property                              Description
aws-access-key and aws-access-secret  AWS credentials. You must encrypt the relevant key and secret using the
                                      command mzsh encryptpassword. If you do not provide AWS credentials, the
                                      IAM policy is used for authentication. For further information on the
                                      command mzsh encryptpassword, see 2.1.4 encryptpassword in Command Line
                                      Tool User's Guide.
region                                The AWS region. If you do not enter a region, the default region is used.
                                      For further information on how to identify which region to enter, see x.
namespace-prefix                      The root namespace to add metrics to. You can use forward slashes ("/")
                                      to add multiple levels.
Example - Adding the System Insight service to the custom.conf when using Cloudwatch
4. Use the mzsh topo command to enable System Insight at cell level:
5. Restart the Platform and then start or restart the ECs and SCs:
To be able to access Grafana via Desktop from Tools or via the MediationZone Web UI (http://<platform host>:<web
interface port>/mz/) as mentioned in step 3 above, you can use the following mzsh topo command at any time after
installing System Insight:
The first time you access Grafana via Desktop, the System Insight service must be running and the property grafana_url
mentioned above must be set.
In Desktop, go to Tools, and select System Insight. You are directed to the Grafana login page.
Grafana via Web UI
For System Insight to be visible from the MediationZone Web UI, the property grafana_url mentioned above must be set.
Go to the MediationZone Web UI, located at http://<platform host>:<web interface port>/mz/, and select System
Insight from the Dashboard.
1. Use the following mzsh topo set command to disable the storage backend:
For information on how to use the System Insight agents, see 9.67 System Insight Agents.
3. Configuring System Insight
After installing System Insight, there are several configuration modifications that you can make depending on how you are using
System Insight, i e with or without InfluxDB and/or Grafana, and what data you want to produce using System Insight.
Configuring metrics compaction, retention policies and multiple instances of InfluxDB only applies if you are using System Insight
with InfluxDB.
Note!
The versions of InfluxDB and Grafana included in System Insight are not highly available. If you want to retain the data
produced using System Insight, see 6. System Insight Backup and Maintenance.
Note!
This section only applies if you are using System Insight with InfluxDB for data storage.
System Insight gathers a large amount of data per second in the form of metrics. When you use InfluxDB to handle this data,
InfluxDB also addresses the growing amount of storage that the data requires. The InfluxDB solution downsamples the data so that
high-precision raw data is kept for only a limited period of time, and lower-precision data is kept for a longer period of time.
Two features automate the process of downsampling data and expiring old data: Continuous Queries (CQ) and Retention
Policies (RP).
If you have installed InfluxDB using the script provided, the default setup includes three predefined retention policies and one
predefined continuous query.
one_week - this is set as the default retention policy for the database
six_months
one_year
The predefined continuous query is named cq_six_months. This is a generic continuous query, downsampling all metrics as mean
values over a period of 10 minutes from the default retention policy of one week into the retention policy of six months.
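To illustrate what downsampling to mean values means in practice, the following is a minimal sketch in Python. It is not how InfluxDB implements continuous queries; it only shows the arithmetic of grouping samples into fixed time buckets and averaging each bucket. The function name downsample_mean and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def downsample_mean(samples, bucket_seconds):
    """Average (timestamp_seconds, value) samples per fixed-size time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Four raw samples, downsampled into 10-minute (600 second) buckets.
    raw = [(0, 90.0), (300, 80.0), (600, 70.0), (900, 50.0)]
    print(downsample_mean(raw, 600))
```

With 10-minute buckets, four raw samples collapse into two stored values, which is why the long-term retention policy needs far less storage than the raw data.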
You must implement the data compaction solution provided by InfluxDB to store the data produced using System Insight. See the
examples below.
In this scenario, you have installed System Insight using the scripts provided so that you have the default setup of InfluxDB. This
means you have a database named "mz" and a default retention policy named "one_week", and InfluxDB is up and running.
The steps below create a retention policy that stores data for one month, and a continuous query that runs every
five minutes, calculates the mean of idle_cpu for the measurements during that time, and stores the new measurement with
the name host.compute in the retention policy created.
1. Create a retention policy for one month. Measurements with this retention policy are stored for a month:
2. In this instance, you are sampling a metric named host.compute with tags={time,host_name} and the values
{idle_cpu , user_time_cpu , idle_proc , sleep_proc , sys_cpu , total_proc , up_time, user_cpu ,
wait_cpu} to InfluxDB with the default retention policy ( one_week ):
2.
CREATE CONTINUOUS QUERY "5min_cq" ON "mz" BEGIN SELECT mean("idle_cpu") AS "idle_cpu" INTO
"mz"."one_month"."host.compute" FROM "host.compute" GROUP BY time(5m) END
The new measurement is stored in the database for one month, after which it is removed.
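The statements for the steps above can be generated rather than typed by hand. The following Python sketch builds them as strings using standard InfluxDB 1.x InfluxQL syntax. The helper names, the 30d duration chosen to represent one month, and the replication factor of 1 are assumptions for illustration; adjust them to your setup before running the statements in InfluxDB.

```python
def quote_ident(name):
    """Double-quote an InfluxQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def create_retention_policy(name, database, duration, replication=1):
    """Build a CREATE RETENTION POLICY statement."""
    return (
        f"CREATE RETENTION POLICY {quote_ident(name)} ON {quote_ident(database)} "
        f"DURATION {duration} REPLICATION {replication}"
    )


def create_mean_cq(cq_name, database, policy, measurement, field, interval):
    """Build a continuous query that stores the mean of one field per interval."""
    target = ".".join(quote_ident(part) for part in (database, policy, measurement))
    return (
        f"CREATE CONTINUOUS QUERY {quote_ident(cq_name)} ON {quote_ident(database)} BEGIN "
        f"SELECT mean({quote_ident(field)}) AS {quote_ident(field)} "
        f"INTO {target} FROM {quote_ident(measurement)} "
        f"GROUP BY time({interval}) END"
    )


if __name__ == "__main__":
    # Assumed values: "30d" stands in for one month.
    print(create_retention_policy("one_month", "mz", "30d"))
    print(create_mean_cq("5min_cq", "mz", "one_month", "host.compute", "idle_cpu", "5m"))
```

Quoting every identifier matters here because the measurement name host.compute contains a dot, which InfluxQL would otherwise read as a separator.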
Depending on how you want to use System Insight, the predefined continuous query cq_six_months might store too much data.
Note!
This section only applies if you are using System Insight with InfluxDB for data storage.
The retention policy that you set in System Insight determines how long the filter data is kept. It can be set at two levels:
profile level and InfluxDB instance level.
If you do not set a specific retention policy, the default retention policies that exist at InfluxDB instance level
apply.
If you have installed System Insight with InfluxDB and Grafana, using the scripts provided, as described in 2.1 Installing System
Insight using Scripts, the default retention policy is one week. Retention policies of six months or one year are also available
for selection.
If you want to set a specific retention policy at InfluxDB instance level, you add a retention policy via InfluxDB.
If a retention policy is left empty at profile level, but the default is set at InfluxDB instance level, the data retention adheres to the
default policy set on each InfluxDB instance.
At Profile Level
If you set a retention policy for a profile, as described in 2.2.32 systeminsight, the data retention adheres to that policy, overriding
the default InfluxDB instance retention policy.
Note!
If you set a retention policy at profile level, which does not exist in the InfluxDB instances in place, an error message is
thrown.
For further information on configuring System Insight with multiple InfluxDB instances, see 3.3 Configuring System Insight with
Multiple InfluxDB Instances.
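The precedence rules described in this section can be summarized in a short Python sketch. It is an illustration of the documented behavior only, not the System Insight implementation; the function name and the use of ValueError to represent the error message are assumptions.

```python
def resolve_retention_policy(profile_policy, instance_policies, instance_default):
    """Return the retention policy that applies to a profile's filter data.

    profile_policy    -- policy set at profile level, or None/"" if left empty
    instance_policies -- names of the policies that exist on the InfluxDB instance
    instance_default  -- the default policy of the InfluxDB instance
    """
    if not profile_policy:
        # Nothing set at profile level: the instance default applies.
        return instance_default
    if profile_policy not in instance_policies:
        # A profile-level policy must exist on the InfluxDB instance.
        raise ValueError(
            f"retention policy {profile_policy!r} does not exist on the InfluxDB instance"
        )
    # A profile-level policy overrides the instance default.
    return profile_policy


if __name__ == "__main__":
    policies = ["one_week", "six_months", "one_year"]
    print(resolve_retention_policy(None, policies, "one_week"))
    print(resolve_retention_policy("six_months", policies, "one_week"))
```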
3.3 Configuring System Insight with Multiple InfluxDB Instances
Note!
This section only applies if you are using System Insight with InfluxDB for data storage.
If you require multiple InfluxDB databases which can also be written to as a backup, you can add additional databases to the
System Insight service configuration as follows:
backupDBs="influxdb2"
influxdb2 {
database=mz
password=dr
url="http://127.0.0.1:18086"
user=mzadmin
}
For information on how to modify the System Insight service configuration using the mzsh topo command, see 2.7.2 Updating
Service Configurations.
You can specify more than one database using a comma as a separator, as shown in the example below:
system-insight {
    si-instance {
        config {
            akka-cluster=si
            influxdb {
                database=mz
                password=dr
                url="http://127.0.0.1:8086"
                user=mzadmin
            }
            backupDBs="influxdb2,influxdb3"
            influxdb2 {
                database=mz
                password=dr
                url="http://127.0.0.1:18086"
                user=mzadmin
            }
            influxdb3 {
                database=mz
                password=dr
                url="http://127.0.0.1:28086"
                user=mzadmin
            }
        }
        start-after=[
            "akka/si"
        ]
        template="1/standard/basic"
    }
}
If all InfluxDB instances are unreachable, System Insight runs in gated mode. In gated mode, metrics are dropped until at least one
InfluxDB instance is reachable.
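The effect of gated mode can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the System Insight implementation: it assumes that each metric is written to every reachable instance, and the class MetricsDispatcher and the helper make_backend are hypothetical names introduced for illustration.

```python
def make_backend(store, reachable=True):
    """Return a hypothetical write function for one InfluxDB instance."""
    def write(metric):
        if not reachable:
            raise ConnectionError("InfluxDB instance unreachable")
        store.append(metric)
    return write


class MetricsDispatcher:
    """Writes each metric to all reachable instances; drops it if none is reachable."""

    def __init__(self, backends):
        # backends: mapping of instance name -> write function
        self.backends = backends
        self.dropped = 0

    def dispatch(self, metric):
        delivered = []
        for name, write in self.backends.items():
            try:
                write(metric)
                delivered.append(name)
            except ConnectionError:
                continue
        if not delivered:
            # Gated mode: no instance is reachable, so the metric is dropped.
            self.dropped += 1
        return delivered


if __name__ == "__main__":
    primary, backup = [], []
    dispatcher = MetricsDispatcher({
        "influxdb": make_backend(primary, reachable=False),
        "influxdb2": make_backend(backup),
    })
    print(dispatcher.dispatch({"name": "host.compute", "idle_cpu": 93.5}))
    print("dropped:", dispatcher.dropped)
```

Because dropped metrics are not recovered, a gap in the dashboards for a given period can indicate that all configured instances were unreachable during that time.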
3.4 System Insight Properties
In the standard template.conf file for the system insight service, there are a number of properties, which are described in the
table below:
Property Description
It is recommended that you keep the default values for these properties, but if you need to modify any value(s), use the mzsh
topo command, and then restart the SCs and the system insight service. See the example below:
Example - Setting a property for the System Insight service
If you want to change the value for the property measurement.server.throttle.interval to 10:
4. Managing System Insight
System Insight displays metrics based on the filters that you create. You add the filters to a profile, and determine the metrics that
you want to display by configuring the filters. This process is described in 4.1 Displaying Metrics using System Insight.
After you have specified filters, you use a visualization tool to display metrics. For the purpose of demonstrating how you can use
System Insight, sample Grafana dashboards are provided. See 4.2 Grafana Dashboards.
For information on how to create a System Insight filter, see 4.1.1 Managing System Insight Filters. For information on the naming
conventions in place for metrics, see 4.1.2 Metrics Naming Conventions.
System Metrics
The system metrics are host.compute, host.network, host.storage.iostats, host.storage.usage,
host.storage.swap, pico.events, pico.jvm, pico.workflow, service.akka.router.receiver,
service.akka.router.sender and service.systeminsight.dispatcher. The respective default fields in place for
these metrics are listed below. You must create a System Insight profile to see these system metrics, see 2.2.32 systeminsight.
Note!
Depending on the filesystem access privileges that you have, the metrics might not be reported for some filesystems.
host.compute
Field    Description
up_time  The amount of time in seconds that has passed since the machine started
host.network
Field Description
host.storage.iostats
Field Description
host.storage.swap
Field Description
host.storage.usage
Field            Description
available_space  The total amount of free space available for use on the filesystem in 1024-byte units
pico.events
pico.jvm
Field                    Description
collection_count_per_gc  The approximate time spent per garbage collection since the last measurement
collection_time          The approximate time (in milliseconds) that has elapsed for accumulated garbage collection
committed_memory         The amount of memory in bytes that is committed for the JVM to use
cpu_time_percent         The CPU used (in percent) since the last measurement by the process on which the JVM is running
loaded_file_count        The number of classes that are currently loaded in the JVM
Any other fields for pico.jvm report memory pool metrics, which are an estimate of the memory usage in bytes of each memory
pool in the JVM.
pico.workflow
Field Description
service.akka.router.receiver
Field Description
service.akka.router.sender
Field Description
service.systeminsight.dispatcher
Field    Description
custom   The number of metrics of the category custom that are handled
host     The number of metrics of the category host that are handled
mim      The number of metrics of the category mim that are handled
pico     The number of metrics of the category pico that are handled
service  The number of metrics of the category service that are handled
Note!
If you have activated InfluxDB, the metric service.systeminsight.mediator is also listed. This metric has the
default tags actor_system, host_system and pico_instance.
service.systeminsight.mediator
Field    Description
custom   The number of metrics of the category custom that are handled
host     The number of metrics of the category host that are handled
mim      The number of metrics of the category mim that are handled
pico     The number of metrics of the category pico that are handled
service  The number of metrics of the category service that are handled
Metric Default Tags
pico.events The tags depend on the events that are run on the pico.
Tags Description
When you configure filters, you use regexp syntax to name the metrics that you want to visualize. For the metrics naming
conventions, see 4.1.2 Metrics Naming Conventions.
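As an illustration of how a regular expression selects metric names, the following Python sketch applies a pattern to the system metric names listed earlier. It uses Python's re module and assumes that a filter must match the whole metric name; the regular expression dialect and matching rules used by System Insight itself may differ. The function name matching_metrics is hypothetical.

```python
import re

# System metric names as listed in this document.
SYSTEM_METRICS = [
    "host.compute", "host.network", "host.storage.iostats",
    "host.storage.usage", "host.storage.swap", "pico.events",
    "pico.jvm", "pico.workflow", "service.akka.router.receiver",
    "service.akka.router.sender", "service.systeminsight.dispatcher",
]


def matching_metrics(filter_pattern, metric_names):
    """Return the metric names that the pattern matches in full."""
    regex = re.compile(filter_pattern)
    return [name for name in metric_names if regex.fullmatch(name)]


if __name__ == "__main__":
    # The dots are escaped so that they match a literal "." only.
    print(matching_metrics(r"host\.storage\..*", SYSTEM_METRICS))
    print(matching_metrics(r"pico\.(jvm|workflow)", SYSTEM_METRICS))
```

Escaping the dots is worth the extra characters: an unescaped dot matches any character, so a loosely written pattern can select more metrics than intended.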
systeminsight Command
Use the mzsh systeminsight command to manage System Insight metrics, by adding and removing filters for the metrics that
you want to produce. You can also use the command to list the metrics available on the running system to which you can apply
filters, to list the retention policies in place, and to test which filters exist for a metric.
For details on how to use the systeminsight command, its subcommands, and options, see 2.2.32 systeminsight in the
Command Line Tool documentation.
The System Insight profile allows you to create, edit or remove profiles and filters that you want to use to display or store statistics
using the system insight service.
The System Insight profile consists of two tabs: Filters and Detected Metrics.
In the Filters tab, you can add filters to a profile. In the Detected Metrics tab, the possible metrics, tags and tag values detected for
your setup since you started the system insight service are displayed to help you create a filter that you can then add to the filters
in the Filters tab.
For details on how to use the System Insight profile, see 9.67.2 System Insight Profile.
All metrics names are adjusted to lower case. Spaces in names are replaced with an underscore, for example "UDR Count"
becomes "udr_count".
The name of a metric begins with the category, of which there are five: host, pico, service, custom and mim. Each category
can be further defined with a subcategory, which is then followed by the name:
<category>.<subcategory.subcategory>.metric_name
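The naming rules above can be expressed as a small Python sketch. The function names normalize and metric_name are hypothetical helpers for illustration; they are not part of MediationZone.

```python
# The five metric categories named in this section.
CATEGORIES = ("host", "pico", "service", "custom", "mim")


def normalize(part):
    """Lower-case a name part and replace spaces with underscores."""
    return part.lower().replace(" ", "_")


def metric_name(category, *parts):
    """Join a category, optional subcategories and a name into a metric name."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown metric category: {category!r}")
    return ".".join([category] + [normalize(part) for part in parts])


if __name__ == "__main__":
    print(normalize("UDR Count"))
    print(metric_name("pico", "jvm", "Thread Count"))
```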
pico
The pico category has one possible subcategory, which is jvm. For example, pico.jvm.metaspace_usage,
pico.jvm.thread_count
host
service
The service category must be further defined by the service for which you want a metric to be shown: service.<service name>,
e g service.kafka
custom
Custom metrics are defined with a user-specified name combined with the custom prefix: custom.<user_defined>, e g
custom.pcrf.policy_requests, custom.airline.fuel_consumption
This naming convention is used for metrics produced using the System Insight forwarding agent.
mim
For a metric on the Outbound UDRs MIM value for an Analysis agent (a processing agent) in a real-time workflow, the metric name
is: mim.realtime.processing.analysis.outbound_udrs
For a metric on the Inbound UDRs MIM value for an ECS collection agent in a batch workflow, the metric name is
mim.batch.collection.ecs.inbound_udrs
The minimum specification for a mim metric is mim.batch.workflow or mim.realtime.workflow. These metric names
provide output on all the MIM values generated in all the batch workflows or real-time workflows respectively.
If you installed System Insight manually, and have your own installation of Grafana, you can import the dashboards provided in the
directory $MZ_HOME/scripts/str-templates/system-insight/dashboards into your Grafana installation.
Dashboards Design
The sample dashboards rely heavily on the Grafana concept of Templates to provide filtering possibilities to narrow down the
scope of the data displayed. Examples of this are templates to enable filtering on service, pico instance or workflow name. For
further information on Grafana Templates, see http://docs.grafana.org/reference/templating/.
Another feature frequently used in the dashboards is Repeat Row/Repeat Panel, where it is possible to design a row or panel and
then reuse it by replicating it using a Template. An example of this can be seen in the Hosts dashboard where the rows are
repeated once per server selected in the Server template.
All graphs and panels provide a tooltip that summarizes the intent of the graph, the data from which it is derived, and any specific
configurations that have been made for the display.
Sample Dashboards
Six sample dashboards are provided, and each graph and panel has tooltips which provide information to help you determine how
you want to customize the view of the graphs and panels for your requirements. To display a tooltip, hover your cursor over the i at
the top left hand corner of each table and graph.
Overview
The Overview dashboard
This dashboard provides an overview of MediationZone focusing on high level statistics. Using templates, you can filter on server,
pico type and pico instance. The dashboard includes the following graphs and panels:
Platform uptime
Pico uptimes
Throughput per execution context
CPU usage per host
JVM Memory usage
Network I/O per host
Storage I/O per host
Host
The Host dashboard provides data for the servers hosting MediationZone with regards to CPU and storage utilization. Using
templates, you can filter on servers and mount directories. The dashboard includes the following graphs and panels:
CPU Utilization
CPU Over Time
Server Uptime
Pico Uptimes
Swap Space Usage
Swap Activity
Disk Usage
Disk I/O
Network
The Network dashboard
The Network dashboard provides I/O information on the network interfaces of the servers running MediationZone. Using templates,
you can filter on server and network interfaces. The dashboard includes the following graphs and panels:
Traffic (bytes)
Traffic (packets)
Packets dropped and errors
Network statistics on <host name>
Pico
The Pico dashboard provides pico related data with focus on JVM details, e g, uptime and garbage collections. Using templates, it
provides filtering options as well as the option to specify an interval for garbage collection details. The dashboard includes the
following graphs and panels:
Pico Uptime
Garbage Collection last 5m
Average Duration of Garbage Collections last 5m
Memory Usage
Active Threads
Trends
The Trends dashboard provides a comparison between high and low resolution data to see trends in CPU and JVM memory
utilization. Using templates, you can filter on server, pico type and pico instance. The dashboard includes the following graphs and
panels:
Workflows
The Workflows dashboard provides basic information about running workflows with a focus on throughput. Using templates, it
provides filter options on Execution Context and workflow details. The dashboard includes the following graphs and panels:
4.3 Using System Insight for Batch Workflows
Note!
This section only applies if you are using System Insight with InfluxDB for data storage.
In batch workflows, metrics are not published to InfluxDB after every batch. An aggregation task which runs every 10 seconds
aggregates all the batch results generated up to that point and sends them to InfluxDB.
Only numeric metrics are aggregated. Metrics cannot be created from MIMs of other types, such as String or timestamp, and such
values are neither aggregated nor sent to InfluxDB.
This means that individual batches cannot be tracked using System Insight. If you need to track individual batches, use Audit;
see 8.1 Audit Profile. System Insight provides aggregate metrics only.
If you want the throughput to be determined by the number of UDRs decoded within the batch duration period, you can
use the Outbound UDRs MIM parameter as a metric and divide it by the batch_duration to get the throughput in UDRs
per second. In this example the metric name is mim.batch.processing.decoder.outbound_udrs.
If you want the throughput to be determined by the number of bytes encoded within the batch duration period, you can
use the Outbound Bytes MIM parameter as a metric and divide it by the batch_duration to get the throughput in bytes per
second. In this example the metric name is mim.batch.processing.encoder.outbound_bytes.
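The division described above can be sketched as follows. This is only an illustration: the function name and the sample values are hypothetical, and batch_duration is assumed to be expressed in seconds, since the result is stated per second.

```python
def throughput_per_second(metric_total: float, batch_duration: float) -> float:
    """Divide an aggregated MIM metric by the batch duration (assumed to be in seconds)."""
    if batch_duration <= 0:
        raise ValueError("batch_duration must be positive")
    return metric_total / batch_duration


# Hypothetical sample values, e.g. for mim.batch.processing.decoder.outbound_udrs
udrs_per_second = throughput_per_second(12_000, 8.0)
# Hypothetical sample values, e.g. for mim.batch.processing.encoder.outbound_bytes
bytes_per_second = throughput_per_second(3_600_000, 8.0)
```

In a Grafana panel, the same division would typically be expressed in the query itself rather than in separate code.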
4.4 REST APIs for System Insight
You can also use REST APIs with System Insight when using InfluxDB and/or Grafana.
Use the following command to query a measurement for the latest 10 rows:
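As an illustration of such a query, the sketch below builds a request URL for the /query endpoint of the InfluxDB 1.x HTTP API, which accepts the database as db and the InfluxQL statement as q. The host, port, database name and measurement name are assumptions chosen for the example, not values taken from this guide.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, database: str, measurement: str, limit: int = 10) -> str:
    """Build an InfluxDB 1.x /query URL that returns the latest rows of a measurement."""
    # Escape double quotes so the measurement name stays a valid quoted identifier.
    escaped = measurement.replace('"', '\\"')
    influxql = f'SELECT * FROM "{escaped}" ORDER BY time DESC LIMIT {int(limit)}'
    return f"{base_url.rstrip('/')}/query?" + urlencode({"db": database, "q": influxql})


# Hypothetical host, database and measurement names.
url = build_query_url("http://localhost:8086", "mz_metrics", "filter.test")
```

The resulting URL can be requested with any HTTP client. Authentication, if enabled on the InfluxDB instance, is not covered by this sketch.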
5. System Insight Example
This section provides an example of how you can use System Insight to display throughput in a MediationZone workflow. The
example shows the stages required to use System Insight to display the throughput of a workflow in a Grafana dashboard.
For the purpose of this example, it is assumed that you have installed System Insight with InfluxDB and Grafana.
Configuration in MediationZone
Example workflow
The workflow includes a System Insight forwarding agent, which means the metrics sent to the System Insight service have the
category custom, which is assigned in the Measurement UDR.
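consume {
    // Tags and fields are passed as string-to-string maps.
    map<string,string> tags = mapCreate(string, string);
    map<string,string> fields = mapCreate(string, string);
    PulseUDR mock_data = (PulseUDR)input;
    // Tag the measurement with WF=SI_TEST.
    mapSet(tags, "WF", "SI_TEST");
    // SEQ carries the Sequence value; DATA carries the Data bytes decoded as UTF-8.
    mapSet(fields, "SEQ", (string)mock_data.Sequence);
    mapSet(fields, "DATA", baToStr(mock_data.Data, "utf-8"));
    // coll holds the Measurement UDR that is routed onwards (its declaration is not shown here).
    coll.tags = tags;
    coll.fields = fields;
    coll.name = "filter.test";
    udrRoute(coll);
}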
1. You create a System Insight profile in the Desktop. In this example, in the Filters tab, the profile description is Custom data,
the retention policy is set to one week, and the System Insight Profile Enabled check box is selected.
2. You can use the Detected Metrics tab to create a filter that sends all the custom-related data to the System Insight service.
The filter created is custom\..*, which is listed in the Filters tab when you click Create Filter.
The data is sent to InfluxDB and you can visualize the data throughput in Grafana.
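To illustrate what the filter custom\..* selects, the sketch below applies it as a regular expression to a few metric names. Two assumptions are made here: that the filter must match the whole metric name, and that the metric from the example workflow appears as custom.filter.test (the category followed by the name set in the APL code).

```python
import re

# The filter from the example profile: the literal prefix "custom." followed by anything.
FILTER = re.compile(r"custom\..*")


def passes_filter(metric_name: str) -> bool:
    """Return True if the whole metric name matches the filter (assumed full match)."""
    return FILTER.fullmatch(metric_name) is not None


# Hypothetical metric names used only for illustration.
candidates = [
    "custom.filter.test",
    "mim.batch.processing.decoder.outbound_udrs",
    "customXfilter",
]
selected = [name for name in candidates if passes_filter(name)]
```

Note that the escaped dot only matches a literal dot, so a name such as customXfilter is not selected.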
Visualization in Grafana
To visualize the workflow throughput in Grafana, you add a dashboard with a graph component as follows:
1. You go to your instance of Grafana and click the logo at the top left. Select Dashboards, then + New.
3. You click the panel title on the graph and then select Edit.
4. You can add metrics based on those available in InfluxDB. To create the query that Grafana uses for plotting, click the
three-dash button to the right. By clicking Toggle Edit Mode, you can choose to edit the query as free text or by selecting
values from the drop-down boxes.
5. In this example, MIMs are enabled. To see the inbound UDR throughput of the forwarding agent, create the query
according to the image below:
In this instance, the retention policy is set to one_week. The field selected is SEQ, which is an incrementing value. The
last function retrieves the most recent value, which is plotted in the graph.
6. If everything is OK, the data is visible in the graph. You can set the time range of the graph to the latest 5
minutes and the refresh interval to 5 seconds to get continuous data in the graph.
This provides you with a visualization of the SEQ field sent to the System Insight forwarding agent, so you can keep track of
how many increments have been produced in a workflow:
Grafana dashboard example
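For reference, the selections described in step 5 (retention policy one_week, field SEQ, function last) correspond roughly to an InfluxQL statement like the one built below. This is a sketch only: the measurement name is an assumption, and the query that Grafana generates will also contain its own time filter and grouping, which are not reproduced here.

```python
def last_value_query(field: str, measurement: str, retention_policy: str, window: str = "5m") -> str:
    """Build an InfluxQL statement that selects the last value of a field within a time window."""
    return (
        f'SELECT last("{field}") FROM "{retention_policy}"."{measurement}" '
        f"WHERE time > now() - {window}"
    )


# "filter.test" is a hypothetical measurement name; the actual name depends on your setup.
query = last_value_query("SEQ", "filter.test", "one_week")
```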
6. System Insight Backup and Maintenance
The instances of InfluxDB and Grafana which are provided with MediationZone are not highly available. This means that you must
take certain steps to secure file storage, dashboards created and metrics data.
InfluxDB
If you use the embedded setup of System Insight provided with MediationZone, only the metrics data model used internally by
MediationZone can be hosted on the embedded instance of InfluxDB. External writes or queries are not supported. To prevent the
loss of metrics data, you must store the InfluxDB database on secure file storage, i.e., file storage that can be replicated
or backed up. One option is to have multiple InfluxDB instances; see 3.3 Configuring System Insight with Multiple InfluxDB
Instances. In addition, you must monitor the InfluxDB instance, for example, to ensure that the disk does not become full.
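One simple way to watch for a full disk is to check the usage of the file system that holds the InfluxDB data directory. The sketch below uses the Python standard library for this; the 90 percent threshold and the path in the usage example are assumptions, and it does not replace a proper monitoring solution.

```python
import shutil


def disk_usage_fraction(path: str) -> float:
    """Return the used fraction (0.0 to 1.0) of the file system that contains path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def is_disk_nearly_full(path: str, threshold: float = 0.9) -> bool:
    """Return True when the used fraction has reached the threshold (hypothetical default: 90 percent)."""
    return disk_usage_fraction(path) >= threshold
```

For example, is_disk_nearly_full("/var/lib/influxdb") could be called from a scheduled job, where the path is a placeholder for the actual InfluxDB data directory.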
Grafana
The instance of Grafana provided with MediationZone is only supported when connected to the embedded InfluxDB
instance. Grafana stores the dashboards created and user data to disk, by default via sqlite3; for further information, see
'database' in http://docs.grafana.org/installation/configuration/. Instead of sqlite3, you can use an external PostgreSQL or MySQL
database, which you then must support and maintain. To prevent the loss of data and the dashboards created, the disk on which
dashboards and data are stored must be secured, i.e., replicated or backed up.
InfluxDB
If you use your own instance of InfluxDB with System Insight, it can be used to store and query any metrics, not only those that
originated from MediationZone. If this is the case, you must back up your instance of InfluxDB as recommended by InfluxData,
see https://docs.influxdata.com/influxdb/v1.2/.
System Insight can be used with the following versions of InfluxDB: InfluxDB version 1.x, InfluxCloud version 1.x, InfluxEnterprise
version 1.x. For support of these versions, contact InfluxData.
Grafana
If you use your own version of Grafana, it can be connected to any external metrics source.
System Insight can be used with any version of Grafana that is compatible with the InfluxDB version being used. For support of any
other version of Grafana (i.e., not the embedded version of Grafana 4.3.1), contact Grafana.