Performance Data


Introduction

Starting with release 0.0.7, NetSaint can process various types of performance data relating to host and service checks. The different types of performance data, and how to go about processing that data, are described below.

Types of Performance Data

There are two basic categories of performance data that can be obtained from NetSaint:

  1. Check performance data
  2. Plugin performance data

Check performance data is internal data that relates to the actual execution of a host or service check. This might include things like service check latency (i.e. how "late" was the service check from its scheduled execution time) and the number of seconds a host or service check took to execute. This type of performance data is available for all checks that are performed. The $EXECUTIONTIME$ macro can be used to determine the number of seconds a host or service check was running and the $LATENCY$ macro can be used to determine how "late" a service check was (host checks have zero latency, as they are executed on an as-needed basis, rather than at regularly scheduled intervals).
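To make the two check metrics concrete, here is a minimal sketch of the arithmetic behind them. The function name, its arguments, and the timestamps are all hypothetical, and clamping latency at zero is an assumption; NetSaint computes these values internally and hands them to you through the $LATENCY$ and $EXECUTIONTIME$ macros.

```python
def check_timing(scheduled, started, finished):
    """Return (latency, execution_time) in seconds for one check.

    All three arguments are timestamps in seconds. This only illustrates
    the definitions given above; it is not NetSaint's own code.
    """
    # Latency: how "late" the check started relative to its schedule.
    # Clamping at zero is an assumption made for this sketch.
    latency = max(0, started - scheduled)
    # Execution time: how long the check itself took to run.
    execution_time = finished - started
    return latency, execution_time


# Hypothetical check: scheduled at t=1000, started at t=1003, done at t=1005.
latency, execution_time = check_timing(scheduled=1000, started=1003, finished=1005)
```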

Plugin performance data is external data specific to the plugin used to perform the host or service check. Plugin-specific data can include things like percent packet loss, free disk space, processor load, number of current users, etc. - basically any type of metric that the plugin is measuring when it executes. Plugin-specific performance data is optional and may not be supported by all plugins. As of this writing, no plugins return performance data, although they most likely will in the near future. Plugin-specific performance data (if available) can be obtained by using the $PERFDATA$ macro. See below for more information on how plugins can return performance data to NetSaint for inclusion in the $PERFDATA$ macro.

Performance Data Support For Plugins

Normally plugins return a single line of text that indicates the status of some type of measurable data. For example, the check_ping plugin might return a line of text like the following:

PING ok - Packet loss = 0%, RTA = 0.80 ms

With this type of output, the entire line of text is available in the $OUTPUT$ macro.

In order to facilitate the passing of plugin-specific performance data to NetSaint, the plugin specification has been expanded. If a plugin wishes to pass performance data back to NetSaint, it does so by sending the normal text string that it usually would, followed by a pipe character (|), and then a string containing one or more performance data metrics. Let's take the check_ping plugin as an example and assume that it has been enhanced to return percent packet loss and average round trip time as performance data metrics. A sample plugin output might look like this:

PING ok - Packet loss = 0%, RTA = 0.80 ms | percent_packet_loss=0, rta=0.80

When NetSaint sees this format of plugin output, it will split the output into two parts: everything before the pipe character is considered to be the "normal" plugin output and everything after the pipe character is considered to be the plugin-specific performance data. The "normal" output gets stored in the $OUTPUT$ macro, while the optional performance data gets stored in the $PERFDATA$ macro. In the example above, the $OUTPUT$ macro would contain "PING ok - Packet loss = 0%, RTA = 0.80 ms" (without quotes) and the $PERFDATA$ macro would contain "percent_packet_loss=0, rta=0.80" (without quotes).
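The split described above can be sketched in a few lines of Python. This is only an illustration, not NetSaint's actual implementation: the function names are hypothetical, and splitting at the first pipe character, trimming surrounding whitespace, and treating the metrics as comma-separated name=value pairs are assumptions drawn from the check_ping example.

```python
def split_plugin_output(line):
    """Split raw plugin output into (output, perfdata) at the first pipe.

    Assumption: only the first '|' separates the two parts, and the
    whitespace around it is not significant.
    """
    output, _sep, perfdata = line.partition("|")
    return output.strip(), perfdata.strip()


def parse_perfdata(perfdata):
    """Turn 'name=value, name=value' into a dict of strings.

    Assumption: metrics are comma-separated, as in the check_ping example.
    Values are kept as strings so nothing is lost in conversion.
    """
    metrics = {}
    for item in perfdata.split(","):
        name, sep, value = item.partition("=")
        if sep:  # ignore fragments that contain no '='
            metrics[name.strip()] = value.strip()
    return metrics


raw = "PING ok - Packet loss = 0%, RTA = 0.80 ms | percent_packet_loss=0, rta=0.80"
output, perfdata = split_plugin_output(raw)
metrics = parse_perfdata(perfdata)
```

A plugin that returns no pipe character simply yields an empty performance data string, which matches the idea that performance data is optional.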

Enabling Performance Data Processing

If you want to process the performance data that is available from NetSaint and the plugins, you'll have to enable the process_performance_data option. You're still going to have to define host and service processing commands (described below), but this global option must be enabled for any performance data processing to take place.

Defining Performance Data Processing Commands

If you want to process host performance data, you need to use the host_perfdata_command option to specify a command that should be run after every host check. The name of the command that you specify in the host_perfdata_command option must be a valid command definition in your host config file. In the command definition, you can use any macros that are valid in host performance processing commands.

An example command definition that simply appends host performance data (last host check time, execution time, performance data, etc.) to a temporary text file is shown below. The various performance data items are written to the file in tab-delimited format.

command[process-host-perfdata]=/bin/echo -e "$LASTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$STATETYPE$\t$EXECUTIONTIME$\t$OUTPUT$\t$PERFDATA$" >> /tmp/host-perfdata

If you want to process service performance data, you need to use the service_perfdata_command option to specify a command that should be run after every service check. The name of the command that you specify in the service_perfdata_command option must be a valid command definition in your host config file. In the command definition, you can use any macros that are valid in service performance processing commands.

An example command definition that simply appends service performance data (last service check time, execution time, check latency, performance data, etc.) to a temporary text file is shown below. The various performance data items are written to the file in tab-delimited format.

command[process-service-perfdata]=/bin/echo -e "$LASTCHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$STATETYPE$\t$EXECUTIONTIME$\t$LATENCY$\t$OUTPUT$\t$PERFDATA$" >> /tmp/service-perfdata
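If you use a command like the one above, each line of /tmp/service-perfdata holds ten tab-separated fields in the order the macros appear. The following sketch shows one way to turn such a line back into named fields. The field names, the function name, and the sample record are hypothetical (in particular, the sample $LASTCHECK$ value is made up), and the sketch assumes the plugin output itself contains no tab characters.

```python
# Field names mirror the macro order used in the example command above.
SERVICE_FIELDS = [
    "lastcheck", "hostname", "servicedesc", "servicestate", "serviceattempt",
    "statetype", "executiontime", "latency", "output", "perfdata",
]


def parse_service_record(line):
    """Map one tab-delimited line to a dict keyed by field name."""
    # Strip only the newline so an empty trailing perfdata field survives.
    values = line.rstrip("\n").split("\t")
    if len(values) != len(SERVICE_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(SERVICE_FIELDS), len(values)))
    return dict(zip(SERVICE_FIELDS, values))


# Hypothetical record, not real NetSaint output.
sample = ("965000000\tlocalhost\tPING\tOK\t1\tHARD\t1\t0\t"
          "PING ok - Packet loss = 0%, RTA = 0.80 ms\t"
          "percent_packet_loss=0, rta=0.80\n")
record = parse_service_record(sample)
```

All values are left as strings here; convert the execution time and latency to numbers only when you need to do arithmetic on them.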

On a side note, if you have a service_perfdata_command defined and you are also obsessing over services, you may want to disable the obsess_over_services option and make your service_perfdata_command do double duty. Since the ocsp_command and service_perfdata_command commands are both executed after every service check, you'll cut out a bit of overhead by consolidating everything into the service_perfdata_command.

Post-Processing Options

I'm assuming that you're going to want to do some post-processing of the performance data that you get out of NetSaint. If not, why are you enabling performance data processing in the first place?

What you do with the performance data once it's out of NetSaint is completely up to you. If your processing commands are simply writing performance data to temporary text files, you could set up occasional cron jobs to process all the entries in those text files, squash them using rrdtool, dump them into a database, produce graphs, whatever...
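As one example of the kind of job you might run from cron, the sketch below computes the average execution time per service from tab-delimited records laid out like the service example command earlier in this document. Treat it as a starting point under stated assumptions: the function name and the sample records are hypothetical, the column positions assume the ten-field layout of that example command, and malformed lines are silently skipped.

```python
import csv
import io
from collections import defaultdict


def average_execution_time(lines):
    """Return {(hostname, servicedesc): mean execution time in seconds}."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of seconds, count]
    # QUOTE_NONE keeps any quote characters in plugin output as plain text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 10:
            continue  # not a complete record in the assumed layout
        try:
            seconds = float(row[6])  # column 6 holds $EXECUTIONTIME$
        except ValueError:
            continue  # non-numeric execution time, skip the record
        entry = totals[(row[1], row[2])]  # $HOSTNAME$, $SERVICEDESC$
        entry[0] += seconds
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical records standing in for the contents of /tmp/service-perfdata.
sample = io.StringIO(
    "965000000\tlocalhost\tPING\tOK\t1\tHARD\t1\t0\tPING ok\trta=0.80\n"
    "965000300\tlocalhost\tPING\tOK\t1\tHARD\t3\t1\tPING ok\trta=0.90\n"
    "965000310\tlocalhost\tHTTP\tOK\t1\tHARD\t2\t0\tHTTP ok\t\n"
)
averages = average_execution_time(sample)
```

To run it against the real file, pass an open file object instead of the in-memory sample, for instance open("/tmp/service-perfdata", newline="").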