Friday, July 2, 2010

Checking Windows Clusters / Cluster Resources Using Existing Check Plugins for Nagios

Now, we all know that when it comes to clustering, it's basically one node up and the other down. So monitoring, say, Exchange or SQL on the live (active) node is just fine: all the disks get checked and all the services are UP (even though they are set to manual). The problem is the passive server: how do we check just about anything there?

Simple. Try using this custom script I’ve written.

Here’s the script snippet.

#!/bin/bash

#GET VALUES FROM NAGIOS AND SET THEM AS VARIABLES

hostname=$1
servicename="$2"

# Debug
# hostname=10.252.182.169
# servicename="SQL Server (YOG)"

# We check if the quorum is running or not first
# assuming quorum is on Q drive

getoutput=$(/usr/local/nagios/libexec/check_nrpe -H "$hostname" -t 50 -c CheckDriveSize -a ShowAll=long MinWarnFree=10% MinCritFree=5% Drive=q: | grep -c "The system cannot find the path specified")

# DEBUG
# echo $getoutput

if [[ "$getoutput" -ge 1 ]]; then

    echo "OK: Clustered service is not failed over. We will not check this service now."
    exit 0

else

    /usr/local/nagios/libexec/check_nrpe -H "$hostname" -t 50 -c checkServiceState -a ShowAll "$servicename"

fi

The idea of this script is to:

1) Check certain resources on the cluster node only if the quorum is alive on it!

2) If it’s not, send an OK value back to Nagios, but mention in the output text that the service is “not failed over” or something similar.

3) Send fake performance data so that the Performance Data values don’t break (the snippet above does not do this part yet).
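The three points above can be sketched as one small shell function. This is only an illustration, not the script itself: the function name `report_passive_or_continue` and the perfdata label `service_state` are made up for this example, and the placeholder value `0` is an assumption about what your graphs can tolerate.

```shell
# Sketch only: decide what to report from the text returned by the Q: drive check.
#   $1 = output text of the CheckDriveSize call
#   $2 = perfdata label (hypothetical, defaults to "service_state")
report_passive_or_continue() {
    drive_output=$1
    perf_label=${2:-service_state}
    case "$drive_output" in
        *"The system cannot find the path specified"*)
            # Quorum drive is not on this node: report OK, and append
            # placeholder perfdata after the "|" separator.
            echo "OK: Clustered service is not failed over. We will not check this service now. | ${perf_label}=0"
            return 0
            ;;
        *)
            # Quorum drive is present: the caller should run the real check.
            return 1
            ;;
    esac
}

# Possible use inside the plugin (commented out because it exits):
# report_passive_or_continue "$drive_output" && exit 0
```

Everything after the `|` in the output line is treated by Nagios as performance data, so the passive node keeps sending a value instead of leaving a gap.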

This line,

getoutput=$(/usr/local/nagios/libexec/check_nrpe -H "$hostname" -t 50 -c CheckDriveSize -a ShowAll=long MinWarnFree=10% MinCritFree=5% Drive=q: | grep -c "The system cannot find the path specified")

checks whether the quorum is alive on this node. The quorum is normally the first resource that will fail over; in our case it was on drive Q:. So, only when we are satisfied that the output of the plugin isn’t “…cannot find the path…” do we execute just about any other plugin. In the example above, that means checking the services that are cluster monitored/managed.
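As a variation on the same idea, the test can rely on grep's exit status instead of counting matches. The sketch below is an assumption about how you might structure it; the function name `quorum_drive_missing` and the two sample strings are invented for illustration and merely stand in for real check_nrpe output.

```shell
# Returns 0 (true) when the text on stdin says the quorum drive is missing.
# -F matches the message as a fixed string; -q prints nothing and only
# sets the exit status.
quorum_drive_missing() {
    grep -qF "The system cannot find the path specified"
}

# Made-up sample texts standing in for real check_nrpe output
passive_sample="CRITICAL: q: The system cannot find the path specified"
active_sample="OK: q: 12G free"

if printf '%s\n' "$passive_sample" | quorum_drive_missing; then
    echo "passive node: skip the service check"
fi

if ! printf '%s\n' "$active_sample" | quorum_drive_missing; then
    echo "active node: run the service check"
fi
```

In the real plugin you would pipe the check_nrpe call into the function instead of the sample strings.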

Now, to call the script from commands.cfg, define something like this.

# Check Clustered Services:
# ---------------------------------------------------------------------

define command{
        command_name    check_floating_services
        command_line    $USER1$/check_floating_services $HOSTADDRESS$ "$ARG1$"
        }

On the command line, you can simply run

./check_floating_services <hostname/ip> "<servicename>"
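Since a service name such as "SQL Server (YOG)" contains spaces, it has to reach the script as a single quoted argument. A guard like the one below could catch the mistake early. This is a sketch and not part of the original script; the function name `check_args` is hypothetical, and it returns 3 because that is the Nagios exit code for UNKNOWN.

```shell
# Sketch: make sure exactly two arguments (host and service name) were passed.
check_args() {
    if [ "$#" -ne 2 ]; then
        echo "UNKNOWN: usage: check_floating_services <hostname/ip> <servicename>"
        return 3
    fi
    return 0
}

# Possible use at the top of the plugin (commented out because it exits):
# check_args "$@" || exit $?
```

With the service name quoted, the guard sees two arguments and passes; unquoted, the name splits into several words and the plugin reports UNKNOWN instead of checking the wrong service.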

And there you have it: cluster-“enabled” plugins.

Cheers!!!!