IT@Home: Monitoring QNAP NAS with Nagios

Once I had a Nagios server running on my network, I was trying to figure out a good way to monitor the status of my QNAP TS-209 Pro NAS box. There seem to be two possibilities. One is to use the NRPE and Nagios plugins ports available through Optware IPKG. The other is to go down the SNMP route, using the net-snmp package from the same Optware system in combination with the check_snmp Nagios plugin. Here is what I ended up doing.


Install Optware IPKG from the QPKG bundles

I began by installing Optware IPKG from the QPKG software bundles available on the QNAP Turbo Station system administration page. More information about the Optware packaging system and the supported software can be found here.

Optware installations typically live under the /opt top-level directory. I used the simple automated install from the system admin page of my QNAP TS-209 NAS device. More on Optware IPKG installation on QNAP can be found here.

Install net-snmp

Many of the Optware packages run fine on the QNAP firmware, but not all do, and unfortunately NRPE seems to be one of the exceptions. The problem seems to be with its startup script, and it is probably an easy fix. But I decided to go with net-snmp instead and installed it using ipkg. For this, one needs to log into the QNAP device via ssh using the admin account and run ipkg from there.
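A minimal sketch of that install step, as it might be typed over ssh on the NAS. It assumes ipkg lives at /opt/bin/ipkg (the usual Optware location), and the run/install_net_snmp helpers and the DRY_RUN variable are hypothetical, added only so the commands can be previewed before anything is changed.

```shell
#!/bin/sh
# Sketch: install net-snmp from Optware on the NAS (run as admin over ssh).
# Assumption: Optware's ipkg is at /opt/bin/ipkg; override IPKG if not.
IPKG=${IPKG:-/opt/bin/ipkg}

# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# install_net_snmp: refresh the package feed list, then install net-snmp.
install_net_snmp() {
    run "$IPKG" update
    run "$IPKG" install net-snmp
}
```

Running DRY_RUN=1 install_net_snmp only prints the two ipkg commands; calling install_net_snmp without DRY_RUN actually runs them.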

With net-snmp, two things are relevant here: the startup script and the snmpd.conf configuration file.

Net-snmp startup

The startup script can be found in /opt/etc/init.d/S70net-snmp with the following content:

#!/bin/sh
#
# $Header$
#
if [ -n "`pidof snmpd`" ] ; then
    killall snmpd 2>/dev/null
fi
sleep 2
/opt/sbin/snmpd -c /opt/etc/snmpd.conf

However, this script is not started automatically by default, so I went with the autorun.sh approach to get the snmpd daemon started. To do this, I needed to run the following after logging into the admin account via ssh:

prompt> mount /dev/mtdblock5 -t ext2 /tmp/config
prompt> cd /tmp/config
prompt> vi autorun.sh (it may not already exist)

I then added the following to the file. My TS-209 has two disk drives, mounted under /share/HDA_DATA and /share/HDB_DATA, and Optware was installed under /share/HDA_DATA/optware/opt:

rm -rf /opt
ln -sf /share/HDA_DATA/optware/opt /opt
export PATH="/opt/bin:/opt/sbin:${PATH}"
/opt/sbin/snmpd -c /opt/etc/snmpd.conf

Then save the file and quit vi, make the script executable, and unmount the flash configuration partition:

prompt> chmod 755 autorun.sh
prompt> cd /
prompt> umount /dev/mtdblock5

More information on auto-starting executables can be found in the IPKG modz QNAP forum. The next time the NAS device is rebooted, the snmpd daemon should be up and running.
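The autorun.sh lines above can be made a little more defensive. The sketch below is an assumed variant, not the script I actually used: it only replaces /opt when the Optware directory really exists (for example, if a disk has not mounted yet) and starts snmpd only if it is not already running. The helper names link_optware and start_snmpd are hypothetical; the paths are the ones used above.

```shell
#!/bin/sh
# Sketch of a more defensive autorun.sh body (hypothetical helper names).
OPTWARE_DIR=/share/HDA_DATA/optware/opt

# link_optware SRC LINK: replace LINK with a symlink to SRC, but only if SRC
# is an existing directory. Returns non-zero and leaves LINK alone otherwise.
link_optware() {
    src=$1
    link=$2
    [ -d "$src" ] || return 1
    rm -rf "$link"          # like the original, removes the old /opt first
    ln -sf "$src" "$link"
}

# start_snmpd: launch snmpd unless a copy is already running.
start_snmpd() {
    if [ -z "$(pidof snmpd)" ]; then
        /opt/sbin/snmpd -c /opt/etc/snmpd.conf
    fi
}
```

In autorun.sh itself, the helpers would be followed by something like: link_optware "$OPTWARE_DIR" /opt && export PATH="/opt/bin:/opt/sbin:${PATH}" && start_snmpd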

Net-snmp Configuration

The next item is the default SNMP configuration file that net-snmp installs as /opt/etc/snmpd.conf. Most of the file is self-explanatory, and I only had to make minor tweaks; someone reasonably comfortable with net-snmp could probably tune it further. Out of the box, the configuration provides customization points for various parameters of the device, and these were good enough for my tweaks.

The basic idea of SNMP is that the SNMP agent exposes system state information as a tree of objects, often referred to as the MIB (Management Information Base) tree. There are different MIB tree definitions for different devices, vendors and domains, so it is important to know which tree(s) the SNMP daemon on the device is going to use to report the device information.
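To make the "which tree" point concrete: the UCD-SNMP-MIB objects used later in this post all live under the UC Davis enterprise subtree .1.3.6.1.4.1.2021, with one branch per table (prTable is .2, the memory group .4, dskTable .9, laTable .10). The lookup below is a small sketch, with a made-up function name ucd_oid, that turns the object names used in this post into numeric OIDs. That can be handy when a tool on the Nagios side does not have the MIB file loaded.

```shell
#!/bin/sh
# ucd_oid NAME.INDEX: translate a few UCD-SNMP-MIB column names into numeric
# OIDs. Only the objects used in this post are covered (illustrative sketch).
ucd_oid() {
    case $1 in
        *.*) ;;
        *) echo "expected NAME.INDEX, got: $1" >&2; return 1 ;;
    esac
    name=${1%%.*}      # column name, e.g. dskPercent
    index=${1#*.}      # row index, e.g. 1 (0 for scalars)
    ucd=.1.3.6.1.4.1.2021
    case $name in
        prNames)      base=$ucd.2.1.2 ;;    # prTable: process name
        prCount)      base=$ucd.2.1.5 ;;    # prTable: number running
        prErrorFlag)  base=$ucd.2.1.100 ;;
        memAvailReal) base=$ucd.4.6 ;;      # memory group scalar
        dskPath)      base=$ucd.9.1.2 ;;    # dskTable: mount point
        dskPercent)   base=$ucd.9.1.9 ;;    # dskTable: percent used
        dskErrorFlag) base=$ucd.9.1.100 ;;
        laLoad)       base=$ucd.10.1.3 ;;   # laTable: 1, 5, 15 min load
        laErrorFlag)  base=$ucd.10.1.100 ;;
        *) echo "unknown object: $name" >&2; return 1 ;;
    esac
    echo "$base.$index"
}
```

For example, ucd_oid laLoad.1 prints the same .1.3.6.1.4.1.2021.10.1.3.1 used in the load average service further down.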

The following section in snmpd.conf, for example, lets me list processes that I want to monitor. I did not really change any of the default settings here. The important thing to note, for those not so familiar with MIBs, is which MIB tree will be used: the UC Davis SNMP MIB tree (UCD-SNMP-MIB), and specifically its prTable section. Knowing this allowed me to find the exact MIB objects to monitor via Nagios. If one is not an SNMP expert, one will need to use this kind of information to find the relevant MIB objects through a bit of poking around.

###############################################################
# SECTION: Monitor Various Aspects of the Running Host
#
#   The following check up on various aspects of a host.
# proc: Check for processes that should be running.
#     proc NAME [MAX=0] [MIN=0]
#
#     NAME:  the name of the process to check for.  It must match
#            exactly (ie, http will not find httpd processes).
#     MAX:   the maximum number allowed to be running.  Defaults to 0.
#     MIN:   the minimum number to be running.  Defaults to 0.
#
#   The results are reported in the prTable section of the UCD-SNMP-MIB tree
#   Special Case:  When the min and max numbers are both 0, it assumes
#   you want a max of infinity and a min of 1.

proc  smbd 30 1
proc  nmbd 1 1
proc  sshd 30 1
proc  syslogd 1 1
proc  klogd 1 1
proc  USB_Detect 2 2

The next section lets me specify how the disks should be monitored. I have two disks on my TS-209 box and I want to monitor them through their mount points, /share/HDA_DATA and /share/HDB_DATA, along with the root filesystem /, as seen below. I set them up so that if the free space falls below 5%, the error flag of the corresponding MIB object is set. These also use the UC Davis MIB tree.

# disk: Check for disk space usage of a partition.
#   The agent can check the amount of available disk space, and make
#   sure it is above a set limit.
#
#    disk PATH [MIN=100000]
#
#    PATH:  mount path to the disk in question.
#    MIN:   Disks with space below this value will have the Mib's errorFlag set.
#           Can be a raw byte value or a percentage followed by the %
#           symbol.  Default value = 100000.
#
#   The results are reported in the dskTable section of the UCD-SNMP-MIB tree
disk  / 5%
disk  /share/HDA_DATA 5%
disk  /share/HDB_DATA 5%
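One detail worth checking here: dskTable rows are numbered in the order the disk lines appear. With / listed first, index 1 would normally be the root filesystem rather than the first data disk. A quick way to confirm the mapping is to walk dskPath (for example snmpwalk -v 2c -c public <host> UCD-SNMP-MIB::dskPath) and look up the index for a mount point. The sketch below does that lookup on snmpwalk's default output format. The function name dsk_index is hypothetical, and it assumes mount points without spaces.

```shell
#!/bin/sh
# dsk_index MOUNT: read "snmpwalk ... dskPath" output on stdin and print the
# dskTable index whose path equals MOUNT. Exits non-zero if none matches.
# Expected input lines look like:
#   UCD-SNMP-MIB::dskPath.2 = STRING: /share/HDA_DATA
dsk_index() {
    awk -v want="$1" '
        $1 ~ /dskPath\.[0-9]+$/ && $NF == want {
            n = split($1, parts, ".")   # the last dot-separated part is the index
            print parts[n]
            found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

Piping the dskPath walk into dsk_index /share/HDA_DATA shows which dskPercent.N a Nagios service should use for that disk.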

The following is for tracking load on the device and I simply used the default setting.

# load: Check for unreasonable load average values.
# Watch the load average levels on the machine.
#
#    load [1MAX=12.0] [5MAX=12.0] [15MAX=12.0]
#
#    1MAX:   If the 1 minute load average is above this limit at query
#            time, the errorFlag will be set.
#    5MAX:   Similar, but for 5 min average.
#    15MAX:  Similar, but for 15 min average.
#
#   The results are reported in the laTable section of the UCD-SNMP-MIB tree
load  10 8 5
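For reference, the load line sets laErrorFlag when a load average is above its limit. With load 10 8 5, for instance, the 15-minute row is flagged once the 15-minute average goes above 5. A tiny sketch of that comparison follows. It uses awk because load averages are fractional and POSIX shell arithmetic is integer-only, and the function name is hypothetical.

```shell
#!/bin/sh
# la_error_flag LOAD MAX: print 1 if LOAD is above MAX (errorFlag set), else 0.
la_error_flag() {
    awk -v load="$1" -v max="$2" 'BEGIN {
        flag = (load + 0 > max + 0) ? 1 : 0
        print flag
    }'
}
```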

The following is an example of a parameter that belongs to a different MIB tree, the MIB-2 tree.

###########################################################
# SECTION: System Information Setup
#
#   This section defines some of the information reported in
#   the "system" mib group in the mibII tree.
#
# syslocation: The [typically physical] location of the system.
#   Note that setting this value here means that when trying to
#   perform an snmp SET operation to the sysLocation.0 variable will make
#   the agent return the "notWritable" error code.  IE, including
#   this token in the snmpd.conf file will disable write access to
#   the variable.
#   arguments:  location_string
#
syslocation  "2nd FLOOR"

Once net-snmp is set up and started, the challenge is identifying the MIB OIDs (Object Identifiers) or names that point Nagios at the right object parameters. If one is not very familiar with MIB objects, I found a MIB browser tool like mbrowse (on Linux) and the snmpwalk command-line utility useful.

prompt> snmpwalk -v 2c -c public <host>

spits out a huge list of MIB object OIDs and values. The output is best redirected to a file, which can then be searched for OIDs and parameters of interest. Tools like mbrowse are much easier to use, though, and there are many of them for Linux and Windows environments.

Pointing the mbrowse MIB browser at the QNAP box with the appropriate read-only community name (the default is ‘public’) allowed me to browse the MIB objects being served by the snmp daemon on the device. I think mbrowse understands MIB-2 and the UCD MIB out of the box; if I remember correctly, there was no need to install additional MIB tree definitions. Sometimes, though, it is necessary to make the MIB browser tool aware of new MIB tree definitions.

Once the connection succeeded, expanding the displayed tree and doing a GET or WALK on a node told me whether the corresponding MIB object data is available from the device. If it is, the MIB object name and parameters, along with their current values, are listed in the bottom panel. This is where knowing part of the name of the node or its parent is useful for the search function in mbrowse. If some object in the displayed tree is not supported by the target device, it becomes obvious when doing a GET or WALK.
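The same GET test can be done from the command line. With SNMPv2c, net-snmp's snmpget prints a "No Such Object available on this agent at this OID" or "No Such Instance currently exists at this OID" message when the agent does not provide the requested object, so a small wrapper can tell supported and unsupported objects apart. The sketch below checks for that text. The function names are hypothetical, and the community string public is the default mentioned above.

```shell
#!/bin/sh
# snmp_value_ok: read snmpget output on stdin; succeed only if it holds a
# real value rather than an SNMPv2c noSuchObject/noSuchInstance marker.
snmp_value_ok() {
    out=$(cat)
    [ -n "$out" ] || return 1
    case $out in
        *"No Such Object"*|*"No Such Instance"*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_oid HOST OID: query one object and report whether the agent has it.
# Returns 2 if snmpget itself failed (timeout, bad OID name, ...).
check_oid() {
    out=$(snmpget -v 2c -c public "$1" "$2" 2>&1) || {
        echo "query failed: $2" >&2
        return 2
    }
    if printf '%s\n' "$out" | snmp_value_ok; then
        echo "supported: $2"
    else
        echo "not supported: $2"
        return 1
    fi
}
```

For example, check_oid <host> UCD-SNMP-MIB::dskPercent.3 reports whether a third disk row exists.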

It took a bit of hunting to track down the MIB objects for the system states I defined in the snmpd.conf file. I have added some screenshots showing some of these MIB objects as identified in the mbrowse window.

(Screenshots: prTable entries, diskPercent usage, sysUpTime)

Once I had tracked down the names/OIDs and parameter names of the MIB objects I wanted to monitor, I created the service definitions in the Nagios object configuration file (on the Nagios server host) that I use to monitor the NAS device.

While trying to figure out the correct OIDs, I found running the check_snmp Nagios plugin from the command line on the Nagios server host useful, rather than repeatedly restarting the server. The snmpget command-line utility is also a good alternative; check_snmp uses it under the hood anyway.
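A small helper along these lines can speed up that trial and error. Nagios plugins report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so the wrapper runs check_snmp once and prints the state name after the plugin's own output. The plugin path is an assumption (it differs between source installs and distribution packages), as are the helper names and the example host name qnap01.

```shell
#!/bin/sh
# Assumption: adjust to wherever check_snmp lives on your Nagios host.
CHECK_SNMP=${CHECK_SNMP:-/usr/local/nagios/libexec/check_snmp}

# nagios_state CODE: map a plugin exit code to its Nagios state name.
# Codes outside 0-3 are treated as UNKNOWN here.
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# try_check HOST ARGS...: run check_snmp by hand with the given arguments.
# Example: try_check qnap01 -P 2c -C public -o dskPercent.1 -w 90 -c 95
try_check() {
    host=$1
    shift
    "$CHECK_SNMP" -H "$host" "$@"
    code=$?
    echo "state: $(nagios_state "$code") (exit $code)"
    return "$code"
}
```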

Nagios service object definitions on the server machine

The following definitions track the percentage of space used on the first and second disks. Using just the last part of the MIB object names (e.g. dskPercent.1) is sufficient. I posted these on the QNAP forum earlier, when I was setting things up.

define service{
use generic-service ; Inherit values from a template
host_name QNAP01
name Disk 1
service_description Disk 1 Utilization
check_command check_snmp!-P 2c -l Disk -u % -w 90 -c 95 -o dskPercent.1
normal_check_interval 720
retry_check_interval 15
}


define service{
use generic-service ; Inherit values from a template
host_name QNAP01
name Disk 2
service_description Disk 2 Utilization
check_command check_snmp!-P 2c -l Disk -u % -w 70 -c 85 -o dskPercent.2
normal_check_interval 720
retry_check_interval 15
}

Using this to check whether the network port is up:

define service{
use generic-service ; Inherit values from a template
host_name QNAP01
service_description Network Port Status
normal_check_interval 5
check_command check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}
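ifOperStatus is an enumerated integer: up(1), down(2) and testing(3) in RFC1213-MIB, with the later IF-MIB adding unknown(4), dormant(5), notPresent(6) and lowerLayerDown(7). That is why matching the value 1 with -r is enough here. It may also be worth confirming, for example with snmpget on IF-MIB::ifDescr.1, that index 1 is the Ethernet port and not the loopback interface. The decoder below is an illustrative sketch, with a hypothetical function name, for reading raw values by hand.

```shell
#!/bin/sh
# if_oper_status CODE: decode an ifOperStatus integer (IF-MIB values).
if_oper_status() {
    case $1 in
        1) echo up ;;
        2) echo down ;;
        3) echo testing ;;
        4) echo unknown ;;
        5) echo dormant ;;
        6) echo notPresent ;;
        7) echo lowerLayerDown ;;
        *) echo "invalid($1)"; return 1 ;;
    esac
}
```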

Using this to monitor the system uptime. I am using the full OID in this case.

define service{
use generic-service ; Inherit values from a template
host_name QNAP01
service_description Uptime
normal_check_interval 5
check_command check_snmp!-P 2c -l Uptime -o .1.3.6.1.2.1.1.3.sysUpTimeInstance
}
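sysUpTime is reported in TimeTicks, i.e. hundredths of a second. Strictly speaking it measures how long the SNMP agent has been running rather than the host itself; HOST-RESOURCES-MIB's hrSystemUptime (.1.3.6.1.2.1.25.1.1.0) covers the host. That distinction may explain unexpected numbers after snmpd restarts. It is also a 32-bit value, so it wraps after roughly 497 days. The converter below is a small sketch (hypothetical name) for turning a raw tick count into something readable.

```shell
#!/bin/sh
# ticks_to_uptime TICKS: format SNMP TimeTicks (1/100 s) as "Nd HH:MM:SS".
ticks_to_uptime() {
    secs=$(( $1 / 100 ))
    days=$(( secs / 86400 ))
    hours=$(( secs % 86400 / 3600 ))
    mins=$(( secs % 3600 / 60 ))
    s=$(( secs % 60 ))
    printf '%dd %02d:%02d:%02d\n' "$days" "$hours" "$mins" "$s"
}
```

For example, ticks_to_uptime 8640000 prints 1d 00:00:00, and the largest 32-bit value, 4294967295, prints 497d 02:27:52.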

Using this to monitor the CPU load average via three MIB object OIDs:

define service{
use generic-service ; Inherit values from a template
host_name QNAP01
service_description Load Average over 1 5 15 min
normal_check_interval 5
check_command check_snmp!-P 2c -l LoadAvg -o .1.3.6.1.4.1.2021.10.1.3.1,.1.3.6.1.4.1.2021.10.1.3.2,.1.3.6.1.4.1.2021.10.1.3.3
}
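Long comma-separated OID lists like this one are easy to mistype, or to break across lines when pasting. A small generator (a hypothetical helper, not part of Nagios or net-snmp) can build them from a table column and a row count instead. The laTable columns used here are laLoad (.1.3.6.1.4.1.2021.10.1.3) and laErrorFlag (.1.3.6.1.4.1.2021.10.1.100).

```shell
#!/bin/sh
# oid_list BASE COUNT: print BASE.1,BASE.2,...,BASE.COUNT on one line,
# ready to paste after check_snmp's -o option.
oid_list() {
    base=$1 count=$2 i=1 list=
    while [ "$i" -le "$count" ]; do
        list=${list:+$list,}$base.$i    # add a comma before every item but the first
        i=$(( i + 1 ))
    done
    echo "$list"
}
```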

Using this to check the load average error flags (0 is OK):

define service{
use generic-service ; Inherit values from a template
host_name QNAP01
service_description Load Average Error Flags
normal_check_interval 5
check_command check_snmp!-P 2c -l LAErrFlags -r0 -o .1.3.6.1.4.1.2021.10.1.100.1,.1.3.6.1.4.1.2021.10.1.100.2,.1.3.6.1.4.1.2021.10.1.100.3
}

Using this to check available real memory (values outside the ranges are OK):

define service{
use generic-service ; Inherit values from a template
host_name QNAP01
service_description Real Memory Availability
normal_check_interval 5
check_command check_snmp!-P 2c -l Memory -u KB -w 1000:500 -c 500:0 -o memAvailReal.0
}
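For comparison, the threshold syntax in the Nagios plugin development guidelines works like this. A range start:end alerts when the value is outside start..end, with start no greater than end. A bare N is shorthand for 0:N, start: alerts below start, ~:end alerts above end, and a leading @ alerts when the value is inside the range. Depending on the check_snmp version, a reversed pair like 1000:500 may not be accepted. Under the guidelines' syntax, the same intent could be written as -w 1000: -c 500: (warn below 1000 KB, critical below 500 KB). The function below is a sketch of those documented rules, with a hypothetical name, handy for checking what a threshold string means.

```shell
#!/bin/sh
# range_alert VALUE RANGE: exit 0 (alert) or 1 (no alert) following the Nagios
# plugin threshold rules: N, N:, ~:N, N:M, and @N:M (alert when inside).
range_alert() {
    awk -v v="$1" -v r="$2" 'BEGIN {
        inside = 0
        if (substr(r, 1, 1) == "@") { inside = 1; r = substr(r, 2) }
        lo_inf = 0; hi_inf = 0
        if (index(r, ":") == 0) {             # "N" means 0:N
            lo = 0; hi = r + 0
        } else {
            split(r, p, ":")
            if (p[1] == "~") lo_inf = 1; else lo = p[1] + 0
            if (p[2] == "") hi_inf = 1; else hi = p[2] + 0
        }
        in_range = (lo_inf || v + 0 >= lo) && (hi_inf || v + 0 <= hi)
        alert = inside ? in_range : !in_range
        exit (alert ? 0 : 1)
    }'
}
```

For example, range_alert 95 90 succeeds (95 is outside 0..90, as with -w 90 on the first disk), while range_alert 2000 1000: fails because 2000 KB is above the warning floor.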

The Nagios plugin manual describes the exact syntax and meaning of the check_snmp parameters. The following is a screenshot of the parameters collected by Nagios.


Filed under IT management

One response to “IT@Home: Monitoring QNAP NAS with Nagios”

  1. Any

    For the QNAP TS-439, “1.3.6.1.2.1.1.3.sysUpTimeInstance” gives wrong numbers.
    I used the following Perl script:
    #!/usr/bin/perl -w
    use strict;
    use Net::SNMP;

    # hrSystemUptime.0 (HOST-RESOURCES-MIB): time since the host was booted
    my $uptimeOID = '1.3.6.1.2.1.25.1.1.0';

    foreach my $host (@ARGV)
    {
        my ($session, $error) = Net::SNMP->session(
            -hostname  => $host,
            -community => 'public',
            -port      => 161
        );

        # Skip this host if no session could be created
        unless (defined($session))
        {
            warn("ERROR for $host: $error\n");
            next;
        }

        my $result = $session->get_request(
            -varbindlist => [$uptimeOID]
        );

        if (!defined($result))
        {
            warn("ERROR: " . $session->error . "\n");
        }
        else
        {
            printf("Uptime for %s: %s\n", $host, $result->{$uptimeOID});
        }

        $session->close;
    }

    exit 0;

    Have fun!
