IT@Home: Keeping an Eye on Things with Nagios

For a while I have wanted a network management console to rule over my home PC dominion. It would be nice to be able to monitor the status of the scheduled jobs on my various PCs. Having used things like HP OpenView in the past, I always wanted to set up something similar at home. So I finally started to investigate a few months ago, and I was surprised by the range of open source options available to home users like me. This Wikipedia page has a nice comparison of the options that are out there. Most of the open source ones seem to aim for Linux server deployments, with monitoring agents available for Windows and other OSs. The architecture of most of these solutions consists of a network monitoring server and a bunch of agents or slave servers sitting on different boxes.

So having a Linux box in one’s home network broadens one’s options. The other part of the decision process for me was to figure out what kind of monitoring I could have on the client boxes. My client pool consists of a few Windows 2000 and XP boxes, a QNAP NAS, and a Linksys WRT54GL running Tomato. SNMP monitoring would seem to cover all of them in one sweep. But I think traditional SNMP-based monitoring is overkill for home PCs unless one is reasonably familiar with SNMP MIB objects. SNMP-based agents can monitor a whole plethora of system parameters, most of which do not really make much sense in home situations. Also, setting up SNMP support on Windows boxes requires extra work, including access to the Windows CD, unless it was enabled at install time. I did do it for one of my Windows 2000 boxes, but ended up not using it. So my setup looks like this picture.

 

[Image: nagios012, picture of the home monitoring setup]

Nagios Server Box

In terms of network monitoring servers, Nagios seems to be a popular open source option. It comes with a range of plugins to collect data from local and remote machines. Remote boxes could also be running Nagios servers in slave mode and pushing data to the main server, but there does not seem to be a Windows port of Nagios. The default list of plugins is a long one, including one that can handle SNMP data collection, and there are many more plugins on Nagios Exchange. One plugin that seems well suited to home PC data gathering is NRPE, which can gather data from Nagios plugins executing on remote boxes. It has a Windows port which can run various scripts on the Windows box to gather the appropriate data and send it off to the NRPE plugin running on the Nagios server box. Another alternative is NSClient++, which can also talk to a remote Nagios server and push data from the Windows box. To me NRPE_NT, the Windows port of NRPE, seemed easier to use, since I can easily add my own scripts or those written by others to extend the monitoring functionality. (There are also other Nagios server plugins that can make use of WMI support on Windows boxes.)

Installing Nagios and the NRPE plugin on my Ubuntu box was fairly straightforward. I did not deploy the NRPE service on the box; I only deployed the check_nrpe plugin for Nagios. Both come with good documentation for building and installing on Linux. Once the check_nrpe plugin was in place, I had to create Nagios service object definitions based on it. I use these to monitor information on my different PCs. These definitions depend on the commands that I enable on the remote boxes.

Monitoring Windows boxes

I was able to obtain NRPE_NT from here. NRPE_NT needs to be installed as a service before anything can be tested with it. Installation is done by executing ‘NRPE_NT -i’. It was fairly easy to customize the standard settings in the nrpe.cfg configuration file, and I had to add commands depending on what I wanted to monitor. For monitoring with NRPE_NT there is a fairly long list of NRPE plugins for Windows available here on Nagios Exchange. These include the usual CPU, memory and disk monitoring plugins. Many of these NRPE plugins are either Windows EXE files or scripts. I also needed something that would allow me to determine whether a regularly scheduled job has accomplished its task.

I have periodically scheduled tasks such as SyncBackSE backup tasks, Acronis drive imaging tasks, and virus scan and update tasks. I would like to know whether these tasks ran within their respective intervals and what the outcome of each run was.

So my list of things includes the following, where I used NRPE plugins available on Nagios Exchange

I use Acronis to image my C: drive, SyncBack jobs to back up various folders, and Antivir and AVG as my scanners. So I also needed something that would let me monitor the following:

  • Check whether the age of the Acronis image file is less than the task’s schedule interval
  • Check whether the last image creation log indicated success
  • Check whether the SyncBack log file for a job is not older than the planned schedule and contains a success message
  • Do the above for other backup jobs
  • Do the same for the virus scan log (assuming it generates a text log; AVG does not)
  • Do the same for the virus update log (assuming it generates a text log; AVG does not)

I needed a script that could check the age of a file and optionally look for a string in the file. Not finding something quite right, I ended up putting together a Windows script based on what others have written. I am not a Windows developer, so this utility could probably use some refactoring. For example, it only looks for a single occurrence of the key phrase. It assumes the target log file is rewritten every time (instead of appended to), and that the target phrase either occurs once or does not occur at all. It also uses the most recently modified file if multiple files match. But it has been OK for my usage so far.
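To make the idea concrete, here is a minimal Python sketch of the same three steps: pick the most recently modified file whose name contains a given fragment, compute its age, and look for a phrase in it. This is not the .wsf script itself, only an illustration of the logic. The function names are hypothetical, and the case-insensitive name match and the UTF-8 log encoding are my assumptions.

```python
import os
import time
from typing import Optional


def latest_matching_file(directory: str, name_fragment: str) -> Optional[str]:
    """Return the most recently modified file whose name contains name_fragment.

    Assumption: the match is a case-insensitive substring match on the file name.
    """
    candidates = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name_fragment.lower() in name.lower()
        and os.path.isfile(os.path.join(directory, name))
    ]
    if not candidates:
        return None
    # With several matches, the newest modification time wins.
    return max(candidates, key=os.path.getmtime)


def file_age_days(path: str, now: Optional[float] = None) -> float:
    """Age of the file in days, based on its last-modified time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 86400.0


def contains_phrase(path: str, phrase: str) -> bool:
    """True if phrase occurs at least once in the file.

    Assumption: the log is readable as UTF-8; undecodable bytes are replaced.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return phrase in handle.read()
```

Like the script described above, this only asks whether the phrase occurs at all, so it suits logs that are rewritten on every run rather than appended to.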

check_latest_file_for_string.wsf

Based on this, here is a command that I use in the NRPE_NT config file to check that the latest log of the Thunderbird backup job, with a log file name like “Thunderbird_Log_Page1.txt”, is younger than 21 days and contains the string “Success”. If the file is older than 21 days, the check returns an error. Otherwise, if the OK flag is set to true, finding the string counts as success and not finding it counts as failure; with the flag set to false, it is the other way around.

command[check_tbird_backup_log_succ]=c:\winnt\system32\cscript.exe //NoLogo //T:10 c:\nrpe_nt\plugins\check_latest_file_for_string.wsf /d:"C:\Logs\SyncBackSE" /f:"Thunderbird" /s:"Success" /ok:true /n:21
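The decision the command asks for can be sketched in a few lines of Python. The exit codes are the standard Nagios plugin ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Everything else is my assumption about how the check could behave, not a transcription of the .wsf script: the function name `evaluate` is hypothetical, I map "error" to CRITICAL, and I return UNKNOWN when no matching file exists.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def evaluate(age_days, phrase_found, ok_flag, max_age_days):
    """Map a log check to a (exit_code, message) pair.

    age_days:     age of the newest matching log, or None if none was found
    phrase_found: whether the search string occurs in that log
    ok_flag:      True if finding the string means success (like /ok:true)
    max_age_days: maximum allowed age of the log (like /n:21)
    """
    if age_days is None:
        # Assumption: a missing log is reported as UNKNOWN.
        return UNKNOWN, "UNKNOWN: no matching log file found"
    if age_days > max_age_days:
        return CRITICAL, "CRITICAL: log is %.1f days old (limit %d)" % (
            age_days, max_age_days)
    # Success when the match result agrees with the OK flag:
    # found + ok_flag=True, or not found + ok_flag=False.
    if phrase_found == ok_flag:
        return OK, "OK: log is %.1f days old and content check passed" % age_days
    return CRITICAL, "CRITICAL: log content check failed"
```

For the Thunderbird command above this would be called as `evaluate(age, found, True, 21)`, and a plugin would print the message and exit with the returned code.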

Wishlist

With this set of NRPE_NT executables and commands I have been able to monitor my PCs reasonably well. What I cannot yet do is create and change monitoring tasks from the Nagios UI. That would be really nice.

[Image: nagios022]

Other Stuff

Both the QNAP box and the Linksys WRT54GL router can be monitored using the check_snmp plugin of Nagios. Additionally, for the QNAP box I found it easier to enable net-snmp on it than to try to get an Optware port of NRPE working. Finding the right MIB objects was a bit of a challenge, but I eventually got it working.

Filed under IT management
