Yet Another Computing Blog: Nagios

Wednesday, January 9, 2013

Using NAGIOS to Check the Physical Memory Available on a Windows Host

By default the CheckNT command checks the virtual memory on a Windows server. For example, if your server has 4GB of physical memory and a 4GB page file, NAGIOS and CheckNT will see 8GB of memory in total. Warnings and critical alerts on this combined memory space are often not very helpful. What we really want to know is whether the server has enough physical memory available to perform as well as it should.

This is where the NRPE plugins are much better, as they allow far more granular monitoring of the memory on a Windows host.

To start with we need to create a new command definition. Add this to your commands.cfg (or equivalent):

 # CheckWindowsPhysicalMem command definition  
 define command {  
         command_name             CheckWindowsPhysicalMem  
         command_line             $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=$ARG1$% MaxCrit=$ARG2$% ShowAll type=physical  
 }

In the above command definition we're using the check_nrpe executable to run the CheckMEM check against physical memory only. The type can be changed to check just the page file or the entire virtual memory address space.
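
If you want one command that can check different memory types, the type can be passed as a third argument instead of being hard-coded. The following is a minimal sketch under that assumption. The command name CheckWindowsMem is made up for illustration, and only the physical value is taken from this post; check the CheckMEM documentation for your NSClient++ version for the other accepted type values before relying on them.

 # Hypothetical generic variant: the memory type is passed in as $ARG3$  
 define command {  
         command_name             CheckWindowsMem  
         command_line             $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=$ARG1$% MaxCrit=$ARG2$% ShowAll type=$ARG3$  
 }

A service would then call it as CheckWindowsMem!80!90!physical to get the same result as the command above.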

Next we need to add the physical memory check by adding a service definition to either your host or service configs (again, this depends on how you've structured your NAGIOS configuration).

 # Service definition  
 # Add this to your host or service configuration  
 define service {  
         service_description          Physical Memory  
         check_command             CheckWindowsPhysicalMem!80!90  
         host_name               << hostname >>  
         event_handler_enabled         0  
         active_checks_enabled         1  
         passive_checks_enabled        0  
         notifications_enabled         1  
         check_freshness            0  
         freshness_threshold          86400  
         use                  << service template >>  
 }

You will need to update the above snippet with the host name you are monitoring and the service template you are using. The !80!90 arguments set the warning threshold at 80% usage and the critical threshold at 90% usage. These can be varied to suit your host and environment.
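
As an illustration of varying the thresholds, a host that normally runs with most of its memory in use (a database server, say) might only warrant alerts at higher levels. The sketch below assumes such a host. The 90 and 95 values are example figures rather than recommendations, and the placeholders are the same as above.

 # Example only: higher thresholds for a host that normally runs with high memory usage  
 define service {  
         service_description          Physical Memory  
         check_command             CheckWindowsPhysicalMem!90!95  
         host_name               << hostname >>  
         use                  << service template >>  
 }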

Sunday, December 18, 2011

Setup Nagios Monitoring – The Easy Way Part 3

In the first two parts of this guide we’ve installed Nagios 3 onto an Ubuntu server. We’ve restructured the layout of the configuration files so that they are more manageable. In this step we will look at time periods and how to configure them.

Time periods are just that: schedules that define when things should or should not happen. Typical time periods defined in Nagios include:

24x7 – All the time, from 00:00 to 24:00 Monday to Sunday.
Work Hours – 09:00 to 17:00 Monday to Friday.
After Hours – All the time outside of the work hours.
Never – Empty schedule with no times defined.

These time periods are used in a few places. Firstly, they can be used to determine when host and service checks occur. For example, we may want critical production servers to be monitored 24x7 but only want non-critical servers monitored during business hours.
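
As a rough sketch of how this looks in a service definition, the check_period directive names the time period during which active checks run. The host names below are hypothetical, the service template is a placeholder, and the check_ping arguments are example values. None of these are part of the configuration built in this guide.

# Critical production server: checked around the clock
define service{
        use                     << service template >>
        host_name               prod-web01      ; hypothetical host
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        check_period            24x7
        }

# Non-critical server: only checked during business hours
define service{
        use                     << service template >>
        host_name               test-web01      ; hypothetical host
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        check_period            workhours
        }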

The second major place they are used is to determine when contacts should be alerted that problems have occurred. For example, we may send alerts to an administrators' email group during business hours but send an alert via SMS after work hours.
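
A minimal sketch of this in contact definitions: the host_notification_period and service_notification_period directives name the time period during which a contact may be notified. The contact names, email address, pager number and the notify-*-by-sms commands are hypothetical (an SMS command would have to be defined separately). The notify-host-by-email and notify-service-by-email commands are assumed to exist, as they do in the sample configuration shipped with Nagios. The time period names are the ones defined later in this post.

# Email the administrators group during work hours
define contact{
        contact_name                    admins-email        ; hypothetical
        alias                           Administrators (email)
        email                           admins@example.com
        host_notification_period        workhours
        service_notification_period     workhours
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      notify-host-by-email
        service_notification_commands   notify-service-by-email
        }

# Send an SMS outside of work hours
define contact{
        contact_name                    oncall-sms          ; hypothetical
        alias                           On-call Administrator (SMS)
        pager                           +15555550100
        host_notification_period        nonworkhours
        service_notification_period     nonworkhours
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      notify-host-by-sms      ; hypothetical command
        service_notification_commands   notify-service-by-sms   ; hypothetical command
        }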

In the /etc/nagios3/timeperiods folder we’ll create four different time periods:

/etc/nagios3/timeperiods/24x7.cfg
/etc/nagios3/timeperiods/never.cfg
/etc/nagios3/timeperiods/afterhours.cfg
/etc/nagios3/timeperiods/workhours.cfg

First we’ll start by defining the 24x7 time period. Create the file /etc/nagios3/timeperiods/24x7.cfg as shown below:

# This defines a timeperiod where all times are valid for checks,
# notifications, etc.  The classic "24x7" support nightmare. :-)

define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }

Next we’ll create the never time period as shown below:

# An empty time period: no times are valid
define timeperiod{
        timeperiod_name never
        alias           Never

        }

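Now we’ll create the afterhours definition. Note that the time period itself is named nonworkhours, so that is the name to use when referring to it elsewhere in the configuration: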


# The complement of workhours
define timeperiod{
        timeperiod_name nonworkhours
        alias           Non-Work Hours
        sunday          00:00-24:00
        monday          00:00-09:00,17:00-24:00
        tuesday         00:00-09:00,17:00-24:00
        wednesday       00:00-09:00,17:00-24:00
        thursday        00:00-09:00,17:00-24:00
        friday          00:00-09:00,17:00-24:00
        saturday        00:00-24:00
        }

And finally we’ll create the workhours definition:

# Here is a slightly friendlier period during work hours
define timeperiod{
        timeperiod_name workhours
        alias           Standard Work Hours
        monday          09:00-17:00
        tuesday         09:00-17:00
        wednesday       09:00-17:00
        thursday        09:00-17:00
        friday          09:00-17:00
        }

These four time period definitions should cover most smaller IT shops. If you need another time period definition, it’s as simple as creating a new text file in the /etc/nagios3/timeperiods folder and defining the time period there.
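
For instance, a period covering weekends only could be sketched as follows. The file name weekends.cfg and the time period name are made up for illustration. As in the workhours definition, days that are not listed are simply not part of the period.

# Hypothetical /etc/nagios3/timeperiods/weekends.cfg
define timeperiod{
        timeperiod_name weekends
        alias           Weekends Only
        saturday        00:00-24:00
        sunday          00:00-24:00
        }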

That completes the time period definitions. In the next blog post we’ll look at defining contacts in Nagios, which will make use of the time periods we’ve defined here.