Wednesday, January 9, 2013

Using NAGIOS to Check the Physical Memory Available on a Windows Host

By default, the CheckNT command checks the virtual memory on a Windows server. For example, if your server has 4GB of physical memory and a 4GB page file, NAGIOS and CheckNT see 8GB of memory in total. Warnings and critical alerts on this combined memory space are often not very helpful. What we really want to know is whether the server has enough physical memory available to perform as well as it should.
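
To see why a percentage threshold on the combined figure can mislead, here is a rough worked example in shell arithmetic. It assumes the 4GB + 4GB server described above and an 80% warning threshold; the helper name threshold_mb is made up for illustration.

```shell
# Hypothetical helper: how many MB must be in use before a percentage threshold trips.
threshold_mb() {
    total_mb="$1"; percent="$2"
    echo $(( total_mb * percent / 100 ))
}

physical_mb=4096
pagefile_mb=4096

# Threshold measured against physical memory plus page file (8GB in total)
threshold_mb $(( physical_mb + pagefile_mb )) 80   # 6553
# Threshold measured against physical memory only
threshold_mb "$physical_mb" 80                     # 3276
```

Measured against the combined 8GB, the warning only fires at roughly 6.4GB in use, which is well beyond what fits in the 4GB of physical memory.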

This is where the NRPE plugins are much better, as they let you be much more granular when monitoring the memory on a Windows host.

To start with we need to create a new command definition. Add this to your commands.cfg (or equivalent):

 # CheckWindowsPhysicalMem command definition
 define command {  
         command_name             CheckWindowsPhysicalMem  
         command_line             $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=$ARG1$% MaxCrit=$ARG2$% ShowAll type=physical  
 }  

The above command definition uses the check_nrpe executable to run the CheckMEM check against physical memory. The type can be changed to check just the page file or the entire virtual memory address space.
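
Before wiring this into NAGIOS it can help to see the exact command line each memory type produces. The sketch below only prints the commands, it does not run them. The helper name build_checkmem_cmd is hypothetical, 192.0.2.10 is a placeholder address, and the type values page and virtual are my reading of the NSClient++ CheckMEM options, so verify them against your agent version.

```shell
# Hypothetical helper: print (not run) the check_nrpe command line for a memory type.
# The plugin path stands in for wherever $USER1$ points on your system.
build_checkmem_cmd() {
    host="$1"; mem_type="$2"; warn="$3"; crit="$4"
    printf '%s\n' "/usr/local/nagios/libexec/check_nrpe -H $host -p 5666 -c CheckMEM -a MaxWarn=${warn}% MaxCrit=${crit}% ShowAll type=$mem_type"
}

# 192.0.2.10 is a documentation address; substitute your Windows host.
for mem_type in physical page virtual; do
    build_checkmem_cmd 192.0.2.10 "$mem_type" 80 90
done
```

Pasting one of the printed lines into a shell on the NAGIOS server is a quick way to test the check by hand. Note that arguments passed with -a are only honoured if the NRPE listener on the Windows agent is configured to accept them.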

Next we need to add the physical memory check by adding a service definition to either your host or service configs (again, this depends on how you've structured your NAGIOS configuration).

 # Service definition  
 # Add this to your host or service configuration
 define service {  
         service_description          Physical Memory  
         check_command             CheckWindowsPhysicalMem!80!90  
         host_name               << hostname >>  
         event_handler_enabled         0  
         active_checks_enabled         1  
         passive_checks_enabled        0  
         notifications_enabled         1  
         check_freshness            0  
         freshness_threshold          86400  
         use                  << service template >>  
 }  

You will need to update the above snippet with the host name you are monitoring and the service template you are using. The !80!90 passes the thresholds to the command as $ARG1$ and $ARG2$: warning at 80% usage and critical at 90% usage. These can be varied to suit your host and environment.
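
As a rough illustration of how the check_command value is split into the command name and its arguments, the following shell sketch breaks the string on "!" characters. The helper name split_check_command is hypothetical and only mimics the substitution NAGIOS performs internally.

```shell
# Hypothetical helper: split a check_command value on "!" into name and arguments.
split_check_command() {
    IFS='!' read -r name arg1 arg2 <<EOF
$1
EOF
    echo "command_name=$name ARG1=$arg1 ARG2=$arg2"
}

split_check_command 'CheckWindowsPhysicalMem!80!90'
```

This prints command_name=CheckWindowsPhysicalMem ARG1=80 ARG2=90, so the command definition ends up with MaxWarn=80% and MaxCrit=90%.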

3 comments:

  1. I added the command as you have mentioned, but it gives the following error:
    COMMAND: /usr/local/nagios/libexec/check_nrpe -H 172.16.56.101 -p 5666 -c CheckMEM -a MaxWarn=80% MaxCrit=90% ShowAll type=physical
    OUTPUT: Could not construct return packet in NRPE handler check client side (nsclient.log) logs...

    When I check client side logs:
    2013-07-23 16:38:21: error:modules\CheckSystem\CheckSystem.cpp:1084: ERROR: Counter not found: \Server\Logon Errors: The specified counter could not be found. (C0000BB9)
    2013-07-23 16:38:21: error:modules\CheckSystem\CheckSystem.cpp:1086: ERROR: Counter not found: \Server\Logon Errors: The specified counter could not be found. (C0000BB9)
    2013-07-23 16:38:21: error:modules\CheckSystem\CheckSystem.cpp:1115: ERROR: \Server\Logon Errors: PdhAddCounter failed: The specified counter could not be found. (C0000BB9) (\Server\Logon Errors|\Server\Logon Errors)

  2. Hi everyone,

    Any update/input on this issue?

    Regards,
    Avinash

  3. Hi Avinash,

    What happens when you run the check_nrpe command directly from the Nagios command line? For example:

    /usr/lib/nagios/plugins/check_nrpe -H << host address >> -p 5666 -c CheckMEM -a MaxWarn=10% MaxCrit=20% ShowAll type=physical
