AN38: Monitoring HWg devices in Nagios - Introduction



This application note is an introduction to Nagios for novice users who need to monitor sensor readings (such as temperature and humidity in a server room) but have only limited experience with Nagios.

Nagios [http://www.nagios.org/] is one of the most widely used open-source monitoring tools. Using plugins, it can monitor practically any device (called a “host”) or service over practically any protocol. The most common scenarios involve monitoring devices over SNMP or monitoring servers using NRPE (Nagios Remote Plugin Executor). Nagios is configured with text configuration files and controlled over a web interface.

 

Nagios works with three primary object types (object):

  • service - services to be monitored (e.g. CPU load, toner level in a printer, temperature in the monitored room)
  • host - the devices (hosts) where the monitored services are running (e.g. servers, printers, thermometers, ...)
  • contact - people who receive status notifications about services and hosts (admins, technicians, operators, ...)

These objects can be grouped (group) in order to simplify configuration and get a clearer overview of related objects in the web interface.
Nagios also uses other object types. Timeperiod objects define when hosts and services are monitored and when contacts are notified. The plugins that perform the actual checks are defined as command objects. Objects for escalating issues and specifying dependencies are beyond the scope of this article (for more details, see the Nagios Core Documentation, chapter Object Configuration Overview, available at http://nagios.sourceforge.net/docs/nagios-3.pdf).

Installing Nagios

This description is intended for users with only a basic knowledge of the GNU/Linux operating system. Therefore, installation from the source code is not covered here (if interested, the source package is available at http://www.nagios.org/download/core/, and is installed with the usual ./configure, make all, make install triad).

For novice users, we recommend the Ubuntu Linux distribution [http://www.ubuntu.com/]. The examples in this text assume a standard installation of Ubuntu Server Edition 9.10 [http://www.ubuntu.com/getubuntu/download-server].

Install Nagios using the following command:

helpdesk@monitoring:~$ sudo aptitude install nagios3

The installer automatically selects additional packages that are required by Nagios (Apache web server, SNMP libraries, mail server, and so on). Confirm the installation of these packages. During installation, you will probably be asked to configure Postfix (the mail server). Select Internet Site and enter the server name and domain as a fully qualified domain name (such as monitoring.company.com). Then, enter a password for accessing Nagios over the web.

After installation, you can use your web browser to verify that Nagios is running. Open http://192.168.1.1/nagios3/, where 192.168.1.1 is the IP address of the server where Nagios has been installed. The login name is nagiosadmin, the password is the one you specified during installation. If you forgot your password, enter the sudo htpasswd /etc/nagios3/htpasswd.users nagiosadmin command and set a new password.

Tip: To find out the IP address of the server, use the sudo ifconfig command; the address is shown in the inet addr field:

eth0      Link encap:Ethernet  HWaddr 08:00:27:3d:d9:f1
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe3d:d9f1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4300 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2946 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:444119 (444.1 KB)  TX bytes:1304450 (1.3 MB)
          Interrupt:11 Base address:0xc020

If the server was installed with its IP address configured over DHCP and you now need to set the IP address manually, modify the /etc/network/interfaces file as follows:

auto eth0
iface eth0 inet static
address 192.168.1.1
netmask 255.255.255.0
gateway 192.168.1.254

Similarly, to set up automatic configuration over DHCP, modify the file as follows:

auto eth0
iface eth0 inet dhcp

To apply the changes to the network settings, run sudo /etc/init.d/networking restart.

For easier navigation in the file system, we recommend installing Midnight Commander:

helpdesk@monitoring:~$ sudo aptitude install mc

 

To start Midnight Commander, enter the mc command. By default, the Ubuntu distribution does not have the internal editor selected. To activate it, press F9 → Options → Configuration... → check use internal edit → Save. To edit a file, press F4. The ESC key works as a command prefix; to leave the editor, press ESC twice (this behavior comes from the “dumb terminal” support: when a terminal does not support function keys, for example, F4 can be replaced by pressing ESC and 4 in sequence). Press SHIFT+F4 to create a new file. To avoid automatic indentation when pasting text from the clipboard, press F9 → Options → General... → uncheck Return does autoindent → OK.

Install SSH to allow remote access to the server's terminal:

helpdesk@monitoring:~$ sudo aptitude install openssh-server

Remember that by installing SSH, you enable remote access to the server; therefore, you should use sufficiently strong passwords. To access the server from Windows computers, we recommend the PuTTY program [http://www.chiark.greenend.org.uk/~sgtatham/putty/].

Tip: In Unix-like systems, operations such as installing and running system utilities, changing system settings, etc. can only be performed by the super-user (the root user account). To avoid typing sudo in front of every command when installing or configuring Nagios, you can switch to the super-user mode by entering sudo su -.

helpdesk@monitoring:~$ whoami
helpdesk (logged in as the helpdesk user)
helpdesk@monitoring:~$ sudo su -
[sudo] password for helpdesk: enter the password of the user that is currently logged in
root@monitoring:~# whoami
root (now we're logged in as root, commands will be executed with root privileges)
root@monitoring:~# exit
logout
helpdesk@monitoring:~$

Notice that the super-user prompt ends with the pound (#) sign, while a regular user prompt ends with the dollar ($) sign.

Configuring Nagios

Nagios configuration files are located in the /etc/nagios3 directory. The monitored infrastructure is defined using files in the /etc/nagios3/conf.d directory. To get acquainted with Nagios configuration, let us back up the preconfigured infrastructure settings and create our own configuration.

helpdesk@monitoring:~$ sudo su -
[sudo] password for helpdesk:
root@monitoring:~# mkdir /root/nagios_backup
root@monitoring:~# mv /etc/nagios3/conf.d/* /root/nagios_backup

Time periods - timeperiod

Time periods, which define when monitoring is performed and when contacts are notified, are specified in the /etc/nagios3/conf.d/timeperiods.cfg file. Each time period is defined with define timeperiod { … }. The timeperiod_name parameter specifies the name used to refer to this time period in the host, service and contact configuration. The alias parameter specifies a display name for the time period. It is followed by a list of time ranges that specify when the time period is active. Multiple time ranges can be specified for each day (for instance, to specify Mondays outside of office hours, enter: monday 00:00-8:00,18:00-24:00).

define timeperiod {
timeperiod_name 24x7
alias           Nonstop 24x7
monday          00:00-24:00
tuesday         00:00-24:00
wednesday       00:00-24:00
thursday        00:00-24:00
friday          00:00-24:00
sunday          00:00-24:00
saturday        00:00-24:00
}

define timeperiod {
timeperiod_name 10x5
alias           Work hours 10x5
monday          08:00-18:00
tuesday         08:00-18:00
wednesday       08:00-18:00
thursday        08:00-18:00
friday          08:00-18:00
}

define timeperiod {
timeperiod_name never
alias           Never
}

This example defines three time periods. The first time period, 24x7, is for non-stop monitoring. The second time period, 10x5, is active on workdays from 8:00 to 18:00. The third time period, never, is never active (use it if you do not want to send any notifications).
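The multi-range syntax described above can also express the complement of the 10x5 period. A minimal sketch, assuming you want a period for on-call notifications outside work hours; the name offhours and its alias are hypothetical and not part of the configuration above:

```nagios
# Hypothetical period: weekday nights plus the whole weekend
# (the exact complement of the 10x5 period)
define timeperiod {
timeperiod_name offhours
alias           Outside work hours
monday          00:00-08:00,18:00-24:00
tuesday         00:00-08:00,18:00-24:00
wednesday       00:00-08:00,18:00-24:00
thursday        00:00-08:00,18:00-24:00
friday          00:00-08:00,18:00-24:00
saturday        00:00-24:00
sunday          00:00-24:00
}
```

A contact whose host_notification_period and service_notification_period are set to offhours would then be notified only on weekday nights and at weekends.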

Contact persons - contact

People to be notified are configured in the /etc/nagios3/conf.d/contacts.cfg file. The host_notification_period and service_notification_period parameters specify when the contact should receive notifications. The host_notification_options parameter specifies which types of host notifications are sent to the contact (the letter in front of each item is the value used in the configuration):

    • d - down (switched off)
    • u - unreachable
    • r - recovery (return to normal)
    • f - flapping (continuously changing states)
    • s - scheduled downtime (start and end of planned maintenance)
    • n - none (no messages)

    The service_notification_options parameter specifies which types of service notifications are sent to the contact:

    • w - warning
    • u - unknown (unknown state)
    • c - critical (critical state)
    • r - recovery (return to normal operation)
    • f - flapping (service continuously changes states)
    • n - none (no messages)

    The host_notification_commands and service_notification_commands parameters specify the commands that are run to send a notification about a host or service event. The values of the email and address1 parameters are available to these commands as the $CONTACTEMAIL$ and $CONTACTADDRESS1$ macros.

    define contact {
    contact_name                    helpdesk
    alias                           Company Helpdesk
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,u,r
    service_notification_options    u,w,c,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           helpdesk@company.com
    }

    define contact {
    contact_name                    technician1_mail
    alias                           John Doe
    host_notification_period        10x5
    service_notification_period     10x5
    host_notification_options       n
    service_notification_options    w,c,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           john.doe@company.com
    }

    This defines two contacts. The first contact definition, helpdesk, e-mails state change notifications for hosts and services to helpdesk@company.com at all times. The second contact definition, technician1_mail, e-mails state change notifications only for services, and only during work hours (see timeperiod 10x5), to john.doe@company.com.
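Restricting the option letters is how a contact is limited to serious events. A sketch of an additional contact who should hear only about hosts going down, critical service states and the corresponding recoveries; the contact name manager_mail, the person and the e-mail address are hypothetical:

```nagios
# Hypothetical contact: only host-down and critical service events,
# plus the recoveries from them
define contact {
contact_name                    manager_mail
alias                           Jane Smith
host_notification_period        24x7
service_notification_period     24x7
host_notification_options       d,r
service_notification_options    c,r
service_notification_commands   notify-service-by-email
host_notification_commands      notify-host-by-email
email                           jane.smith@company.com
}
```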

    Contact groups - contactgroup

    To avoid listing all contact persons for every host and service, you can define contact groups in the /etc/nagios3/conf.d/contactgroups.cfg file.

    define contactgroup {
    contactgroup_name       support
    alias                   Company Support
    members                 helpdesk, technician1_mail
    }

    Notifications for hosts and services with the support contact group specified will be sent to helpdesk and technician1_mail.
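Because a contact can belong to several groups, narrower groups can be defined alongside support. A sketch of a group that contains only the helpdesk contact; the name helpdesk_only is hypothetical:

```nagios
# Hypothetical group containing only the helpdesk contact
define contactgroup {
contactgroup_name       helpdesk_only
alias                   Helpdesk only
members                 helpdesk
}
```

A host or service that should notify only the helpdesk could then set contact_groups helpdesk_only in its own definition; a value set directly in an object overrides the one inherited from its template.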

    Hosts - host

    In order to monitor a service or a sensor reading, we first need to define the host (device) that provides it. A host is usually identified by the IP address of the device. Before adding individual hosts to Nagios, let us prepare a template with common monitoring parameters to avoid repetitive typing. The name parameter is used to refer to this template in other parts of the configuration. The notification_interval parameter defines how often notifications are resent while a host remains unavailable (0 means that only one notification is sent). The notification_period and check_period parameters specify when notifications are sent and when availability checks are performed. The normal_check_interval parameter defines how often (in minutes) a host is checked. If the host becomes unavailable, it is checked every retry_check_interval minutes, up to max_check_attempts times. The method of checking host availability is defined by the check_command parameter. The contact_groups parameter defines the contact groups to notify. The register 0 parameter indicates that this is a template and not an actual host. Store the template in the /etc/nagios3/conf.d/templates.cfg file.

    define host {
    name                            standard-host
    notifications_enabled           1
    notification_interval           0
    notification_period             24x7
    notification_options            d,u,r
    check_period                    24x7
    normal_check_interval           5
    retry_check_interval            1
    max_check_attempts              10
    first_notification_delay        10
    check_command                   check-host-alive
    contact_groups                  support
    register                       0
    }

    This defines the standard-host template. Notifications about an unavailable host are sent only once. Notifications and checks are active at all times. Host availability is checked every 5 minutes. If a host becomes unavailable, it is checked every minute, up to 10 times, and then the 5-minute interval applies again. If the host becomes available again within 10 minutes after the failure (first_notification_delay parameter), no notification is sent. The host is checked using the check-host-alive command, that is, using the ICMP protocol (ping). Notifications are sent to the support group.
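Templates can themselves use other templates, so a variant of standard-host only needs to list the parameters that differ. A sketch, assuming some hosts should notify only during work hours; the template name workhours-host is hypothetical:

```nagios
# Hypothetical template: inherits everything from standard-host
# except the notification period
define host {
name                            workhours-host
use                             standard-host
notification_period             10x5
register                        0
}
```

A host defined with use workhours-host would still be checked around the clock (check_period 24x7 is inherited), but its notifications would be limited to the 10x5 period.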

    Now we can define the hosts that we want to be monitored. To begin, let us specify the Nagios server itself. The use parameter specifies which template to use for common host parameters. Then, specify a name for the host using the host_name parameter. The name is used to refer to this host in the rest of this configuration. The address parameter defines the IP address for accessing the host. The host icon is defined with the icon_image parameter (icons are located in the /usr/share/nagios/htdocs/images/logos/ directory). Store the configuration to the /etc/nagios3/conf.d/localhost.cfg file.

    define host {
    use             standard-host
    host_name       localhost
    alias           Nagios server
    address         127.0.0.1
    icon_image      base/linux40.png
    }

    The Nagios server uses the parameters in the standard-host template. Services running on this server will refer to the localhost name. The server is available at the loopback interface (IP 127.0.0.1). The Linux logo is used as the icon.
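Further hosts follow the same pattern; icon_image is optional. A sketch that adds the default gateway from the network configuration example earlier in this note (192.168.1.254); the host name gateway and its alias are hypothetical:

```nagios
# Hypothetical host: the office gateway, checked with ping
# through the check_command inherited from standard-host
define host {
use             standard-host
host_name       gateway
alias           Office gateway
address         192.168.1.254
}
```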

    Services - service

    Now we need to define the individual services or values to be monitored on the host (device). Services have parameters similar to those of hosts (host). Therefore, we first set up a template to avoid typing the same general parameters for every service. Add the service template to the /etc/nagios3/conf.d/templates.cfg file.

    define service {
    name                            standard-service
    notifications_enabled           1
    notification_interval           0
    notification_period             24x7
    notification_options            u,w,c,r
    check_period                    24x7
    normal_check_interval           5
    retry_check_interval            1
    max_check_attempts              4
    first_notification_delay        5
    contact_groups                  support
    register                        0
    }  

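    Compared with the host template, the service template uses the service notification options (u,w,c,r), has no check_command (each service specifies its own), retests a failing service fewer times (max_check_attempts 4) and notifies sooner (first_notification_delay 5). Again, remember to add the register 0 parameter to avoid registering the template as a regular service.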
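With notification_interval 0, a problem is reported only once. If reminders are wanted while a service stays in a problem state, a derived template can set a non-zero interval. A sketch, assuming the default interval_length of 60 seconds in nagios.cfg, so the value is in minutes; the template name repeating-service is hypothetical:

```nagios
# Hypothetical template: like standard-service, but re-sends
# notifications every 30 minutes while the problem persists
define service {
name                            repeating-service
use                             standard-service
notification_interval           30
register                        0
}
```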

    Now we can add the services to be monitored on the server. Again, the use parameter selects the template with common parameters. Every service is tied to a host using the host_name parameter. The key parameter is check_command, which specifies the plugin that monitors the service, followed by its arguments. Individual arguments are separated with the exclamation mark (“!”). Add the following two services to the /etc/nagios3/conf.d/templates.cfg file:

    define service {
    use                     standard-service
    host_name               localhost
    service_description     Disk free
    check_command           check_all_disks!10%!5%
    }

    define service {
    use                     standard-service
    host_name               localhost
    service_description     System load
    check_command           check_load!5.0!4.0!3.0!10.0!6.0!4.0
    }

     

    These definitions monitor free disk space (Disk free) and server load (System load). Free disk space is monitored with the check_all_disks command, which takes two arguments. When a disk is 90 % full (10 % free space), a warning event is issued. Upon reaching 95 % of the disk capacity, a critical event is issued. The check_load command takes six arguments: the warning thresholds for the 1-, 5- and 15-minute load averages, followed by the corresponding critical thresholds. These events are sent to the members of the support group (specified in the standard-service template that is included by the use parameter).
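Other standard checks are added the same way. A sketch of a service that watches the number of logged-in users, assuming the check_users command shipped in /etc/nagios-plugins/config/ takes the warning and critical thresholds as its two arguments (check its command_line in that directory before relying on it); the service description and the thresholds 5 and 10 are hypothetical:

```nagios
# Hypothetical service: warning above 5 logged-in users,
# critical above 10
define service {
use                     standard-service
host_name               localhost
service_description     Logged-in users
check_command           check_users!5!10
}
```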

    Activating configuration changes in Nagios

    Nagios reads its configuration files upon startup. To activate the changes, Nagios needs to be reloaded with the following command:

    root@monitoring:~# /etc/init.d/nagios3 reload

    If there is an error in the configuration files, Nagios reports it:

    Reading configuration data...
    Error: Invalid host object directive 'registe'.
    Error: Could not add object property in file '/etc/nagios3/conf.d/templates.cfg' on line 14.
    ***> One or more problems was encountered while processing the config files...

     

    In this case, an incorrect “registe” parameter instead of the correct “register” was specified at line 14 in the /etc/nagios3/conf.d/templates.cfg file. After fixing the error, reload Nagios again. If the configuration is correct, the following message is displayed:

    * Reloading nagios3 monitoring daemon configuration files nagios3        [ OK ]

    Nagios Web Interface

     

    The main monitoring page is Tactical Overview.

    Click the number of hosts or services in a given state to display a page with hosts or services in that state.

    Click Host Detail to get an overview of the monitored hosts (devices). Colors indicate the states of individual hosts. Click a host name to get detailed information about this host, including an option to suspend monitoring, set a planned outage, and so on. Click the semaphore icon to get an overview of the services at a host.

    In the sample configuration, Nagios monitors one host. The host was last checked on 2nd November 2009 at 6:22 pm. The host has been “UP” for 1 day and 18 hours. The test result is OK, packet loss 0 %, latency 120μs.

    The list of services is similar to the list of hosts. It is available under Service Detail. Services are grouped by hosts on which they run.

    Monitoring your own SNMP services

    The list of plugins (command) that enable Nagios to monitor services is available in the web interface under View Config, or directly in the files in the /etc/nagios-plugins/config/ directory. If no suitable plugin is available, a new command for monitoring the service needs to be created. A command definition contains a command_name, which is used to refer to the plugin, and a command line (command_line) that checks the state of the service.
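A minimal sketch of the general structure, assuming a hypothetical command named snmp_oid_example that reads one SNMP value with the check_snmp plugin; the host address comes from the host definition, and the community and OID are passed from the service as $ARG1$ and $ARG2$:

```nagios
# Hypothetical command: read a single OID over SNMP
# $ARG1$ = SNMP community, $ARG2$ = OID to read
define command {
command_name    snmp_oid_example
command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o '$ARG2$'
}
```

A service could then use it, for example, as check_command snmp_oid_example!public!.1.3.6.1.2.1.1.3.0 to read the standard sysUpTime value. Without -w and -c thresholds, check_snmp only reports the value and does not raise warning or critical states based on it.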

    Toner level

    As an example, we can monitor the toner level in a printer over SNMP. The OID of the toner level is .1.3.6.1.2.1.43.11.1.1.9.1.1. To avoid defining a separate plugin for each printer, the IP address is taken from the host definition, and the SNMP community and the warning and critical thresholds are passed as arguments from the configuration of individual services. Add the printer_toner command definition to the /etc/nagios3/conf.d/templates.cfg file.
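A sketch of the printer_toner command definition, reconstructed from the expanded command line given at the end of this section, with $HOSTADDRESS$, $ARG1$, $ARG2$ and $ARG3$ in place of 192.168.12.200, public, 10% and 5%. The plugin path is the one used in that command line; verify the threshold syntax against your check_snmp version before relying on it:

```nagios
# Sketch: printer_toner command, reconstructed from the expanded command line
# $ARG1$ = SNMP community, $ARG2$ = warning threshold, $ARG3$ = critical threshold
define command {
command_name    printer_toner
command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o .1.3.6.1.2.1.43.11.1.1.9.1.1 -w '$ARG2$': -c '$ARG3$': -l 'Toner level' -u '%'
}
```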


    The plugin for monitoring the toner level is named printer_toner. The standard Nagios plugins include check_snmp, a program for retrieving values over SNMP. When the plugin is run to check the current state, $HOSTADDRESS$ is replaced with the address specified in the definition of the printer host. The other variables ($ARG#$) are set according to the check_command in the definition of the particular service.

    The printer and toner level monitoring is configured in the /etc/nagios3/conf.d/helpdesk_printer.cfg file.

    define host {
    use             standard-host
    host_name       helpdesk_printer
    alias           Helpdesk printer
    address         192.168.12.200
    icon_image      base/hp-printer40.png
    }

    define service {
    use                     standard-service
    service_description     Toner level
    host                    helpdesk_printer
    check_command           printer_toner!public!10%!5%
    }

    The plugin to use is specified in the first part of the check_command parameter in the service definition. The $HOSTADDRESS$ variable will contain 192.168.12.200, as specified by the address parameter in the host definition. The $ARG#$ variables contain the values specified in the check_command parameter. Individual values are separated with the exclamation mark (“!”), therefore, $ARG1$ = public, $ARG2$ = 10%, $ARG3$ = 5%. When checking this particular service, Nagios executes the following command:

    /usr/lib/nagios/plugins/check_snmp -H '192.168.12.200' -C 'public' -o .1.3.6.1.2.1.43.11.1.1.9.1.1 -w '10%': -c '5%': -l 'Toner level' -u '%'

    You can run this command in a terminal to verify correct settings.
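Because the printer address and thresholds are not hard-coded in printer_toner, the same command serves any number of printers. A sketch for a second printer with different thresholds; the host name sales_printer, its alias and the address 192.168.12.201 are hypothetical:

```nagios
# Hypothetical second printer reusing the printer_toner command
define host {
use             standard-host
host_name       sales_printer
alias           Sales printer
address         192.168.12.201
}

# Community public, warning threshold 20%, critical threshold 10%
# (passed to printer_toner as $ARG1$, $ARG2$ and $ARG3$)
define service {
use                     standard-service
service_description     Toner level
host_name               sales_printer
check_command           printer_toner!public!20%!10%
}
```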

    Remember to reload Nagios configuration using the following command: /etc/init.d/nagios3 reload

    Monitoring sensor readings

    Monitoring of sensor readings is similar to monitoring services at a host. In this example, let us add devices that support Nagios monitoring of their sensor readings. Download the plugin (command) for monitoring Poseidon (or HWg-STE / Damocles) devices from the HW group website.

    Follow the instructions to unpack the downloaded files. Place the .pl files in the /opt/hwg/ directory, the directory with images in /usr/share/nagios3/htdocs/images/logos/, and the hwg.cfg file in /etc/nagios-plugins/config/.

    Now, let us find the IDs of sensors that we want to monitor.

    Poseidon

    For Poseidon, open http://poseidon.hwg.cz/values.xml in a web browser:

    Create the /etc/nagios3/conf.d/poseidon_demo.cfg file using these values.

    define host {
    use             standard-host
    host_name       poseidon.hwg.cz
    alias           Poseidon demo
    address         poseidon.hwg.cz
    icon_image      hwg/poseidon40.png
    }

    define service {
    use                     standard-service
    service_description     Office temp.
    host                    poseidon.hwg.cz
    check_command           check_hwg_poseidon!public!20408
    }

    define service {
    use                     standard-service
    service_description     Office humidity
    host                    poseidon.hwg.cz
    check_command           check_hwg_poseidon!public!57356
    }

    define service {
    use                     standard-service
    service_description     Prague temp.
    host                    poseidon.hwg.cz
    check_command           check_hwg_poseidon!public!66
    }

    define service {
    use                     standard-service
    service_description     Prague humidity
    host                    poseidon.hwg.cz
    check_command           check_hwg_poseidon!public!78
    }

    Notice that the warning and critical values are not specified in the configuration. They are loaded automatically from the device (host). It is important to set the host (device) address and the sensor IDs in the check_command parameters of the respective services.

    HWg-STE (SNMP Web Thermometer)

    In a similar way, we can find out the sensor IDs of an STE device at http://ste.hwg.cz/:

     

    Create the /etc/nagios3/conf.d/ste_demo.cfg file using these values.

     

    define host {
    use             standard-host
    host_name       ste.hwg.cz
    alias           STE demo
    address         ste.hwg.cz
    icon_image      hwg/ste40.png
    }

    define service {
    use                     standard-service
    service_description     Office temp.
    host                    ste.hwg.cz
    check_command           check_hwg_ste!public!215
    }

    define service {
    use                     standard-service
    service_description     Office humidity
    host                    ste.hwg.cz
    check_command           check_hwg_ste!public!216
    }
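As mentioned at the beginning of this note, objects can be grouped to get a clearer overview in the web interface. A sketch that groups the two HWg demo devices into a host group and their office sensors into a service group; the group names and aliases are hypothetical, and service group members are listed as host name and service description pairs:

```nagios
# Hypothetical host group with both HWg demo devices
define hostgroup {
hostgroup_name  hwg-sensors
alias           HWg sensor devices
members         poseidon.hwg.cz,ste.hwg.cz
}

# Hypothetical service group; members are host,service pairs
define servicegroup {
servicegroup_name   office-climate
alias               Office temperature and humidity
members             poseidon.hwg.cz,Office temp.,poseidon.hwg.cz,Office humidity,ste.hwg.cz,Office temp.,ste.hwg.cz,Office humidity
}
```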

    Remember to reload Nagios configuration using the following command: /etc/init.d/nagios3 reload
