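By default, Nagios actively polls the monitored items on its clients. This article covers Nagios passive checks, in which the client itself submits check results directly to the Nagios server.
In some environments passive checks work better than active ones. Monitoring whether a data backup succeeded is one example. In a previous job of mine, the backup job wrote its result to a file and the Nagios client inspected that file to decide whether the backup had succeeded. The problem was that during the backup cycle Nagios would see the backup as unsuccessful and keep sending alert notifications, which quickly became tiresome. This case can be handled with Nagios passive checks plus freshness checking. (Article source: 运维生存时间, https://www.ttlsa.com/nagios/nagios-passive-detection/)
The passive check configuration looks like this:
1. Enable passive checks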
# vim /usr/local/nagios/etc/nagios.cfg
accept_passive_service_checks=1
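Passive results reach Nagios through the external command file, so a few other nagios.cfg directives matter as well: check_external_commands and command_file for the command file itself, accept_passive_host_checks for host results, and check_service_freshness for freshness checking. The following is a small sketch of a helper that lists which of these are set in a given config file. The name show_passive_settings is made up for this illustration and is not part of Nagios.

```shell
# Hypothetical helper (not shipped with Nagios): print the directives
# related to passive checks that are set, uncommented, in a nagios.cfg.
# Usage: show_passive_settings <path-to-nagios.cfg>
show_passive_settings() {
    grep -E '^[[:space:]]*(check_external_commands|command_file|accept_passive_service_checks|accept_passive_host_checks|check_service_freshness)[[:space:]]*=' "$1"
}
```

For the installation used in this article it would be called as show_passive_settings /usr/local/nagios/etc/nagios.cfg. Commented-out lines are not printed, so a missing directive means the built-in default applies.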
2. Define the passive check command
# vim /usr/local/nagios/etc/objects/commands.cfg
define command {
    command_name    check_dummy
    command_line    $USER1$/check_dummy $ARG1$ $ARG2$
}
3. Define the service to be monitored passively
# vim /usr/local/nagios/etc/ttlsa.com.cfg
define service {
    use                     generic-service
    host_name               www.ttlsa.com
    service_description     BACKUP
    active_checks_enabled   0
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     86400
    check_command           check_dummy!1!"Backup failed: no backups have run for 24 hours"
}
The check_dummy command does not actually check anything. It takes two arguments, a state and an output text, and always returns exactly those two values.
# /usr/local/nagios/libexec/check_dummy 0 successful
OK: successful
# /usr/local/nagios/libexec/check_dummy 1 failed
WARNING: failed
# /usr/local/nagios/libexec/check_dummy 2 failed
CRITICAL: failed
# /usr/local/nagios/libexec/check_dummy 3 failed
UNKNOWN: failed
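To make that behaviour concrete, here is a minimal shell sketch that mimics what check_dummy does for the four states shown above. It is an illustration only, not the plugin's real source. The function name check_dummy_sketch and its handling of invalid states are assumptions.

```shell
# Illustration only: mimics check_dummy for states 0-3.
# Usage: check_dummy_sketch <state> [text]
check_dummy_sketch() {
    state=$1
    text=$2
    case "$state" in
        0) label=OK ;;
        1) label=WARNING ;;
        2) label=CRITICAL ;;
        3) label=UNKNOWN ;;
        *)  # Assumption: treat anything else as UNKNOWN.
            echo "UNKNOWN: invalid state '$state'"
            return 3 ;;
    esac
    if [ -n "$text" ]; then
        echo "$label: $text"
    else
        echo "$label"
    fi
    # The exit status is what Nagios interprets as the check state.
    return "$state"
}
```

The point to notice is that the exit status, not the printed label, is what Nagios uses as the state. The text is only the plugin output shown in the web interface and in notifications.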
If no passive result has been submitted within freshness_threshold seconds (86400 here, i.e. 24 hours), Nagios runs check_command, even though active checks are disabled for the service.
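The freshness rule can be pictured as a simple age comparison. The sketch below only illustrates the idea and is not Nagios' actual scheduler logic, which takes additional factors into account. The name is_stale is a hypothetical helper.

```shell
# Simplified illustration of a freshness test (not Nagios' real algorithm).
# Succeeds (exit 0) when the last result is older than the threshold.
# Usage: is_stale <last_result_epoch> <threshold_seconds> [now_epoch]
is_stale() {
    last=$1
    threshold=$2
    now=${3:-$(date +%s)}   # default to the current time
    [ $((now - last)) -gt "$threshold" ]
}
```

With the service definition above, a result older than 86400 seconds counts as stale. Nagios then runs check_dummy with state 1, which puts the BACKUP service into WARNING with the "no backups have run" message.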
4. How external applications submit check results
An external application can submit a host check result to Nagios by writing a PROCESS_HOST_CHECK_RESULT external command to the external command file. The command format is:
[<timestamp>] PROCESS_HOST_CHECK_RESULT;<host_name>;<host_status>;<plugin_output>
timestamp: Unix timestamp of the check, in seconds
host_name: short name of the host, as given by host_name in its Nagios object definition
host_status: state of the host (0 = UP, 1 = DOWN, 2 = UNREACHABLE)
plugin_output: text output of the host check
A service check result uses PROCESS_SERVICE_CHECK_RESULT instead. The service description follows the host name, and the return code is 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. For example:
# CHECK="[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;10.0.0.166;BACKUP;0;Nightly backups were successful"
# echo "$CHECK" >> /usr/local/nagios/var/rw/nagios.cmd
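The same submission can be wrapped in a pair of small shell functions so that a backup script can call it at the end of its run. This is a sketch under assumptions: the function names format_service_result and submit_service_result are invented for illustration, and the command file path is the one used in this article.

```shell
# Build one external-command line for a passive service result.
# Usage: format_service_result <host_name> <service_description> \
#            <return_code> <plugin_output> [timestamp]
format_service_result() {
    ts=${5:-$(date +%s)}   # default to the current time
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$ts" "$1" "$2" "$3" "$4"
}

# Append a result to the external command file.
# Usage: submit_service_result <command_file> <host_name> \
#            <service_description> <return_code> <plugin_output> [timestamp]
submit_service_result() {
    cmd_file=$1
    shift
    format_service_result "$@" >> "$cmd_file"
}

# Example call, commented out so that sourcing this sketch has no side effects:
# submit_service_result /usr/local/nagios/var/rw/nagios.cmd \
#     10.0.0.166 BACKUP 0 "Nightly backups were successful"
```

Keeping the formatting separate from the write makes it possible to inspect the exact line before anything is sent to Nagios. The plugin output should not contain a semicolon, because the semicolon is the field separator.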
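5. Passive check clients
If the application runs on the Nagios server itself, it can submit passive check results directly through the external command described above. An application on a remote host cannot write to that file. To let remote hosts send passive check results to Nagios, you can use the NSCA add-on. It consists of a daemon that runs on the Nagios host and a client that is executed on the remote hosts. The daemon listens for connections from remote clients, performs some basic validation on the submitted results, and then writes the check results directly to the external command file.
NSCA must be installed on both the Nagios server and the clients.
Server: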
# wget http://downloads.sourceforge.net/project/nagios/nsca-2.x/nsca-2.7.2/nsca-2.7.2.tar.gz
# tar -xzf nsca-2.7.2.tar.gz
# cd nsca-2.7.2
# ./configure
# make
# cp src/nsca /usr/local/nagios/bin
# cp sample-config/nsca.cfg /usr/local/nagios/etc
# vi /usr/local/nagios/etc/nsca.cfg
password=www.ttlsa.com
# /usr/local/nagios/bin/nsca -c /usr/local/nagios/etc/nsca.cfg --single
Client:
# wget http://downloads.sourceforge.net/project/nagios/nsca-2.x/nsca-2.7.2/nsca-2.7.2.tar.gz
# tar -xzf nsca-2.7.2.tar.gz
# cd nsca-2.7.2
# ./configure
# make
# mkdir -p /usr/local/bin /usr/local/etc
# cp src/send_nsca /usr/local/bin
# cp sample-config/send_nsca.cfg /usr/local/etc
# vi /usr/local/etc/send_nsca.cfg
password=www.ttlsa.com
# CHECK="10.0.0.166\tBACKUP\t0\tBackup was successful, this check submitted by NSCA\n"
# echo -en "$CHECK" | send_nsca -c /usr/local/etc/send_nsca.cfg -H 10.0.100.125
1 data packet(s) sent to host successfully.
10.0.0.166 is the monitored server; 10.0.100.125 is the monitoring server. The first field of the submitted data must match the host_name that the Nagios object definitions use for the monitored server.
The NSCA daemon on the monitoring server listens for the service results submitted by the send_nsca client and verifies that the password is correct and that the data is well-formed. The data format is:
<host_name>\t<service_description>\t<check_result>\t<check_output>\n
check_result: check state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)
Once the data is received, it is translated and written to the external command file /usr/local/nagios/var/rw/nagios.cmd, where it is handled like a locally submitted passive check.
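Tying this back to the backup scenario from the introduction, a backup job on the monitored server could report its own outcome through send_nsca. The following is a minimal sketch, not a finished script. The function names format_nsca_result and report_backup, the message texts, and the path /path/to/backup.sh are assumptions made for illustration.

```shell
# Build one tab-separated line in the format send_nsca expects.
# Usage: format_nsca_result <host_name> <service_description> \
#            <check_result> <check_output>
format_nsca_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Run a command and print an OK (0) or CRITICAL (2) result for the
# BACKUP service, depending on the command's exit status.
# Usage: report_backup <host_name> <command> [args...]
report_backup() {
    host=$1
    shift
    if "$@" >/dev/null 2>&1; then
        format_nsca_result "$host" BACKUP 0 "Backup was successful"
    else
        format_nsca_result "$host" BACKUP 2 "Backup failed"
    fi
}

# Example call, commented out because it needs send_nsca and a reachable
# NSCA daemon; /path/to/backup.sh is a placeholder:
# report_backup 10.0.0.166 /path/to/backup.sh |
#     send_nsca -c /usr/local/etc/send_nsca.cfg -H 10.0.100.125
```

Because the job reports once per run, Nagios receives one result per backup instead of re-checking a status file all day. If the job never runs at all, the freshness check from step 3 still raises the alert.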