nrpe/README.md
2026-01-02 14:42:21 +01:00

NRPE Ansible Role

This Ansible role installs and configures NRPE plugins for monitoring various system and service metrics.

Features

  • Deploys custom NRPE checks
  • Configures sudoers for checks requiring root privileges

Supported Services

  • load
  • memory
  • disk usage
  • disk read-only
  • network bandwidth
  • dns
  • docker
  • exim mailqueue
  • postfix mailqueue
  • needrestart
  • process age & zombies
  • specific systemd services
  • failed systemd services
  • mysql
  • postgresql
  • redis
  • kubernetes
      • etcd health
      • API server access
      • deployments
      • jobs & cronjobs
      • pki certs
      • pod restarts
      • pv & pvc
      • replicasets
  • raid
      • mdadm
      • 3ware

Available Checks

The following checks are deployed to /usr/lib/nagios/plugins/ (or to the configured plugin path):

  • check_3ware
  • check_cilium_health
  • check_coredns_health
  • check_disk_usage
  • check_dns
  • check_docker
  • check_etcd_health
  • check_eth
  • check_exim_mailqueue
  • check_k8s_apiserver_access
  • check_k8s_deployments
  • check_k8s_jobs_cronjobs
  • check_k8s_pki_certs
  • check_k8s_pod_restarts
  • check_k8s_pv_pvc
  • check_k8s_replicasets
  • check_mdadm
  • check_memory
  • check_mysql_longqueries
  • check_needrestart
  • check_postfix_mailqueue
  • check_postgresql
  • check_proc_age
  • check_redis_health
  • check_rofs
  • check_systemd_failed
  • check_systemd_service

Role Variables

| Variable | Default | Related Check | Description |
| --- | --- | --- | --- |
| nrpe_allowed_hosts | 127.0.0.1,51.158.69.165,49.12.224.53 | NRPE Config | Allowed hosts to connect to NRPE daemon. |
| nrpe_load_warning | {{ ansible_processor_cores }} | check_load | Warning threshold for system load (1min, 5min, 15min). |
| nrpe_load_critical | {{ ansible_processor_cores * 2 }} | check_load | Critical threshold for system load. |
| nrpe_check_total_procs_warning | 500 | check_procs | Warning threshold for total processes count. |
| nrpe_check_total_procs_critical | 800 | check_procs | Critical threshold for total processes count. |
| nrpe_check_zombie_procs_warning | 5 | check_procs | Warning threshold for zombie processes. |
| nrpe_check_zombie_procs_critical | 10 | check_procs | Critical threshold for zombie processes. |
| nrpe_disk_usage_warning | 80 | check_disk_usage | Warning threshold for disk usage (%). |
| nrpe_disk_usage_critical | 90 | check_disk_usage | Critical threshold for disk usage (%). |
| nrpe_disk_inode_warning | 80 | check_disk_usage | Warning threshold for inode usage (%). |
| nrpe_disk_inode_critical | 90 | check_disk_usage | Critical threshold for inode usage (%). |
| nrpe_memory_warning | 80 | check_memory | Warning threshold for memory usage (%). |
| nrpe_memory_critical | 90 | check_memory | Critical threshold for memory usage (%). |
| nrpe_swap_warning | 70 | check_swap | Warning threshold for swap usage (%). |
| nrpe_swap_critical | 80 | check_swap | Critical threshold for swap usage (%). |
| nrpe_mailq_warning | 10 | check_postfix_mailqueue, check_exim_mailqueue | Warning threshold for mail queue size. |
| nrpe_mailq_critical | 20 | check_postfix_mailqueue, check_exim_mailqueue | Critical threshold for mail queue size. |
| nrpe_smtp_host | localhost | check_smtp | Host to check for SMTP service. |
| nrpe_bandwidth_warning | 12M | check_eth | Warning threshold for bandwidth usage. |
| nrpe_bandwidth_critical | 15M | check_eth | Critical threshold for bandwidth usage. |
| nrpe_postgresql_host | localhost | check_postgresql | PostgreSQL host. |
| nrpe_postgresql_port | 5432 | check_postgresql | PostgreSQL port. |
| nrpe_postgresql_user | nagios | check_postgresql | PostgreSQL user. |
| nrpe_postgresql_password | changeme_ | check_postgresql | PostgreSQL password. |
| nrpe_postgresql_backend_warning | 75 | check_postgresql | Warning threshold for backend connections (%). |
| nrpe_postgresql_backend_critical | 90 | check_postgresql | Critical threshold for backend connections (%). |
| nrpe_mysql_host | localhost | check_mysql_longqueries | MySQL host. |
| nrpe_mysql_user | nagios | check_mysql_longqueries | MySQL user. |
| nrpe_mysql_password | changeme_ | check_mysql_longqueries | MySQL password. |
| nrpe_mysql_longqueries_warning | 600 | check_mysql_longqueries | Warning threshold for long running queries (seconds). |
| nrpe_mysql_longqueries_critical | 1200 | check_mysql_longqueries | Critical threshold for long running queries (seconds). |
| nrpe_proc_age_warning | 400 | check_proc_age | Warning threshold for process age (seconds). |
| nrpe_proc_age_critical | 600 | check_proc_age | Critical threshold for process age (seconds). |
| nrpe_redis_memory_warning | 80 | check_redis_health | Warning threshold for Redis memory usage (%). |
| nrpe_redis_memory_critical | 90 | check_redis_health | Critical threshold for Redis memory usage (%). |
| nrpe_redis_connected_clients_warning | 200 | check_redis_health | Warning threshold for connected clients. |
| nrpe_redis_connected_clients_critical | 500 | check_redis_health | Critical threshold for connected clients. |
| nrpe_redis_hitrate_warning | 80 | check_redis_health | Warning threshold for cache hit rate (%). |
| nrpe_redis_hitrate_critical | 50 | check_redis_health | Critical threshold for cache hit rate (%). |
| nrpe_redis_fragments_warning | 1.5 | check_redis_health | Warning threshold for fragmentation ratio. |
| nrpe_redis_fragments_critical | 2.0 | check_redis_health | Critical threshold for fragmentation ratio. |
| nrpe_redis_replication_lag_warning | 10 | check_redis_health | Warning threshold for replication lag (seconds). |
| nrpe_redis_replication_lag_critical | 60 | check_redis_health | Critical threshold for replication lag (seconds). |
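
The password defaults above are placeholders, so they will usually need to be overridden per environment. A minimal sketch of one way to do that from a group variables file follows. The file path and the two `vault_*` variable names are assumptions made for this example (they stand in for values stored in an Ansible Vault file) and are not defined by this role.

```yaml
# group_vars/database_servers.yml (hypothetical path)
# Override the placeholder credentials shipped as role defaults.
nrpe_postgresql_host: localhost
nrpe_postgresql_port: 5432
nrpe_postgresql_user: nagios
# Assumed variable, expected to be defined in a vault-encrypted file.
nrpe_postgresql_password: "{{ vault_nrpe_postgresql_password }}"

nrpe_mysql_host: localhost
nrpe_mysql_user: nagios
# Assumed variable, expected to be defined in a vault-encrypted file.
nrpe_mysql_password: "{{ vault_nrpe_mysql_password }}"
```

Keeping the secrets in separately named variables means the plain-text variables file can be committed while the encrypted file holds the actual values.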

Example Playbooks

Basic Usage

---
- hosts: all
  roles:
    - nrpe

Custom Configuration

---
- hosts: database_servers
  roles:
    - role: nrpe
      vars:
        nrpe_allowed_hosts: '127.0.0.1,10.0.0.5'
        nrpe_load_warning: 2
        nrpe_load_critical: 4
        nrpe_memory_warning: 75
        nrpe_memory_critical: 85
        nrpe_disk_usage_warning: 70
        nrpe_disk_usage_critical: 85
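
The same pattern should apply to the service-specific thresholds. The sketch below tunes the Redis checks for a group of cache hosts. The `cache_servers` group name and the chosen values are illustrative, and `become: true` is an assumption that the role needs root privileges to install plugins and sudoers rules, which this README does not state.

```yaml
---
- hosts: cache_servers   # hypothetical inventory group
  become: true           # assumption: root is needed for plugins and sudoers
  roles:
    - role: nrpe
      vars:
        nrpe_redis_memory_warning: 70
        nrpe_redis_memory_critical: 85
        # As in the role defaults (80 / 50), the critical hit rate
        # is lower than the warning hit rate.
        nrpe_redis_hitrate_warning: 90
        nrpe_redis_hitrate_critical: 70
        nrpe_redis_replication_lag_warning: 5
        nrpe_redis_replication_lag_critical: 30
```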

License

MIT