# NRPE Ansible Role

This Ansible role installs and configures NRPE plugins for monitoring various system and service metrics.

## Features

- Deploys custom NRPE checks
- Configures sudoers for checks requiring root privileges

## Supported Services

- load
- memory
- disk usage
- disk read-only
- network bandwidth
- dns
- docker
- exim mailqueue
- postfix mailqueue
- needrestart
- process age & zombies
- systemd specific services
- systemd failed services
- mysql
- postgresql
- redis
- kubernetes
  - etcd health
  - API server access
  - deployments
  - jobs & cronjobs
  - pki certs
  - pod restarts
  - pv & pvc
  - replicasets
- pbs (Proxmox Backup Server)
- raid
  - mdadm
  - 3ware

## Available Checks

The following checks are deployed to `/usr/lib/nagios/plugins/` (or the configured plugin path):

- `check_3ware`
- `check_apt_update`
- `check_ceph`
- `check_cilium_health`
- `check_coredns_health`
- `check_disk_usage`
- `check_dns`
- `check_docker`
- `check_etcd_health`
- `check_eth`
- `check_exim_mailqueue`
- `check_k8s_apiserver_access`
- `check_k8s_deployments`
- `check_k8s_jobs_cronjobs`
- `check_k8s_pki_certs`
- `check_k8s_pod_restarts`
- `check_k8s_pv_pvc`
- `check_k8s_replicasets`
- `check_mdadm`
- `check_memory`
- `check_mysql_longqueries`
- `check_needrestart`
- `check_nvme_smart`
- `check_nvme_temperature`
- `check_pbs_backup`
- `check_postfix_mailqueue`
- `check_postgresql`
- `check_proc_age`
- `check_pve_quorum`
- `check_pvesr`
- `check_reboot_required`
- `check_redis_health`
- `check_rofs`
- `check_ssl_cert`
- `check_systemd_failed`
- `check_systemd_service`
- `check_uptime`
- `check_zpool_health`

## Role Variables

| Variable | Default | Related Check | Description |
|----------|---------|---------------|-------------|
| `nrpe_allowed_hosts` | `127.0.0.1,51.158.69.165,49.12.224.53` | NRPE Config | Comma-separated list of hosts allowed to connect to the NRPE daemon. |
| `nrpe_load_warning` | `{{ ansible_processor_cores }}` | `check_load` | Warning threshold for system load (1 min, 5 min and 15 min averages). |
| `nrpe_load_critical` | `{{ ansible_processor_cores * 2 }}` | `check_load` | Critical threshold for system load. |
| `nrpe_check_total_procs_warning` | `500` | `check_procs` | Warning threshold for the total process count. |
| `nrpe_check_total_procs_critical` | `800` | `check_procs` | Critical threshold for the total process count. |
| `nrpe_check_zombie_procs_warning` | `5` | `check_procs` | Warning threshold for zombie processes. |
| `nrpe_check_zombie_procs_critical` | `10` | `check_procs` | Critical threshold for zombie processes. |
| `nrpe_disk_usage_warning` | `80` | `check_disk_usage` | Warning threshold for disk usage (%). |
| `nrpe_disk_usage_critical` | `90` | `check_disk_usage` | Critical threshold for disk usage (%). |
| `nrpe_disk_inode_warning` | `80` | `check_disk_usage` | Warning threshold for inode usage (%). |
| `nrpe_disk_inode_critical` | `90` | `check_disk_usage` | Critical threshold for inode usage (%). |
| `nrpe_memory_warning` | `80` | `check_memory` | Warning threshold for memory usage (%). |
| `nrpe_memory_critical` | `90` | `check_memory` | Critical threshold for memory usage (%). |
| `nrpe_swap_warning` | `70` | `check_swap` | Warning threshold for swap usage (%). |
| `nrpe_swap_critical` | `80` | `check_swap` | Critical threshold for swap usage (%). |
| `nrpe_mailq_warning` | `10` | `check_postfix_mailqueue`, `check_exim_mailqueue` | Warning threshold for mail queue size. |
| `nrpe_mailq_critical` | `20` | `check_postfix_mailqueue`, `check_exim_mailqueue` | Critical threshold for mail queue size. |
| `nrpe_smtp_host` | `localhost` | `check_smtp` | Host to check for the SMTP service. |
| `nrpe_bandwidth_warning` | `12M` | `check_eth` | Warning threshold for bandwidth usage. |
| `nrpe_bandwidth_critical` | `15M` | `check_eth` | Critical threshold for bandwidth usage. |
| `nrpe_postgresql_host` | `localhost` | `check_postgresql` | PostgreSQL host. |
| `nrpe_postgresql_port` | `5432` | `check_postgresql` | PostgreSQL port. |
| `nrpe_postgresql_user` | `nagios` | `check_postgresql` | PostgreSQL user. |
| `nrpe_postgresql_password` | `changeme_` | `check_postgresql` | PostgreSQL password. |
| `nrpe_postgresql_backend_warning` | `75` | `check_postgresql` | Warning threshold for backend connections (%). |
| `nrpe_postgresql_backend_critical` | `90` | `check_postgresql` | Critical threshold for backend connections (%). |
| `nrpe_mysql_host` | `localhost` | `check_mysql_longqueries` | MySQL host. |
| `nrpe_mysql_user` | `nagios` | `check_mysql_longqueries` | MySQL user. |
| `nrpe_mysql_password` | `changeme_` | `check_mysql_longqueries` | MySQL password. |
| `nrpe_mysql_longqueries_warning` | `600` | `check_mysql_longqueries` | Warning threshold for long-running queries (seconds). |
| `nrpe_mysql_longqueries_critical` | `1200` | `check_mysql_longqueries` | Critical threshold for long-running queries (seconds). |
| `nrpe_proc_age_warning` | `400` | `check_proc_age` | Warning threshold for process age (seconds). |
| `nrpe_proc_age_critical` | `600` | `check_proc_age` | Critical threshold for process age (seconds). |
| `nrpe_redis_memory_warning` | `-` | `check_redis_health` | Warning threshold for Redis memory usage (%). |
| `nrpe_redis_memory_critical` | `-` | `check_redis_health` | Critical threshold for Redis memory usage (%). |
| `nrpe_redis_connected_clients_warning` | `-` | `check_redis_health` | Warning threshold for connected clients. |
| `nrpe_redis_connected_clients_critical` | `-` | `check_redis_health` | Critical threshold for connected clients. |
| `nrpe_redis_hitrate_warning` | `-` | `check_redis_health` | Warning threshold for cache hit rate (%). |
| `nrpe_redis_hitrate_critical` | `-` | `check_redis_health` | Critical threshold for cache hit rate (%). |
| `nrpe_redis_fragments_warning` | `-` | `check_redis_health` | Warning threshold for fragmentation ratio. |
| `nrpe_redis_fragments_critical` | `-` | `check_redis_health` | Critical threshold for fragmentation ratio. |
| `nrpe_redis_replication_lag_warning` | `-` | `check_redis_health` | Warning threshold for replication lag (seconds). |
| `nrpe_redis_replication_lag_critical` | `-` | `check_redis_health` | Critical threshold for replication lag (seconds). |
| `nrpe_uptime_warning` | `1440` | `check_uptime` | Warning threshold for uptime (minutes); alerts when uptime is below this value. |
| `nrpe_uptime_critical` | `30` | `check_uptime` | Critical threshold for uptime (minutes); alerts when uptime is below this value. |
| `nrpe_ssl_path` | `/etc/haproxy/ssl` | `check_ssl_cert` | Path to the SSL certificates directory. |
| `nrpe_ssl_warning` | `21` | `check_ssl_cert` | Warning threshold for days remaining before SSL certificate expiry. |
| `nrpe_ssl_critical` | `14` | `check_ssl_cert` | Critical threshold for days remaining before SSL certificate expiry. |
| `nrpe_ntp_host` | `europe.pool.ntp.org` | `check_ntp` | NTP host to check. |
| `nrpe_ntp_warning` | `10` | `check_ntp` | Warning threshold for NTP offset. |
| `nrpe_ntp_critical` | `15` | `check_ntp` | Critical threshold for NTP offset. |
| `nrpe_pbs_host` | `-` | `check_pbs_backup` | PBS host (IP or FQDN). |
| `nrpe_pbs_token` | `-` | `check_pbs_backup` | PBS API token in `user@realm!tokenid:secret` format. |
| `nrpe_pbs_store` | `-` | `check_pbs_backup` | PBS datastore name. |
| `nrpe_pbs_backups` | `[inventory_hostname]` | `check_pbs_backup` | List of backup IDs to check. |
| `nrpe_pbs_type` | `host` | `check_pbs_backup` | Backup type: `host`, `vm` or `ct` (optional). |
| `nrpe_pbs_port` | `8007` | `check_pbs_backup` | PBS API port (optional). |
| `nrpe_pbs_namespace` | `-` | `check_pbs_backup` | PBS namespace (optional). |
| `nrpe_pbs_ssl_insecure` | `false` | `check_pbs_backup` | Ignore SSL certificate errors (optional). |
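
The PostgreSQL and MySQL passwords default to the placeholder `changeme_`, and the Redis thresholds have no default. The playbook below is a minimal sketch of overriding them on a database host. It assumes the secrets are kept in an Ansible Vault-encrypted vars file under the hypothetical names `vault_nrpe_postgresql_password` and `vault_nrpe_mysql_password`; the Redis values are illustrative only, not recommended thresholds.

```yaml
---
- hosts: database_servers
  roles:
    - role: nrpe
      vars:
        # Replace the `changeme_` placeholders. The vault_* variables are
        # hypothetical and must be defined in a vault-encrypted vars file.
        nrpe_postgresql_user: nagios
        nrpe_postgresql_password: "{{ vault_nrpe_postgresql_password }}"
        nrpe_mysql_user: nagios
        nrpe_mysql_password: "{{ vault_nrpe_mysql_password }}"
        # Illustrative Redis thresholds; the role ships no defaults for these.
        nrpe_redis_memory_warning: 80
        nrpe_redis_memory_critical: 90
        nrpe_redis_replication_lag_warning: 10
        nrpe_redis_replication_lag_critical: 30
```

Keeping the real credentials in Vault avoids storing them in plain text next to the inventory.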

## Example Playbooks

### Basic Usage

```yaml
---
- hosts: all
  roles:
    - nrpe
```

### Custom Configuration

```yaml
---
- hosts: database_servers
  roles:
    - role: nrpe
      vars:
        nrpe_allowed_hosts: '127.0.0.1,10.0.0.5'
        nrpe_load_warning: 2
        nrpe_load_critical: 4
        nrpe_memory_warning: 75
        nrpe_memory_critical: 85
        nrpe_disk_usage_warning: 70
        nrpe_disk_usage_critical: 85
```

### PBS Backups

```yaml
---
- hosts: myserver
  roles:
    - role: nrpe
      vars:
        nrpe_pbs_host: pbs01.example.com
        nrpe_pbs_token: "backup@pbs!monitoring:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
        nrpe_pbs_store: main
        nrpe_pbs_ssl_insecure: true
        nrpe_pbs_backups: [100]  # override the backup ID used on the PBS side
        # nrpe_pbs_type: host    # optional, default: host (or vm, ct)
        # nrpe_pbs_namespace: ns # optional
```

> The API token must have the `DatastoreAudit` privilege on the target datastore.

## License

MIT