From 8f0cffd9ad72785f5d56f9beec9220e90daaf1aa Mon Sep 17 00:00:00 2001
From: Ludovic Cartier
Date: Fri, 2 Jan 2026 14:42:21 +0100
Subject: [PATCH] README - add documentation

---
 README.md | 151 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 150 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0f272dc..a5a88c3 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,151 @@
-# nrpe
+# NRPE Ansible Role
 
+This Ansible role installs and configures NRPE plugins for monitoring various system and service metrics.
+
+## Features
+
+- Deploys custom NRPE checks
+- Configures sudoers for checks requiring root privileges
+
+## Supported Services
+
+- load
+- memory
+- disk usage
+- disk read-only
+- network bandwidth
+- dns
+- docker
+- exim mailqueue
+- postfix mailqueue
+- needrestart
+- process age & zombies
+- systemd specific services
+- systemd failed services
+- mysql
+- postgresql
+- redis
+- kubernetes
+-- etcd health
+-- API server access
+-- deployments
+-- jobs & cronjobs
+-- pki certs
+-- pod restarts
+-- pv & pvc
+-- replicasets
+- raid
+-- mdadm
+-- 3ware
+
+## Available Checks
+
+The following checks are deployed to `/usr/lib/nagios/plugins/` (or configured path):
+
+- `check_3ware`
+- `check_cilium_health`
+- `check_coredns_health`
+- `check_disk_usage`
+- `check_dns`
+- `check_docker`
+- `check_etcd_health`
+- `check_eth`
+- `check_exim_mailqueue`
+- `check_k8s_apiserver_access`
+- `check_k8s_deployments`
+- `check_k8s_jobs_cronjobs`
+- `check_k8s_pki_certs`
+- `check_k8s_pod_restarts`
+- `check_k8s_pv_pvc`
+- `check_k8s_replicasets`
+- `check_mdadm`
+- `check_memory`
+- `check_mysql_longqueries`
+- `check_needrestart`
+- `check_postfix_mailqueue`
+- `check_postgresql`
+- `check_proc_age`
+- `check_redis_health`
+- `check_rofs`
+- `check_systemd_failed`
+- `check_systemd_service`
+
+## Role Variables
+
+| Variable | Default | Related Check | Description |
+|----------|---------|---------------|-------------|
+| `nrpe_allowed_hosts` | `127.0.0.1,51.158.69.165,49.12.224.53` | NRPE Config | Allowed hosts to connect to NRPE daemon. |
+| `nrpe_load_warning` | `{{ ansible_processor_cores }}` | `check_load` | Warning threshold for system load (1min, 5min, 15min). |
+| `nrpe_load_critical` | `{{ ansible_processor_cores * 2 }}` | `check_load` | Critical threshold for system load. |
+| `nrpe_check_total_procs_warning` | `500` | `check_procs` | Warning threshold for total processes count. |
+| `nrpe_check_total_procs_critical` | `800` | `check_procs` | Critical threshold for total processes count. |
+| `nrpe_check_zombie_procs_warning` | `5` | `check_procs` | Warning threshold for zombie processes. |
+| `nrpe_check_zombie_procs_critical` | `10` | `check_procs` | Critical threshold for zombie processes. |
+| `nrpe_disk_usage_warning` | `80` | `check_disk_usage` | Warning threshold for disk usage (%). |
+| `nrpe_disk_usage_critical` | `90` | `check_disk_usage` | Critical threshold for disk usage (%). |
+| `nrpe_disk_inode_warning` | `80` | `check_disk_usage` | Warning threshold for inode usage (%). |
+| `nrpe_disk_inode_critical` | `90` | `check_disk_usage` | Critical threshold for inode usage (%). |
+| `nrpe_memory_warning` | `80` | `check_memory` | Warning threshold for memory usage (%). |
+| `nrpe_memory_critical` | `90` | `check_memory` | Critical threshold for memory usage (%). |
+| `nrpe_swap_warning` | `70` | `check_swap` | Warning threshold for swap usage (%). |
+| `nrpe_swap_critical` | `80` | `check_swap` | Critical threshold for swap usage (%). |
+| `nrpe_mailq_warning` | `10` | `check_postfix_mailqueue`, `check_exim_mailqueue` | Warning threshold for mail queue size. |
+| `nrpe_mailq_critical` | `20` | `check_postfix_mailqueue`, `check_exim_mailqueue` | Critical threshold for mail queue size. |
+| `nrpe_smtp_host` | `localhost` | `check_smtp` | Host to check for SMTP service. |
+| `nrpe_bandwidth_warning` | `12M` | `check_eth` | Warning threshold for bandwidth usage. |
+| `nrpe_bandwidth_critical` | `15M` | `check_eth` | Critical threshold for bandwidth usage. |
+| `nrpe_postgresql_host` | `localhost` | `check_postgresql` | PostgreSQL host. |
+| `nrpe_postgresql_port` | `5432` | `check_postgresql` | PostgreSQL port. |
+| `nrpe_postgresql_user` | `nagios` | `check_postgresql` | PostgreSQL user. |
+| `nrpe_postgresql_password` | `changeme_` | `check_postgresql` | PostgreSQL password. |
+| `nrpe_postgresql_backend_warning` | `75` | `check_postgresql` | Warning threshold for backend connections (%). |
+| `nrpe_postgresql_backend_critical` | `90` | `check_postgresql` | Critical threshold for backend connections (%). |
+| `nrpe_mysql_host` | `localhost` | `check_mysql_longqueries` | MySQL host. |
+| `nrpe_mysql_user` | `nagios` | `check_mysql_longqueries` | MySQL user. |
+| `nrpe_mysql_password` | `changeme_` | `check_mysql_longqueries` | MySQL password. |
+| `nrpe_mysql_longqueries_warning` | `600` | `check_mysql_longqueries` | Warning threshold for long running queries (seconds). |
+| `nrpe_mysql_longqueries_critical` | `1200` | `check_mysql_longqueries` | Critical threshold for long running queries (seconds). |
+| `nrpe_proc_age_warning` | `400` | `check_proc_age` | Warning threshold for process age (seconds). |
+| `nrpe_proc_age_critical` | `600` | `check_proc_age` | Critical threshold for process age (seconds). |
+| `nrpe_redis_memory_warning` | `80` | `check_redis_health` | Warning threshold for Redis memory usage (%). |
+| `nrpe_redis_memory_critical` | `90` | `check_redis_health` | Critical threshold for Redis memory usage (%). |
+| `nrpe_redis_connected_clients_warning` | `200` | `check_redis_health` | Warning threshold for connected clients. |
+| `nrpe_redis_connected_clients_critical` | `500` | `check_redis_health` | Critical threshold for connected clients. |
+| `nrpe_redis_hitrate_warning` | `80` | `check_redis_health` | Warning threshold for cache hit rate (%). |
+| `nrpe_redis_hitrate_critical` | `50` | `check_redis_health` | Critical threshold for cache hit rate (%). |
+| `nrpe_redis_fragments_warning` | `1.5` | `check_redis_health` | Warning threshold for fragmentation ratio. |
+| `nrpe_redis_fragments_critical` | `2.0` | `check_redis_health` | Critical threshold for fragmentation ratio. |
+| `nrpe_redis_replication_lag_warning` | `10` | `check_redis_health` | Warning threshold for replication lag (seconds). |
+| `nrpe_redis_replication_lag_critical` | `60` | `check_redis_health` | Critical threshold for replication lag (seconds). |
+
+## Example Playbooks
+
+### Basic Usage
+
+```yaml
+---
+- hosts: all
+  roles:
+    - nrpe
+```
+
+### Custom Configuration
+
+```yaml
+---
+- hosts: database_servers
+  roles:
+    - role: nrpe
+      vars:
+        nrpe_allowed_hosts: '127.0.0.1,10.0.0.5'
+        nrpe_load_warning: 2
+        nrpe_load_critical: 4
+        nrpe_memory_warning: 75
+        nrpe_memory_critical: 85
+        nrpe_disk_usage_warning: 70
+        nrpe_disk_usage_critical: 85
+```
+
+## License
+
+MIT
\ No newline at end of file