Icinga monitoring architecture

As explained in this entry, I follow an « apply logic » style for the Icinga 2 configuration, which is well suited to a generic monitoring setup on homogeneous clusters. Fortunately, it does not prevent adding isolated services (à la Nagios), so future exceptions to the rules can be handled that way.

Here I discuss the configuration already available in the playbook (modulo unchecked bugs) and how it can best serve our goals.

Most service definitions are based on predefined commands, which are documented here.

Base system monitoring

For each host we:

  • check ping (default host check in icinga)
  • check ssh
  • check apt
  • check icinga
  • check load
  • check procs
  • check swap when vars.swap is defined
  • check users
  • check run_kernel (checks whether the host runs the most up-to-date kernel)
  • check fail2ban process
  • check sshd process
  • check rsyslogd process
  • check icinga2 process
  • check cron process
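With the apply logic style, each bullet above becomes one apply rule rather than a service object per host. A minimal sketch for a few of them, assuming the generic-service template from the Icinga 2 sample configuration and Debian process names; the service names are illustrative, not necessarily those of the playbook. Local checks such as swap and procs must run on the monitored host itself (for example through an Icinga agent and command_endpoint), which is left out here:

  apply Service "ssh" {
    import "generic-service"
    check_command = "ssh"
    assign where host.address
  }

  /* Only hosts that define vars.swap get a swap check */
  apply Service "swap" {
    import "generic-service"
    check_command = "swap"
    assign where host.vars.swap
  }

  /* One ITL "procs" check per daemon; the names are assumed Debian process names */
  apply Service "proc " for (daemon in [ "fail2ban-server", "sshd", "rsyslogd", "icinga2", "cron" ]) {
    import "generic-service"
    check_command = "procs"
    vars.procs_command = daemon
    /* CRITICAL when fewer than one matching process is running */
    vars.procs_critical = "1:"
  }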

Git repos monitoring

A host can declare a git repository to be checked (originally designed for etckeeper):

  vars.repos["Bling"] = {
    dir = "/var/git/bling"
  }
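Such a dictionary can be turned into one service per repository with an apply for rule. As far as I know, the Icinga Template Library does not ship a git check, so the CheckCommand below is hypothetical: check_git_repo, its --dir flag and the git_repo_dir variable are placeholders for whatever plugin the playbook actually installs:

  /* Hypothetical command: "check_git_repo" and "--dir" are placeholders */
  object CheckCommand "git_repo" {
    command = [ PluginDir + "/check_git_repo" ]
    arguments = {
      "--dir" = "$git_repo_dir$"
    }
  }

  /* One service per entry of vars.repos, e.g. "Bling" */
  apply Service for (repo => config in host.vars.repos) {
    import "generic-service"
    check_command = "git_repo"
    vars.git_repo_dir = config.dir
  }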

Disk and partitions monitoring

A host can declare any partition to be checked:

  vars.disks["disk"] = {
  }
  vars.disks["disk /"] = {
    disk_partitions = "/"
  }
  vars.disks["disk /var"] = {
    disk_partitions = "/var"
  }
  vars.disks["disk /tmp"] = {
    disk_partitions = "/tmp"
  }
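This follows the pattern of the Icinga 2 sample configuration: an apply for rule iterates over vars.disks and merges each dictionary into the service's custom variables, so disk_partitions is passed to the ITL disk command, while the empty "disk" entry checks every mounted partition. A sketch, assuming the generic-service template exists:

  /* One service per key of vars.disks: "disk", "disk /", "disk /var", ... */
  apply Service for (disk => config in host.vars.disks) {
    import "generic-service"
    check_command = "disk"
    /* Copies disk_partitions (if any) into the service vars */
    vars += config
  }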

Processes monitoring

A host can declare any process whose presence should be checked:

  vars.process["Incron"] = {
    procs_command = "incrond"
    procs_critical = "1:1"
  }
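The same apply for pattern covers processes: the dictionary keys become service names, and procs_command and procs_critical are handed to the ITL procs command. With procs_critical = "1:1", the service is CRITICAL unless exactly one incrond process is running. A sketch, again assuming the generic-service template:

  apply Service for (proc => config in host.vars.process) {
    import "generic-service"
    check_command = "procs"
    /* procs_command and procs_critical are ITL "procs" custom variables */
    vars += config
  }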

Mail sending monitoring

A host can set vars.sendmail to any non-null value. The mailname, the mail queue and the mail process are then checked, but this is not sufficient.
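A flag such as vars.sendmail fits a plain apply rule with an assign where condition. A sketch of the mail queue part only, using the ITL mailq command; the thresholds are arbitrary assumptions, and, like any local check, it has to run on the mail host itself (agent or SSH), which is omitted here:

  apply Service "mailq" {
    import "generic-service"
    check_command = "mailq"
    /* Arbitrary thresholds, in number of queued messages */
    vars.mailq_warning = 100
    vars.mailq_critical = 500
    /* Assumption: Postfix, as in the postfix playbook */
    vars.mailq_servertype = "postfix"
    assign where host.vars.sendmail
  }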

Some time ago I wrote a qshape-based test that is suitable for detecting delivery problems in mass mailing (much better than watching the mail queue, which can legitimately grow when needed). However, it is not suited to sparse emailing.

So, a few projects would be worthwhile to monitor our ability to send emails:

  • check RBLs
  • a mail loop test (verify that a mail sent through another relay is delivered back to ourselves)
  • a mail delivery test (verify the delivery of a sent mail in some of the major mail domains)

Web services monitoring

A host can declare a website hosted at a given FQDN:

  vars.http_vhosts["Secure Drop Forum"] = {
    http_vhost = "forum.securedrop.org"
    http_uri = "/c/devops"
    http_ssl = true
    http_string = "devops discussions"
  }
  • Each FQDN is checked with check_http from the Icinga master, and the response body must contain http_string.
  • Each FQDN is also checked with check_http from the Icinga master to ensure the response does not contain certain strings. This helps prevent accidentally deploying spyware. For now, the following spyware domains are checked:
      ◦ googleapis.com
      ◦ cloudflare.com
      ◦ google-analytics.com
      ◦ gravatar.com
  • If http_ssl = true, the check is performed over HTTPS and the TLS certificate is retrieved and checked for validity.
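Both content checks can be expressed as apply for rules over vars.http_vhosts with the ITL http command. A sketch under these assumptions: the generic-service template exists, the keys of each vhost dictionary are ITL http variables (http_vhost, http_uri, http_ssl, http_string), and the spyware check relies on check_http's regex match combined with its invert option, which turns a match into CRITICAL. The TLS certificate check is left out:

  /* Content check: the body must contain http_string */
  apply Service for (vhost => config in host.vars.http_vhosts) {
    import "generic-service"
    check_command = "http"
    vars += config
  }

  /* Spyware check: CRITICAL if the body matches one of the listed domains.
     It still inherits http_string from the vhost dictionary. */
  apply Service "spyware " for (vhost => config in host.vars.http_vhosts) {
    import "generic-service"
    check_command = "http"
    vars += config
    vars.http_expect_body_regex = "googleapis\\.com|cloudflare\\.com|google-analytics\\.com|gravatar\\.com"
    vars.http_invert_regex = true
  }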

Moreover, if a host declares vars.httpd = "apache", vars.httpd = "apache2" or vars.httpd = "nginx", the corresponding process checks are executed.

If a host declares vars.sqlserver = "mysql", vars.sqlserver = "mariadb" or vars.sqlserver = "pgsql", the corresponding process checks are executed.

It would probably be easy to associate a list of scripts with each FQDN for more advanced checks (checking the result of a POST, etc.) if needed.
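These conditional process checks can be sketched as plain apply rules filtered with the in operator. The mapping from the declared value to a process name is an assumption for Debian (apache2, nginx, mysqld, postgres); MariaDB 10.5 and later may run as mariadbd instead:

  apply Service "httpd process" {
    import "generic-service"
    check_command = "procs"
    if (host.vars.httpd == "nginx") {
      vars.procs_command = "nginx"
    } else {
      /* Assumption: "apache" and "apache2" both run as apache2 on Debian */
      vars.procs_command = "apache2"
    }
    vars.procs_critical = "1:"
    assign where host.vars.httpd in [ "apache", "apache2", "nginx" ]
  }

  apply Service "sqlserver process" {
    import "generic-service"
    check_command = "procs"
    if (host.vars.sqlserver == "pgsql") {
      vars.procs_command = "postgres"
    } else {
      /* Assumption: MySQL and older MariaDB run as mysqld */
      vars.procs_command = "mysqld"
    }
    vars.procs_critical = "1:"
    assign where host.vars.sqlserver in [ "mysql", "mariadb", "pgsql" ]
  }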

DNS service monitoring

A host can declare hosted zone files, which are checked with named-checkzone (syntax consistency) and check_whois (domain expiration):

  /* Define zones and files for checks */
  vars.zones["Secure Drop Club"] = {
    fqdn = "securedrop.club"
    file = "/etc/bind/zones/masters/securedrop.club"
    view = "external"
  }
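A sketch of the syntax part, assuming a custom CheckCommand wrapping named-checkzone (invoked as named-checkzone zonename filename; it exits with 1 when errors are detected and 0 otherwise, which Icinga maps to WARNING and OK). The command name, the binary path and the zone_fqdn / zone_file variables are hypothetical, the check must run on the DNS host itself (agent or SSH, not shown), and the whois part is left out:

  /* Hypothetical CheckCommand; the binary path is an assumption */
  object CheckCommand "named_checkzone" {
    command = [ "/usr/sbin/named-checkzone", "$zone_fqdn$", "$zone_file$" ]
  }

  /* One service per entry of vars.zones, e.g. "Secure Drop Club" */
  apply Service for (zone_name => config in host.vars.zones) {
    import "generic-service"
    check_command = "named_checkzone"
    vars.zone_fqdn = config.fqdn
    vars.zone_file = config.file
  }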

We could also add a check_dig on the A and NS records, and possibly use Zonemaster or a web service providing Zonemaster results.

Thanks to this « apply logic » style:

  • Enabling monitoring in the postfix playbook should come down to adding the host variable vars.sendmail = true to the host configuration file and reloading Icinga.
  • Enabling (basic) monitoring in the forum playbook should come down to adding a key (holding the vhost description) to the host variable vars.http_vhosts in the host configuration file and reloading Icinga.

One example of a welcome monitoring script would be to verify that installed third-party software (web applications, for example, but not only) is up to date.


http://lab.securedrop.club/fpoulain/securedrop-club/tags/monitoring_server_and_client_ok looks very good and passes molecule test -s monitoring_client :thumbsup: