Monitoring weblate & forum

dachary · September 17, 2017, 10:10pm

Hello,

After some research & discussions with various people who know more than I do about monitoring services, the following seems to be a reasonable direction for monitoring the weblate & forum web services.

icinga deployed via docker. The official ansible module is not fully automated.
using ansible to deploy the vm and install docker to run icinga from docker
using ansible to collect host information and create the relevant files in /etc/icinga2 so the desired services are monitored. For this part we could use icinga2-ansible-add-hosts although it does not do much and most of the files would have to be manually crafted.

As for the alternatives, prometheus seems more metric-oriented and sensu is really complicated to install (maybe by design, to sell the proprietary installer?).

Cheers

dachary · September 26, 2017, 12:35pm

For the record I’m organizing Ansible playbooks at http://lab.securedrop.club/main/securedrop-club . It is quite messy at the moment but making good progress.

fpoulain · October 6, 2017, 7:53pm

I saw that the docker image provides Icinga Director.

I have never tried it, but it is advertised for “Users with the desire to completely automate their datacenter”. At first glance, I thought the documentation unfortunately did not explain how it solves automation problems. On a second look, though, I saw a CLI tool which seems interesting: https://www.icinga.com/docs/director/latest/doc/60-CLI/ . Moreover, it seems we can define deployment tasks conditioned on some kind of « current state ».

It is still far from clear to me, but it looks promising.

dachary · October 6, 2017, 11:21pm

This is a very interesting lead ! For some reason I discarded director but I don’t remember why… I’ll take another look. Thanks

fpoulain · October 7, 2017, 2:43pm

For the record, icinga provides a non-interactive handshake procedure between master and satellite : https://www.icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#distributed-monitoring-automation-cli-node-setup

fpoulain · October 7, 2017, 4:55pm

Disclaimer: I have played with a few nagios, icinga and icinga2 servers on small clusters. But I am very new to automation, so there is some speculation in what follows.

I guess that, given the docker image, we will easily deploy a basic icinga on each VM. The trickiest part (the handshake) seems possible via the non-interactive way mentioned above (to be tested) or via the interactive way (well known) using the expect ansible module.

Here follows what I would like to do for the configuration part of an automated monitoring setup.

Today, let’s speak about the icinga configuration philosophy. In what follows, « service » means « check » (it’s nagios/icinga nomenclature).

Most deployments I have seen are clearly inspired by a « nagios » pattern. This is what is called the « Object logic style » here. With it, you roughly define one service for each check you enable. Adding a new check means copy/pasting a service, which is not terribly aesthetic (think about what the conf looks like for a hundred http vhosts). Moreover it is not very clever: think about monitoring an http host. You would probably like to monitor (at least) an answer on :80, another on :443, and also the validity of the certificate. Doing this « à la nagios » means copy/pasting the same-but-not-exactly service 3 times (again, think about what the conf looks like for a hundred vhosts).
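To make the repetition concrete, here is a rough sketch of the object style for a single vhost. The host name "forum-vm" and the service names are made up for illustration; the http_* variables are the ones used by the "http" check command, and "generic-service" is assumed to be the usual template from the sample configuration.

```
// Three near-identical objects for one vhost (hypothetical names).
object Service "forum-http" {
    import "generic-service"
    host_name = "forum-vm"
    check_command = "http"
    vars.http_vhost = "forum.securedrop.org"
}

object Service "forum-https" {
    import "generic-service"
    host_name = "forum-vm"
    check_command = "http"
    vars.http_vhost = "forum.securedrop.org"
    vars.http_ssl = true
}

object Service "forum-certificate" {
    import "generic-service"
    host_name = "forum-vm"
    check_command = "http"
    vars.http_vhost = "forum.securedrop.org"
    vars.http_ssl = true
    vars.http_sni = true
    vars.http_certificate = 21
}
```

Multiply this by the number of vhosts and the configuration grows by three objects each time.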

Automation would help us manage this kind of thing, but I would not be very proud of doing it that way.

Icinga2 offers a much more flexible and clever way to instantiate our checks, one that is much more descriptive and thus more concise, more aesthetic, more clever and terribly lovely. This is the « apply logic style ». You can describe every kind of behavior, the way you want, as a host attribute, using a fairly complete language (including lists and associative arrays).

So directly at host level you can add any attribute you like, e.g. a list of hardware block devices, a list of mounted volumes, a list of vhosts with some useful associated attributes, a list of processes you would like to check with their associated limits, a list of git repos to be checked, etc.
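As an illustration only, a host carrying such attributes could look like the sketch below. Every name here (the host "forum-vm", the address, and the custom variables volumes, processes and git_repos) is hypothetical: custom variables are free-form, and they only mean something once a generic service reads them.

```
object Host "forum-vm" {
    import "generic-host"
    address = "192.0.2.10"    // placeholder address

    // mounted volumes to check
    vars.volumes = [ "/", "/var" ]

    // processes expected to run, with their limits
    vars.processes["nginx"] = {
        min = 1
        max = 8
    }

    // git repos to check
    vars.git_repos = [ "/etc" ]
}
```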

Given those attributes, you can define any generic service you like that takes them into account in the best possible way. Since it is generic code, you have no need to template this part. To give an idea of the simplicity of the icinga2 language, here is how you generically check the certificates of all the vhosts which are declared as using TLS:

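// One service per entry of host.vars.http_vhosts; the service name is the
// prefix below (hence the trailing space) followed by the entry's key.
apply Service "Check TLS certificate " for (http_vhost => config in host.vars.http_vhosts) {
    import "generic-service"

    check_command = "http"
    vars.http_address = config.http_vhost
    command_endpoint = " ... "
    // warn when the certificate has fewer than 21 days of validity left
    vars.http_certificate = 21
    vars.http_sni = true

    // merge the vhost's own attributes (http_vhost, http_ssl, ...) into the service
    vars += config
    assign where config.http_ssl == true
}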

with a host which simply contains this kind of declaration:

  vars.http_vhosts["Forum"] = {
    http_vhost = "forum.securedrop.org"
    http_ssl = true
  }

Fortunately, the description may also carry a list of expected strings at a given uri, and any other parameter of your choice, provided that you write the code which exploits it.
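For instance, a minimal sketch of a content check with a single expected string, relying only on variables the "http" check command already understands (http_uri and http_string). The uri "/latest", the string "Topics" and the service name prefix are invented for the example; handling a real list of strings would need extra code that is not shown here.

```
// Host side: the vhost entry gains an uri and a string expected in the page.
vars.http_vhosts["Forum"] = {
    http_vhost = "forum.securedrop.org"
    http_ssl = true
    http_uri = "/latest"
    http_string = "Topics"
}

// Generic side: one content check per declared vhost.
apply Service "Check content " for (http_vhost => config in host.vars.http_vhosts) {
    import "generic-service"

    check_command = "http"

    // http_vhost, http_ssl, http_uri and http_string all come from the entry
    vars += config
}
```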

I wrote this kind of configuration for my last (manual) monitoring deployment, and it really is a pleasure to write and to maintain. With a very concise configuration at host level and a little easy generic code, we were able to check vhosts, dns zone consistency, dns view consistency, expected processes, expected vhosts, expected output IPs, git repos, mail queues, service banners (ssh, smtp, etc.), upgrades, running kernels, mailname consistency, volumes, databases, etc. See e.g. here for an overview of how readable a fully monitored host object is: https://admin.chapril.org/doku.php?id=admin:procedures:ajout-d-une-machine#ajout_de_l_objet_a_la_configuration

Next time, I will discuss the way I would like to write it in ansible playbooks.

dachary · October 7, 2017, 8:06pm

Thanks for sharing this, very interesting. I don’t have any icinga expertise, but from what I remember of nagios it really makes a difference.

I guess the next thing I should work on is figuring out how to configure an icinga satellite on the target host? If I remember correctly what you taught me, this involves getting an auth token from the icinga master and copying it to the icinga satellite so it is allowed to connect to the master. Right?

fpoulain · October 8, 2017, 9:15am

Yeah. Icinga 2 is very flexible and doesn’t impose a monitoring architecture, so we have to define it. The simplest is to follow the master-with-clients setup. It implies:

Getting a master to work => that’s on the way,
Getting a client for each VM on which you would execute checks.

The handshake between the master and the client is mainly about each side acknowledging the other’s existence and role, plus all the TLS setup. (IIRC with NRPE you only define an allowed IP for command submission.)

Also, Icinga uses basically the same software for clients and master. This has the benefit that configuration objects may be pushed from the master to the clients, so you can absolutely forget the ugly « NRPE commands » files. Note, though, that the check executables are still managed locally, as are the required sudo permissions.
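As a sketch of what this can look like on the master side: a service defined once on the master and executed on the client through command_endpoint. The custom variable client_endpoint is an assumption borrowed from the naming used in the Icinga 2 distributed monitoring documentation; the host would set it to the name of its own endpoint.

```
// Defined on the master, executed on the client.
apply Service "disk" {
    import "generic-service"

    check_command = "disk"

    // run the check on the client's endpoint instead of on the master
    command_endpoint = host.vars.client_endpoint

    // only for hosts that declare a client endpoint
    assign where host.vars.client_endpoint
}
```

The check plugin itself (here the disk plugin) still has to be installed on the client.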

Last but not least, Icinga2 defines « zones », which are a way to control information sharing across the monitoring infrastructure. Usually we define (for a master/clients setup):

a global zone for shared configuration among all the cluster;
a master zone for the master;
a client zone for each client, which gets the master zone as its parent zone.

This way, each client doesn’t know about the others (the master distributes only what is needed).
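The three kinds of zones can be sketched as follows, as they might appear in zones.conf on the master. The endpoint names (master.example.org, forum-vm.example.org) and the global zone name "global-templates" are placeholders chosen for the example.

```
// The master and its zone.
object Endpoint "master.example.org" {
    host = "master.example.org"
}

object Zone "master" {
    endpoints = [ "master.example.org" ]
}

// One endpoint and one zone per client, with the master zone as parent.
object Endpoint "forum-vm.example.org" {
    host = "forum-vm.example.org"
}

object Zone "forum-vm.example.org" {
    endpoints = [ "forum-vm.example.org" ]
    parent = "master"
}

// Global zone for configuration shared by the whole cluster.
object Zone "global-templates" {
    global = true
}
```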

So, we will have to define an Ansible role for the icinga client which:

installs the software on the client VM;
defines a basic host with parameters on the master VM;
performs the handshake between them;
defines a new client zone on the master VM;
reloads the configuration on both the master and the client.

Actually, I have seen a few playbooks on ansible-galaxy/github which do this.

fpoulain · October 12, 2017, 2:00pm

I gave some hints here, since the broader subject goes beyond weblate and forum monitoring.