CI and cloud API

Hello,

We’re in good shape to have CI for securedrop-club and it could be as simple as:

molecule test -s bind
molecule test -s icinga
molecule test -s backup
molecule test -s postfix
molecule test -s weblate
...

Over the past few weeks we managed to make it stable. However, there is something we have no control over: the stability of the OpenStack API. Just like AWS, it will sometimes fail, break tests and create frustration. Today is one of those days: the API fails more than 25% of the time and nothing really works. I propose we deal with this by:

  • clearly identifying (for the benefit of both humans and machines) when the cloud API fails
  • retrying N times if the cloud API fails
  • pausing the CI, retrying the last failed job every H hours, and resuming the CI once that job no longer fails because of the cloud API
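
To make the first two points concrete, here is a rough sketch in Python, not a finished implementation. It assumes that a cloud API failure shows up in the job output as an HTTP 5xx status such as "(HTTP 503)"; the pattern, the function names and the status strings are all hypothetical and would need to be adjusted to what the OpenStack client really prints.

```python
import re
import subprocess
import time

# Assumption: a cloud API failure is visible in the job output as "HTTP 5xx".
CLOUD_API_ERROR = re.compile(r"\bHTTP 5\d\d\b")


def is_cloud_api_failure(output):
    """Return True if the output looks like a cloud API (5xx) failure."""
    return CLOUD_API_ERROR.search(output) is not None


def molecule_job(scenario):
    """Build a job that runs one molecule scenario and returns (returncode, output)."""
    def job():
        proc = subprocess.run(
            ["molecule", "test", "-s", scenario],
            capture_output=True,
            text=True,
        )
        return proc.returncode, proc.stdout + proc.stderr
    return job


def run_with_retries(job, max_retries=3, delay_seconds=0.0, sleep=time.sleep):
    """Run a job, retrying up to max_retries times on cloud API failures only.

    Returns (status, attempts) where status is "passed", "failed"
    (a genuine test failure) or "cloud-api-failure" (retries exhausted).
    """
    attempts = 0
    while True:
        attempts += 1
        returncode, output = job()
        if returncode == 0:
            return "passed", attempts
        if not is_cloud_api_failure(output):
            # A real failure: retrying would only hide it.
            return "failed", attempts
        if attempts > max_retries:
            return "cloud-api-failure", attempts
        sleep(delay_seconds)
```

Something like run_with_retries(molecule_job("bind"), max_retries=3, delay_seconds=60) would then run the bind scenario. A "cloud-api-failure" status is what would pause the CI, and the same call could be repeated every H hours until it returns something else.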

What do you think?

How does the API fail?

Could we rely on the provider's public status timeline?

Long story short: yesterday more than 50% of the calls to the OpenStack API failed with a 503 error, and no trouble was reported on the provider's status panel. I filed an issue and this morning they apologized for fixing the problem without announcing it. This is not too good, but at least they are not denying that the problem happened, which is good.

OK; since we get a 5xx error from the API call, it is easy to incriminate the provider, so we have to define a test result which means “try later”.
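
One possible shape for such a result, as a sketch only: a third state next to passed/failed, mapped to its own exit code so that the CI can tell it apart. The names below are hypothetical, the "HTTP 5xx" pattern is an assumption about what the job output contains, and 75 is chosen only because it is the value of EX_TEMPFAIL ("temporary failure, retry later") in the BSD sysexits.h convention.

```python
import enum
import re

# Assumption: the provider-side failure appears in the output as "HTTP 5xx".
SERVER_ERROR = re.compile(r"\bHTTP 5\d\d\b")


class Result(enum.Enum):
    PASSED = 0
    FAILED = 1
    TRY_LATER = 75  # same value as EX_TEMPFAIL in BSD sysexits.h


def classify(returncode, output):
    """Map a job's return code and output to one of the three results."""
    if returncode == 0:
        return Result.PASSED
    if SERVER_ERROR.search(output):
        # The provider is to blame: do not count this as a test failure.
        return Result.TRY_LATER
    return Result.FAILED
```

A wrapper script could end with sys.exit(classify(returncode, output).value), and the CI would treat exit code 75 as "pause and retry" rather than as a red build.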
