Disaster recovery


#1

Hello,

We should probably organize a disaster recovery exercise to verify that recovery works as expected. It should not be more complex than:

  • creating a new virtual machine from a backup
  • changing the DNS
  • verifying all works

But it’s worth checking :wink: Any volunteers?

Cheers


#2

Would be interested in knowing more about recovery procedures here, @dachary, so happy to schedule something post-launch of the new securedrop.org. Might have some bearing on the backup/restore flow for FPF sites, as well.

We currently use duply to encrypt backups to S3 buckets. The restore flow is quite similar to what you describe above.
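For reference, a minimal sketch of how the restore side of such a flow could be scripted. It relies on duply's `restore <target_path> [<age>]` command; the profile name and target path in the usage comment are hypothetical placeholders, not the ones FPF actually uses.

```python
import subprocess
from typing import List, Optional


def duply_restore_command(profile: str, target_path: str,
                          age: Optional[str] = None) -> List[str]:
    """Build the argv for `duply <profile> restore <target_path> [<age>]`."""
    cmd = ["duply", profile, "restore", target_path]
    if age is not None:
        # Optional age in duplicity's time format, e.g. "1D" for one day ago.
        cmd.append(age)
    return cmd


def restore(profile: str, target_path: str, age: Optional[str] = None) -> None:
    """Run the restore and raise CalledProcessError if duply exits non-zero."""
    subprocess.run(duply_restore_command(profile, target_path, age), check=True)


# Hypothetical usage (profile name and path are assumptions):
# restore("website", "/srv/restore", age="1D")
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to check what would be executed without touching a real backup.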


#3

@fpoulain would you be available some time during the week of May 14th for a disaster recovery exercise?


#4

Hello,

The disaster recovery exercise will be Monday May 14th, 2018 @ 11am Paris/Berlin time at https://gitter.im/freedomofpress/securedrop. @fpoulain and I will be there, and anyone is welcome to join.

We will:

  • Kill the packages host (because, well, it’s not very precious or used at the moment :wink:) by renaming the VM and suspending it
  • Manually create a VM by the same name using the backup from the previous day
  • Run ansible to sync the backup (necessary because the IP changes)
  • Wait a few hours to verify that domain name propagation and icinga are fine with the backup restoration
  • Update the documentation
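
The "wait a few hours for domain name propagation" step could be partly automated. Below is a rough sketch, not what we actually run: it polls the local resolver until the host name points at the new IP or a timeout expires. The function name, the defaults, and the host name and address in the usage comment are all hypothetical, and it only reflects what the machine running it sees, not every resolver on the internet.

```python
import socket
import time
from typing import Callable


def wait_for_dns(hostname: str, expected_ip: str,
                 timeout: float = 3 * 3600, interval: float = 60.0,
                 resolve: Callable[[str], str] = socket.gethostbyname,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until `hostname` resolves to `expected_ip`.

    Returns True once the expected address is seen, False if `timeout`
    seconds pass first. `resolve` and `sleep` are injectable so the loop
    can be exercised without real DNS lookups or real waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if resolve(hostname) == expected_ip:
                return True
        except OSError:
            # socket.gaierror is an OSError: the name does not resolve yet.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage (host name and address are placeholders):
# wait_for_dns("packages.example.org", "192.0.2.10")
```

The icinga side would still need to be checked by hand or through its own interface; this only covers the DNS part of the step.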

We can also destroy a machine such as the website and verify it comes back to life after a run of ansible. It is comparatively less scary because it does not contain information and can be re-generated from the ansible repository.

We should also think about what happens if the ansible host itself breaks, although I think it does not matter and we have a backup anyways.

Anything else you would like covered?


#5

The disaster recovery exercise will be Monday May 14th, 2018 @ 11am Paris/Berlin time at https://gitter.im/freedomofpress/securedrop

Noted in my agenda!

We should also think about what happens if the ansible host itself breaks, although I think it does not matter and we have a backup anyways.

At least there is a local commit differentiating prod from preprod. Maybe the details of this commit (without passwords) should simply be recorded in the docs?

Anything else you would like covered?

It seems quite comprehensive to me.


#6

The exercise went fine and the documentation was updated accordingly. I feel better knowing the backup strategy actually works :slight_smile: