Disaster recovery

dachary · April 1, 2018, 11:38am

Hello,

We should probably organize a disaster recovery exercise to verify it works as expected. It should not be more complex than:

creating a new virtual machine from a backup
changing the DNS
verifying all works

But it’s worth checking. Any volunteers?

Cheers

conorsch · April 16, 2018, 6:27pm

Would be interested in knowing more about recovery procedures here, @dachary, so happy to schedule something post-launch of the new securedrop.org. Might have some bearing on the backup/restore flow for FPF sites, as well.

We currently use duply to encrypt backups to S3 buckets. The restore flow is quite similar to what you describe above.
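As a rough illustration of that restore step, here is a minimal sketch that wraps the `duply <profile> restore <target_path> [<age>]` command. The profile name, target directory and age used in the usage note are hypothetical, and the S3 target and encryption settings are assumed to live in the duply profile's own configuration.

```python
import subprocess


def duply_restore_command(profile, target_dir, age=None):
    """Build the argument list for `duply <profile> restore <target_dir> [<age>]`."""
    command = ["duply", profile, "restore", target_dir]
    if age is not None:
        # Optional age, e.g. "1D" for the state of the backup one day ago
        command.append(age)
    return command


def restore(profile, target_dir, age=None):
    """Run the restore; raises CalledProcessError if duply exits non-zero."""
    subprocess.run(duply_restore_command(profile, target_dir, age), check=True)
```

For example, `restore("packages", "/srv/restore", age="1D")` would ask duply to restore the hypothetical `packages` profile as it was a day earlier into `/srv/restore`.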

dachary · April 23, 2018, 11:39am

@fpoulain would you be available some time during the week of May 14th for a disaster recovery exercise?

dachary · April 23, 2018, 12:03pm

Hello,

The disaster recovery exercise will be Monday May 14th, 2018 @ 11am Paris/Berlin time at https://gitter.im/freedomofpress/securedrop. @fpoulain and I will be there, and anyone is welcome to join.

We will:

Kill the packages host (because, well, it’s not very precious or much used at the moment) by renaming the VM and suspending it
Manually create a VM by the same name using the backup from the previous day
Run ansible to sync the backup (necessary because the IP changes)
Wait a few hours to verify that domain name propagation and icinga are fine with the backup restoration
Update the documentation
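
The "wait and verify domain name propagation" step above could be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the hostname and IP in the usage note are placeholders, and the check only reflects what the local resolver sees, so cached answers may lag behind the real state of the DNS.

```python
import socket
import time


def wait_for_dns(hostname, expected_ip, attempts=10, delay=60.0,
                 resolve=socket.gethostbyname, sleep=time.sleep):
    """Poll until hostname resolves to expected_ip; return True on success."""
    for attempt in range(attempts):
        try:
            if resolve(hostname) == expected_ip:
                return True
        except OSError:
            # socket.gaierror is a subclass of OSError; treat it as "not yet"
            pass
        if attempt < attempts - 1:
            # Do not sleep after the final attempt
            sleep(delay)
    return False
```

For example, `wait_for_dns("packages.example.org", "203.0.113.10")` would poll once a minute for up to ten attempts and report whether the name resolved to the restored VM's new address.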

We can also destroy a machine such as the website and verify it comes back to life after a run of ansible. It is comparatively less scary because it does not contain information and can be re-generated from the ansible repository.

We should also think about what happens if the ansible host itself breaks, although I think it does not matter and we have a backup anyways.

Anything else you would like covered?

fpoulain · April 23, 2018, 12:20pm

The disaster recovery exercise will be Monday May 14th, 2018 @ 11am Paris/Berlin time at https://gitter.im/freedomofpress/securedrop

Noted in my agenda!

We should also think about what happens if the ansible host itself breaks, although I think it does not matter and we have a backup anyways.

At least there is a local commit differentiating prod from preprod. Maybe the details of this commit (without passwords) should simply be recorded in the docs?

Anything else you would like covered?

It seems quite comprehensive to me.

dachary · May 14, 2018, 10:06am

The exercise went fine and the documentation was updated accordingly. I feel better knowing the backup strategy actually works.