Architecture RFC: SecureDrop Server Operating System Transition

eloquence · May 10, 2018, 10:16pm

In the course of SecureDrop development, we routinely encounter questions with deep architectural implications. To avoid dealing with these questions in a purely reactive manner (e.g., in response to a new tech announcement, or a security vulnerability), the dev team recently agreed that it would make sense to have regular, open architecture meetings, roughly once every 1-2 months.

Each meeting will tackle a challenging topic. Prior to the meeting, we’ll start a forum thread like this one, to enable asynchronous debate. At the meeting itself, participants are invited to give brief, 5-10 minute presentations. We will then discuss the possible decisions. We strive for broad consensus, but @redshiftzero will act as a tie-breaker if needed.

The first meeting will take place on ~~Tuesday, May 22, 2018~~ Thursday, June 14, 2018, at 10:00 AM PDT / 5:00 PM UTC on Jitsi.

The topic for discussion is the coming changes to our base OS. Our current base operating system (Ubuntu 14.04) reaches its End of Life in April 2019, so we must, at minimum, perform an upgrade to Ubuntu 16.04 before then (ideally well before then, to allow for a gradual transition before we hit the EoL window). This is a major change that needs to be carefully managed. See issue #3204 for background.

One possible alternative that has been discussed is to use this opportunity to make the switch to a new base operating system that also gives us security and maintainability benefits, such as Fedora Atomic or Ubuntu Core. In particular, there is significant interest among the development team in an immutable base operating system, to increase predictability of SecureDrop installs and upgrades.

Needless to say, such a change would be a major undertaking, given the 60+ SecureDrop installations that would need to eventually make the transition. However, the Xenial upgrade will also be a significant burden for administrators. If we do upgrade to Xenial, a base OS change could be deferred to 2019 or even 2020, depending on the administrative effort involved in the Xenial transition (e.g., reinstall of some or all SecureDrop instances).

There might be a third option: upgrade to Xenial but harden configuration and deployment to ensure a more predictable state of the server OS. What are your thoughts?

I look forward to discussing this here and in the meeting. (My own role in these discussions is primarily facilitative, and as PM I also will help with the implementation planning.)

dachary · May 10, 2018, 10:45pm

Another architectural topic is to establish a better separation between things installed & upgraded via Ansible or via packages (or container images). For instance, although OSSEC rules are installed via packages and do not require manual intervention, some scripts and procmail rules can only be installed manually by running Ansible. We do not have a policy to decide what should be Ansible managed and what should be included in packages (or container images).

Once we have a policy (or a rationale or …), we can start moving things from Ansible to packages or vice versa to gradually improve our ability to upgrade existing instances.

eloquence · May 16, 2018, 7:34pm

Just a note that I’ve moved this meeting to June 14, to give more time for initial exploration and testing, especially during the May 30-June 13 development sprint time window. In particular, we’ll want to make progress on the tasks identified in https://github.com/freedomofpress/securedrop/issues/3204 , and potentially also on exploratory testing of alternative base OS choices beyond the Atomic proof-of-concept by @kushaldas . If there are specific testing/research tasks that you think would be useful for this discussion, please flag them here.

redshiftzero · May 16, 2018, 8:05pm

If we do determine that migrations like Trusty -> Xenial cannot be safely done for the SecureDrop servers without reinstalls, I think we would be doing a disservice to future maintainers of this project by installing Xenial after the reinstall. They’d have this same issue in a few years’ time, perhaps when there are many more SecureDrop instances to maintain, and thus doing an ecosystem-wide reinstall is even more burdensome in terms of travel and support. In my view, a significantly different deployment approach like using libostree would then become much more favorable, even though that shifts maintenance burden from admins onto the SecureDrop development team (noting that the burden of reinstalls would also be borne by the core development team in many cases).

heartsucker · June 2, 2018, 11:57am

I am going to recommend against Ubuntu Core. From the build store docs:

The Ubuntu snap store

The Ubuntu snap store is enabled by default on Ubuntu Core images and classic systems running snapd. It allows developers to release free or paid apps for multiple architectures, on multiple release channels from daily builds to stable releases.

Follow the step-by-step guide to learn how to distribute snaps to your devices through the Ubuntu snap store.

All of the docs I could find (or find easily) on these seem to want to force interactions through Canonical’s infra. I do not like the idea of SD’s updates depending on playing nice with the Canonical Snap store. I think this could be a deal breaker for @loic too, as I find Ubuntu’s definition of FOSS to be a little too corporate sometimes.

Additionally, it seems snapcraft.io doesn’t want to allow external repos. From the publish docs:

Create a Snap Store account

To release snaps you will need to create an account on the dashboard. Here you can customize how your snaps are presented and review your uploads.

You’ll need to choose a unique “developer namespace” as part of the account creation process. This name will be visible by users and associated with your snaps.

Once you’ve confirmed your account, you’re ready to start pushing your snaps to the Snap Store.

Make sure the snapcraft and snap commands know about you by logging in using the email address attached to your account.

Based off the bits of research I’ve done, I think Fedora Atomic is the most FOSS friendly solution. Yocto looks great too, but we would basically be building our own Linux distro if we used that, and that’s probably not something we want to commit to at the moment.

heartsucker · June 2, 2018, 12:36pm

This is the original libostree ticket: https://github.com/freedomofpress/securedrop/issues/2966

Regardless of how we deploy atomic updates, I believe the configuration mechanism described therein is what we should use to keep our configs updated.

dachary · June 2, 2018, 12:39pm

I agree if it turns out that the root cause of the need to re-install is that the Trusty → Xenial migration path is broken. On the other hand, if this is a SecureDrop bug, changing the underlying operating system may be overkill.

My 2cts

eloquence · June 14, 2018, 12:42am

As a reminder, this meeting will take place tomorrow at 10:00 AM PDT / 5:00 PM UTC in the usual place.
@conorsch and @mike have done some additional research into the Xenial update and install experience, and you’ll find notes in the comments:

I’ll be preparing an agenda here, and we can also use the pad for live notes:

https://pad.riseup.net/p/ArchitectureMeetingAgenda

edenemmanuel · June 14, 2018, 5:01pm

How difficult would it be to be on both Bionic and Xenial? So we can upgrade instances to Xenial for the time being and put new instances on Bionic. (Ducks in fear of the answers.)

In any case, if we have to do a reinstall and we’re sticking with Ubuntu, I think Bionic would allow us far more runway (April 2023 EOL).
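To put rough numbers on the runway comparison, here is a small sketch. The Trusty (April 2019) and Bionic (April 2023) dates are the ones quoted in this thread. The Xenial date (April 2021) is an assumption added here, based on the usual five-year LTS window. The day of the month is only a placeholder, and `runway_months` is a hypothetical helper name.

```python
from datetime import date

# End-of-standard-support months. Trusty and Bionic come from this thread;
# Xenial (April 2021) is assumed from the five-year LTS cycle.
# The day of month is a placeholder: only year and month are used below.
EOL = {
    "trusty": date(2019, 4, 30),
    "xenial": date(2021, 4, 30),
    "bionic": date(2023, 4, 30),
}


def runway_months(release: str, today: date) -> int:
    """Whole calendar months from `today` until the release's EOL month (0 if past)."""
    eol = EOL[release]
    months = (eol.year - today.year) * 12 + (eol.month - today.month)
    return max(months, 0)


if __name__ == "__main__":
    meeting = date(2018, 6, 14)  # date of the architecture meeting
    for name in EOL:
        print(f"{name}: {runway_months(name, meeting)} months of runway")
```

Counted from the meeting date, this gives about 10 months for Trusty, 34 for Xenial (under the assumed date), and 58 for Bionic.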

eloquence · June 14, 2018, 6:44pm

Thank you all for a productive meeting! The notes are at https://github.com/freedomofpress/securedrop/wiki/Architecture-Meetings-2018-06-14 ; help editing them to be more useful is appreciated!

I will attempt a very quick recap:

- Everyone is in broad agreement with a long-term vision of SecureDrop servers as a low-maintenance appliance. This means:
  - We want admins to be able to perform all key configuration tasks through the secure web interface. SSH access may even be disabled entirely.
  - We want to systematize the curation of security updates to all aspects of the system, i.e., we want to be aware of all packages that are updated/installed.
  - We want to avoid situations where SecureDrop servers are in an inconsistent state with each other.
  - We want to ensure that major changes (e.g., changing the webserver from Apache to nginx) can be managed in a predictable manner across the SecureDrop install base.
- There are many prerequisites other than the base OS that we have to address to make this a reality (additional administrative UIs, processes/staffing for vetting package updates, etc.).
- Atomic looks like a good base OS for this potential appliance, but we’ll consider other options as we get closer.
- We can’t let the Trusty end-of-life deadline drive this important development. There are too many prerequisites to consider to arrive at an overall sound architecture, and we don’t have the capacity to do this while also building the SecureDrop Workstation.
- As such, we will make the transition from Trusty to Xenial. We will strive to do so in a reasonably tight timeframe, to minimize the period in which we support both systems. We will not be able to support SecureDrop-on-Trusty past its EOL, so this creates a hard limit on the amount of time we need to support both.
- Before we start on that transition in earnest, we will finish up improvements to QA (functional testing, automated upgrade testing) that are already in flux. These improvements are key to minimizing the QA burden for the Trusty-to-Xenial transition.
- We will not lose sight of the long-term vision of a SecureDrop Appliance, and will work backwards in the coming months to identify how we can incrementally get closer to it. However, most of the substantive changes to the ops/deployment story will likely have to wait until we have completed the Xenial transition.
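As a rough illustration of the "aware of all packages" and "no inconsistent state" goals in the recap, one could compare a server's package list against a reference list. This is only a sketch, not something decided at the meeting. It assumes each manifest is plain text with one tab-separated name/version pair per line (the shape `dpkg-query -W` prints). The function names and the sample package versions in the usage note are made up for illustration.

```python
def parse_manifest(text: str) -> dict:
    """Parse 'name<TAB>version' lines into a {name: version} mapping."""
    packages = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, version = line.partition("\t")
        packages[name.strip()] = version.strip()
    return packages


def compare_manifests(reference: dict, server: dict) -> dict:
    """Report how a server's package set differs from the reference set."""
    return {
        # In the reference but not installed on the server.
        "missing": sorted(reference.keys() - server.keys()),
        # Installed on the server but not in the reference.
        "unexpected": sorted(server.keys() - reference.keys()),
        # Present in both, but at different versions.
        "version_mismatch": sorted(
            name
            for name in reference.keys() & server.keys()
            if reference[name] != server[name]
        ),
    }
```

For example, with hypothetical manifests `"apache2\t2.4.7\ntor\t0.3.3\n"` (reference) and `"apache2\t2.4.7\ntor\t0.3.2\nnginx\t1.4.6\n"` (server), the report lists `tor` under `version_mismatch` and `nginx` under `unexpected`.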

Please let me know if I did not represent the consensus accurately, or missed any important considerations that were raised. I hope that this architecture meeting was useful, and if so, that we can use a version of this process for some of the big decisions to come.

heartsucker · June 14, 2018, 10:02pm

Hey all. Sorry I missed the meeting. My mobile carrier has super spotty internet. Aaaaaanyway.

Xenial being so easy sounds like it would be a better target to hit. When we do go that way, what we should have is the master config file on the server and periodic state checks from it. Once we have this in place, it might be possible for us to do “full state pushes” by sending very carefully crafted ansible playbooks as sd-config packages. As a related pre-req, we should deprecate config.py and move that to a flat config file. If we get these two things done, nastier changes in the future can be done a bit more easily without admin intervention.
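A minimal sketch of what "flat config file plus periodic state checks" could look like, assuming an INI-style file read with Python's standard `configparser`. The section names, keys, and values below are hypothetical and are not SecureDrop's actual settings. How the "actual" state is collected from the server is left out entirely.

```python
import configparser

# Hypothetical flat config; sections and keys are made up for illustration.
EXAMPLE_CONFIG = """
[app]
default_locale = en_US
session_timeout_minutes = 120

[ssh]
enabled = false
"""


def load_desired_state(text: str) -> dict:
    """Flatten an INI-style config into {'section.key': 'value'} strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        f"{section}.{key}": value
        for section in parser.sections()
        for key, value in parser[section].items()
    }


def find_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired, actual)} for every key whose values differ."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift
```

A periodic check would call `find_drift(load_desired_state(...), observed_state)` and alert (or trigger a state push) whenever the result is non-empty. Unlike `config.py`, a flat file like this can be read and compared without executing any code.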