Weblate Maintenance - week of Feb 4 2019


#1

Hi folks,

The Weblate service is currently undergoing maintenance and most functionality isn’t working right now. We expect to resolve those issues by the end of the week. Apologies for the inconvenience, and thanks for your patience and your work on SecureDrop translations! Stay tuned for further updates and notifications.

Keturah


#2

Hi @ketudb,
Is there an issue somewhere where I can read about the technical details of this maintenance? I’m familiar with weblate and available to help. I have never faced a weblate issue that could not be fixed within the hour, and this week-long downtime worries me.
Cheers


Weblate.securedrop.org timesout on Save
#3

Hey @ketudb & @kushaldas, I’m wondering how, if at all, this will affect the current release schedule. Are there any new strings to add to SecureDrop this cycle? Can we assume that there will be a freeze as scheduled by EOD next Tuesday?


#4

Ah, it seems I forgot to update the deadlines, as we pushed the release back one more week. Will do that now.


#5

Hi! We’re using the opportunity of fixing the issue to upgrade weblate to the latest version and to migrate it to the new server. We do the upgrade by bumping the versions progressively, which requires us to run tests after each version bump. We’ll announce the actual downtime when we’re ready to cut over for the migration - it won’t be down all week, but we’ll be doing the migration sometime this week.

Hope that helps!

Keturah


#6

I’m very curious about the issue you’re experiencing (or experienced in the recent past). I have never had any issue with weblate upgrades: the releases were carefully prepared by the author and the upgrade path worked well. Is there an issue somewhere where I can read about the technical details?


#7

The issue we’re resolving is incidental to the upgrade process. We’re just careful with the upgrade process to ensure thorough testing.


#8

Is there a specific reason why you would not want to share the technical details of the issue you are facing? Is it because you found a bug in weblate? Is it because the underlying OS is having an issue? Is it because the host on which the virtual machine runs is not behaving as it should? Is it because the database is out of sync? These are the kinds of details I’m interested in.


#9

Hey Loic! @ketudb is FPF’s new (and awesome!) site reliability engineer, working mostly (or exclusively?) on SecureDrop things. I’m pretty sure they’ve got it under control and are only not responding with more details here, cuz there’s lots of other pressing things happening at the moment. I’m sure they’ll be happy to respond with a full scoop on what various issues were encountered, once it’s all live on the new server. We all wanna contribute to the broader effort of helping more peeps get more stable with Weblate, too. :slight_smile:


#10

(…and I’m only responding on @ketudb’s behalf, because I’m up and they’ve gone to bed. Like most sane people in our timezone should have, by now—which I say as a knock against myself, as clearly I’m not the sane person in this convo thread!)


#11

@ninavizz I’m volunteering my time to help, in case people are too busy to fix the issue.


#12

Hey @dachary!! Since weblate is still a relatively new system to FPF staff (in terms of maintenance), I’d like the team to become familiar with the technical particulars of maintaining it. I’m 1000% sure you could SSH in and fix everything flat, but in that scenario our team doesn’t gain the knowledge of how to fix it in the future, in case you aren’t around to offer assistance. Believe me though, I really, really do appreciate the offer of assistance here and your long list of contributions to the project. Without you we wouldn’t have this entire translation system running! :heart: Do you want a more active role in debugging the weblate instance moving forward? There’s an infra meeting today and I can certainly bring it up.

Technically - the issue was that whenever someone posted a translation suggestion, an exception was raised on the backend and the entire site crashed. I didn’t do a lot of research to find out the WHY because I noticed we were behind on versions. I wanted to get us on the newest revision before I started opening upstream bug reports. The version bump ended up fixing the issue on a sandbox server we had running last week (or maybe the week before). I think the complex part here is that I did the initial upgrade testing on the sandbox server, and now @ketudb is taking over the live migration based on my documentation. Since we are both new to weblate, there is some apprehension about moving too quickly and breaking something whose moving pieces we don’t fully understand. Are we being overly cautious? Probably. Does this help illuminate the issue some? Definitely not trying to hide anything here - just the growing pains of us wrangling a new system on our end.


#13

@mike thanks for the background explanation. How about this: we can schedule a debugging session at a time of your choosing. We could share a tmux session and I would not type anything, just share comments and knowledge. Like I said, I’ve only had good experience with weblate and facing the actual problem would most likely lead to interesting tips on your end and a quick resolution for the translators waiting on weblate to be back. And save you time :slight_smile:

To be prepared, would you be so kind as to share with me the technical details: the stack trace you got, the version that was behind, etc.? If they are on the private ticket system to which I do not have access, a copy/paste somewhere, even if not in a very orderly manner, would probably give me a hint and save everyone valuable time.


#14

UX person waving furiously, here… I’m sure there are other areas of the SD product we could use the help with, @dachary! What might be the best way to reach out and loop you into those things, assuming you’d be keen to lend your dev chops there? :smiley:


#15

:slight_smile: I’d be happy to.


#16

Hey @dachary! Thanks for the offer - I think we’ve got it in hand though. We know an upgrade fixes the issue, and keeping up to date on the latest patches and upgrades is an important part of maintaining the system anyway, so we know it’s worth the effort to approach the problem within the scope of the upgrade procedure. As @mike said, our processes may be overly cautious at times, but they’re in line with most industry approaches to maintaining systems with regard to change control, testing, and review.

Thanks!


#17

Yay @dachary! We all coordinate in Gitter, and I’m sure once 2 additional new FPF dev hires have been fully onboarded and gotten their heads around their own various domains of leadership, lots more opportunities will arise there. :slight_smile:


#18

The upgrade & migration are now complete, and this should resolve the ongoing issues with weblate. The warning banner about data loss will be removed in the morning, once the service has been reviewed/re-tested and the issues are confirmed to be resolved! :slight_smile: Thanks for your patience everyone!