Kurdish missing from python babel / CLDR


#1

Hello,

A volunteer offered to translate SecureDrop into Kurdish and @bmeson added it to weblate. When I tried to compile the corresponding po file with

./manage.py translate-messages --compile

it failed because the Kurdish locale is not included in python babel, which only ships locale files found in CLDR.
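A quick way to confirm this independently of SecureDrop is to ask Babel directly whether it ships data for a locale. This is a minimal sketch: `babel_knows_locale` is a name made up for illustration, and the result for `ku` depends on which CLDR release the installed Babel bundles.

```python
def babel_knows_locale(code):
    """Return True/False if Babel ships data for `code`.

    Returns None when Babel itself is not installed.
    """
    try:
        from babel import localedata
    except ImportError:
        return None
    # localedata.exists() checks for a bundled locale data file.
    return localedata.exists(code)


for code in ("ku", "ckb", "en"):
    print(code, babel_knows_locale(code))
```

If `ku` comes back `False` while `ckb` is `True`, the failure comes from the locale data bundled with Babel rather than from the po file.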

Before digging further to figure out when Kurdish will be available in CLDR, I was wondering if someone already knows about this ?

Cheers


#2

@dachary Do you know if the volunteer definitely wants Northern Kurdish and not Central? Looks like Central is supported and that is what is used in Iraqi Kurdistan.


#3

Note to self: that is Kurdish Sorani, code ckb


#4

I don’t but @bmeson is in contact with them.


#5

@erinm @dachary there is Kurdish and “Kurdish Sorani” available in the weblate. I did not know the distinction so I only added “Kurdish”. Happy to add more if anyone can tell me which to add.


#6

It appears that Northern Kurdish (shown as just “Kurdish” on our weblate, and also called Kurmanjî) will not work. This came to my attention when a translator told me that Kurdish is being displayed in the wrong RTL / LTR configuration.

Steps to reproduce:

  1. Enable Kurdish on the weblate.
  2. Attempt to compile these languages per @dachary’s comment above.
  3. Download the latest python babel according to their documentation.
  4. Search the locale list for any variation of the ISO code(s) for Kurdish. Also search for any variation of Kurdish, Northern Kurdish, Kurmanji, ku, etc.

Output:

(venv) bmeson@neuromancer:~/opc/babel$ pybabel --list-locales | grep --color -Ei "Kurdish|Northern|kur|ku|kr|kur|Kurmanji|Kurmanjî"
ar_KW           Arabic (Kuwait)
ckb             Central Kurdish
ckb_IQ          Central Kurdish (Iraq)
ckb_IR          Central Kurdish (Iran)
en_MP           English (Northern Mariana Islands)
ki              Kikuyu
ki_KE           Kikuyu (Kenya)
ko_KR           Korean (South Korea)
lrc             Northern Luri
lrc_IQ          Northern Luri (Iraq)
lrc_IR          Northern Luri (Iran)
ru_UA           Russian (Ukraine)
se              Northern Sami
se_FI           Northern Sami (Finland)
se_NO           Northern Sami (Norway)
se_SE           Northern Sami (Sweden)
uk              Ukrainian
uk_UA           Ukrainian (Ukraine)

So the question is why can we even enable Kurdish in the first place? Why does it display?
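The grep pattern above is case-insensitive and matches substrings, so it also picks up Kuwait, Ukraine, Korea and every "Northern" locale. A narrower check compares only the language part of each locale identifier. The sketch below assumes the two-column output format shown above; the set of Kurdish codes (`ku`, `kmr`, `ckb`, `sdh`) and the function name are choices made for this example.

```python
import re

# Assumed codes: ku (Kurdish, ISO 639-1), kmr (Northern Kurdish),
# ckb (Central Kurdish), sdh (Southern Kurdish).
KURDISH_CODES = {"ku", "kmr", "ckb", "sdh"}


def kurdish_locales(listing):
    """Return locale identifiers whose language part is a Kurdish code."""
    found = []
    for line in listing.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        locale_id = parts[0]
        # "ckb_IQ" -> "ckb"; tolerate "-" as a separator too.
        language = re.split(r"[_-]", locale_id)[0].lower()
        if language in KURDISH_CODES:
            found.append(locale_id)
    return found


sample = """\
ar_KW           Arabic (Kuwait)
ckb             Central Kurdish
ckb_IQ          Central Kurdish (Iraq)
ki              Kikuyu
uk_UA           Ukrainian (Ukraine)
"""
print(kurdish_locales(sample))  # ['ckb', 'ckb_IQ']
```

Fed the full listing, this would return only the `ckb` entries, which makes the absence of `ku` easier to see.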


#7

Because weblate is not designed exclusively for python or any single i18n platform. It is entirely possible that a library in some programming language we don’t know about supports Kurdish (ku) just fine because it does not use CLDR (or added to it without contributing back ?).


#8

As discussed in instant messaging, as long as there is one person motivated to translate into Kurdish Kurmanjî, I’m motivated to fix the problem. It may take a while but it will be fixed. All I need is someone stepping in and saying: “Yes, I’ll translate, please fix this problem” :wink:


#9

Yes, I’ll translate, please fix this problem :wink:

I can do (with help from friends) Kurmanji (endonym is Kurmancî), aka Northern Kurdish. And yeah, it’s written LTR. Please let me know if you have any other questions about the language.


#10

Ok, the game is on :slight_smile: I’ll update this thread with progress.


#11

In the CLDR coverage matrix, [Kurdish Kurmanji](http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ku) is listed at the seed status, which seems better than undetermined.

In the latest release, a few locales moved from seed status to common status. The release notes describe the seed status as follows: “Locales with insufficient coverage were moved into the “seed” directory, and are not part of the release. The data is available via SVN. There are now 16 such languages and associated regions. When coverage for a seed language improves sufficiently, it will be moved into the release.”

I checked out the latest CLDR release with svn co http://www.unicode.org/repos/cldr/tags/release-32/ and looked at Wolof to figure out how it went from seed to common. Comparing common/main/ckb.xml (1317 lines) with seed/main/ku.xml (242 lines) suggests there is still much work to do.
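The line count comparison can be repeated on any CLDR checkout with a few lines of Python. This is only a rough sketch: line count is a crude proxy for coverage, the function names are invented here, and the checkout directory name `release-32` in the usage note is an assumption based on the svn URL.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a file without loading it all into memory."""
    with Path(path).open("rb") as handle:
        return sum(1 for _ in handle)


def compare_locale_files(checkout,
                         reference="common/main/ckb.xml",
                         candidate="seed/main/ku.xml"):
    """Return {relative path: line count} for the two locale files."""
    root = Path(checkout)
    return {name: count_lines(root / name) for name in (reference, candidate)}
```

Calling `compare_locale_files("release-32")` from the directory containing the checkout would report both counts side by side.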

The next step should be to figure out:

  • the contribution process of CLDR
  • getting in touch with the people working on Kurdish Kurmanji

#12

@erinm IIRC you take a particular interest in languages that tend to be neglected for one reason or another. Do you happen to know how one can follow the progress made by CLDR (for instance for Kurdish Kurmanji ) ?


#13

https://kurditgroup.org/unicode/proposal

To: bardaqani@kurditgroup.org
Subject: Kurdish Kurmanji

Hi,

I’m interested in understanding how to help make progress on CLDR for Kurdish Kurmanji, which is still in the “seed” stage:

http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html

Seeing that you’re the contact for https://kurditgroup.org/unicode/proposal and even though this happened years ago, I’m hoping you could give me a few pointers to get started ? Your help would be greatly appreciated :slight_smile:

Cheers


#14

@dachary We support increasing access and inclusion of minority language communities in the Internet freedom space, however CLDR progress is outside of my wheelhouse. I’ll take a look and see if I can dig anything up.

Regarding Kurdish Kurmanji translations, @brandones do you have any organizations in mind that would be interested in using SecureDrop? If there is a lot of linguistic variation in Kurmanji and no hard standard, it may be a good idea to identify potential organizational users and tailor to their needs.


#15

Starting January 2018 there will be a weekly meeting of the technical committee, and two meetings are scheduled:

UTC # 154 -- January 22-25
Hosted by Google, Mt. View, CA

Unicode Board of Directors Meeting -- Jan. 26,  1-5pm
Hosted by Google, Mt. View, CA

Using the contact page I subscribed to the Public CLDR Users mailing list. It is low traffic and there has been nothing about Kurdish in the past months.

I sent the following call for help.

Subject: Kurdish Kurmanji progress

Hi,

I'm interested in following the progress of the work done on Kurdish Kurmanji[1] to be notified when it transitions from "seed" to "common". How can I do that ?

Thanks in advance for any pointers you can provide :-)

[1] http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html

#16

Shervin Afshar replied with a pointer to the file that should be monitored for progress of ku.xml.

If someone is willing to help, the new developers page explains how to get involved, and there is similar guidance for data collectors. @brandones do you feel like participating in the making of Kurdish Kurmanji in CLDR ?


#17

Now that we have a way to monitor progress in CLDR, I’ll set a reminder to check on it every other month. It is likely to take some time, years maybe, unless someone actively looks for people to get involved. Someone asked in python-babel about two years ago and nothing much has happened since. Maybe @erinm has ideas on how to get people involved now that we found the right path to do it.

In the meantime I think SecureDrop should provide a hack to be removed later to support Kurdish Kurmanji. It should be enough to make changes in:

  • securedrop/i18n.py for locale selection
  • securedrop/template_filters.py to add a special case for units and dates conversion
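One possible shape for such a hack, purely as an illustration: keep the requested locale for translated strings, but hand Babel a locale it does know for dates and units. The function name, its parameters and the `en_US` fallback are all hypothetical; this is not SecureDrop's actual code.

```python
def formatting_locale(requested, babel_supported, fallback="en_US"):
    """Pick the locale to pass to Babel for dates and units.

    `babel_supported` is the set of locale identifiers Babel ships.
    Hypothetical helper for illustration only.
    """
    if requested in babel_supported:
        return requested
    # Try the bare language, e.g. "ckb_XX" -> "ckb".
    language = requested.split("_")[0]
    if language in babel_supported:
        return language
    # Unknown to Babel (as "ku" is in this thread): fall back.
    return fallback


supported = {"ckb", "ckb_IQ", "en_US"}
print(formatting_locale("ku", supported))      # en_US
print(formatting_locale("ckb_IQ", supported))  # ckb_IQ
```

Because the special case sits in one function, it would be simple to delete once Babel ships the locale.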

I do not like hacks, which is why it was important to figure out exactly how and when a proper solution can be implemented. @redshiftzero what do you think ?


#18

Sure, participating in making the Kurmanji CLDR seems like a fine thing. I’ll start with that “New Developers” page.


#19

Hi @erinm, there are some Kurdish news agencies based in Turkey (some of whom I’m in touch with) that would probably be interested, but they also all know Turkish, so it’s not actually a pressing issue. So it’s probably not worth hacking SecureDrop to provide support for Kurmanji in the short term. Plus, now we know and can address the root cause, which will have a much wider impact anyway.

Thank you all so much for looking into this!


#20

@brandones thanks for the feedback :slight_smile: I propose we remove Kurdish Kurmanji for now (it can easily be added later) so translators are not misled into thinking it can be used. And we keep Kurdish Sorani, which can be added to SecureDrop as soon as it is fully translated. What do you think ?