Kurdish missing from python babel / CLDR

dachary · December 12, 2017, 10:25pm

Ok, the game is on. I’ll update this thread with progress.

dachary · December 13, 2017, 2:12pm

In the CLDR coverage matrix, [Kurdish Kurmanji](http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html#ku) is listed at the seed status, which seems better than undetermined.

In the latest release, a few locales moved from status seed to status common. The release notes describe the seed status as follows: Locales with insufficient coverage were moved into the “seed” directory, and are not part of the release. The data is available via SVN. There are now 16 such languages and associated regions. When coverage for a seed language improves sufficiently, it will be moved into the release.

I checked out the latest CLDR release with svn co http://www.unicode.org/repos/cldr/tags/release-32/ and looked for Wolof to figure out how it went from seed to common. Comparing common/main/ckb.xml (1317 lines) with seed/main/ku.xml (242 lines) suggests there is still much work to do.
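The line-count comparison can be reproduced with a short script. This is only a sketch: it assumes the checkout landed in a directory named release-32, and the helper name locale_stats is made up for illustration. It also counts leaf XML elements, which is a rough proxy for the amount of data and not the metric CLDR itself uses to decide coverage.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def locale_stats(path):
    """Return (line_count, leaf_element_count) for one LDML locale file."""
    path = Path(path)
    with path.open(encoding="utf-8") as fh:
        lines = sum(1 for _ in fh)
    root = ET.parse(str(path)).getroot()
    # Leaf elements carry the actual values (names, patterns, ...).
    leaves = sum(1 for el in root.iter() if len(el) == 0)
    return lines, leaves


if __name__ == "__main__":
    checkout = Path("release-32")  # assumed name of the svn checkout directory
    for rel in ("common/main/ckb.xml", "seed/main/ku.xml"):
        target = checkout / rel
        if target.exists():
            lines, leaves = locale_stats(target)
            print(f"{rel}: {lines} lines, {leaves} leaf elements")
        else:
            print(f"{rel}: not found under {checkout}")
```

Running the same script after each CLDR release would give a crude picture of whether ku.xml is growing.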

The next step should be to figure out:

the contribution process of CLDR
getting in touch with the people working on Kurdish Kurmanji

dachary · December 17, 2017, 1:57pm

@erinm IIRC you take a particular interest in languages that tend to be neglected for one reason or another. Do you happen to know how one can follow the progress made by CLDR (for instance for Kurdish Kurmanji)?

Kurdish Sorani proposal dated 2014 from the Unicode® Technical Committee Document Registry
A proposal to correct/add/remove Kurdish Characters from https://kurditgroup.org/unicode/proposal

dachary · December 17, 2017, 4:56pm

https://kurditgroup.org/unicode/proposal

To: bardaqani@kurditgroup.org
Subject: Kurdish Kurmanji

Hi,

I’m interested in understanding how to help make progress on CLDR for Kurdish Kurmanji, which is still in the “seed” stage:

http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html

Seeing that you’re the contact for https://kurditgroup.org/unicode/proposal and even though this happened years ago, I’m hoping you could give me a few pointers to get started ? Your help would be greatly appreciated

Cheers

erinm · December 18, 2017, 7:47am

@dachary We support increasing access and inclusion of minority language communities in the Internet freedom space, however CLDR progress is outside of my wheelhouse. I’ll take a look and see if I can dig anything up.

Regarding Kurdish Kurmanji translations, @brandones do you have any organizations in mind that would be interested in using SecureDrop? If there is a lot of linguistic variation in Kurmanji and no hard standard, it may be a good idea to identify potential organizational users and tailor to their needs.

dachary · December 21, 2017, 11:30pm

Starting in January 2018 there will be a weekly meeting of the technical committee, and two meetings:

UTC # 154 -- January 22-25
Hosted by Google, Mt. View, CA

Unicode Board of Directors Meeting -- Jan. 26,  1-5pm
Hosted by Google, Mt. View, CA

Using the contact page I subscribed to the Public CLDR Users mailing list. It is low traffic and there has been nothing about Kurdish in the past months.

I sent the following call for help.

Subject: Kurdish Kurmanji progress

Hi,

I'm interested in following the progress of the work done on Kurdish Kurmanji[1] to be notified when it transitions from "seed" to "common". How can I do that ?

Thanks in advance for any pointers you can provide :-)

[1] http://www.unicode.org/cldr/charts/32/supplemental/locale_coverage.html

dachary · December 29, 2017, 8:43am

Shervin Afshar replied with a pointer to the file that should be monitored for progress of ku.xml.

If someone is willing to help, the new developers page explains how to get involved, and there is one for data collectors as well. @brandones do you feel like participating in the making of Kurdish Kurmanji in CLDR ?

dachary · December 29, 2017, 8:57am

Now that we have a way to monitor progress in CLDR, I’ll set a reminder every other month to check on it. It is likely to take some time, years maybe, unless someone actively looks for people to get involved. Someone asked in python-babel about two years ago and nothing much has happened since. Maybe @erinm has ideas on how to get people involved now that we found the right path to do it.

In the meantime I think SecureDrop should provide a hack to be removed later to support Kurdish Kurmanji. It should be enough to make changes in:

securedrop/i18n.py for locale selection
securedrop/template_filters.py to add a special case for units and dates conversion
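To make the idea concrete, here is a minimal sketch of what such a temporary special case could look like. It is not SecureDrop code: the names FORMAT_FALLBACKS and formatting_locale are hypothetical, and borrowing en_US formats for ku is only an illustrative assumption. The UI stays in the translated language, while dates and units are formatted with a locale that CLDR actually ships.

```python
# Hypothetical table: locales that are translated in the application but
# absent from the CLDR release, mapped to the locale whose date and unit
# formats are borrowed in the meantime. The "en_US" choice is an assumption.
FORMAT_FALLBACKS = {"ku": "en_US"}


def formatting_locale(ui_locale, cldr_locales, fallbacks=FORMAT_FALLBACKS):
    """Return the locale to use for date and unit formatting.

    ui_locale    -- locale selected for the translated interface
    cldr_locales -- set of locale identifiers the CLDR data provides
    """
    if ui_locale in cldr_locales:
        return ui_locale
    try:
        return fallbacks[ui_locale]
    except KeyError:
        raise LookupError(
            f"no CLDR data or fallback for {ui_locale!r}"
        ) from None


if __name__ == "__main__":
    available = {"en_US", "fr_FR", "ckb"}
    print(formatting_locale("ckb", available))  # in CLDR, used as-is
    print(formatting_locale("ku", available))   # falls back to en_US
```

Keeping the special case in one table means it can be deleted in a single place once the locale reaches common.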

I do not like hacks, which is why it was important to figure out exactly how and when a proper solution can be implemented. @redshiftzero what do you think ?

brandones · December 31, 2017, 10:47pm

Sure, participating in making the Kurmanji CLDR seems like a fine thing. I’ll start with that “New Developers” page.

brandones · December 31, 2017, 10:58pm

Hi @erinm, there are some Kurdish news agencies based in Turkey (some of whom I’m in touch with) that would probably be interested, but they also all know Turkish, so it’s not actually a pressing issue. So it’s probably not worth hacking SecureDrop to provide support for Kurmanji in the short term. Plus, now we know and can address the root cause, which will have a much wider impact anyway.

Thank you all so much for looking into this!

dachary · January 1, 2018, 10:01am

@brandones thanks for the feedback. I propose we remove Kurdish Kurmanji for now (it can easily be added later) so translators are not confused into thinking it can be used. And we keep Kurdish Sorani, which can be added to SecureDrop as soon as it is fully translated. What do you think ?

erinm · January 8, 2018, 5:42pm

Hey @brandones and @dachary. Sorry for the delay in response. I have reached out to several contacts and have been recommended two individuals in Kurdish media. Once I get more information from them I will share and maybe we will be able to pull them in to support the CLDR progress.

dachary · January 26, 2018, 10:34am

Hi @erinm !

Did you hear back from your contacts ? I realize this is a difficult topic because the skills required to make progress are extremely rare. I wonder how we can ensure progress. I remember that localizationlab has a focus on languages that are spoken by a relatively small number of people and maybe you have suggestions ?

I searched Kurdish Kurmanji Translations and found Mariana Mellor, who also has a LinkedIn page and presumably a Facebook page. I thought maybe to send her a message explaining the problem we have, like so:

Dear Mariana Mellor,

We are a non-profit organization helping journalists communicate anonymously with their sources[1]. We are looking for volunteer translators to Kurmanji Kurmancî and wonder if you would be interested in helping ?

If you’re too busy at the moment (we all have day jobs ;-), would you have recommendations about people fluent in Kurmanji Kurmancî and who may have spare time ? Ideally such a person would also have the technical background to improve the international standards, which are currently making very slow progress and have rudimentary support for Kurmanji Kurmancî [2] compared to Kurdish Sorani[3] .

Sincerely

[1] https://securedrop.org
[2] Unicode CLDR - CLDR Change Requests
[3] Unicode CLDR - CLDR Change Requests

dachary · February 17, 2018, 8:54am

While at FOSDEM someone suggested contacting universities because they are likely to be the best place to find people with the right mixture of multidisciplinary skills required to make progress with CLDR. Finding a university professor is not that easy but … not too far from my home is the Paris Kurde Institute.

I will call them on Monday to discuss the idea and hopefully get a few useful contacts.

A mail was sent via the contact form (originally in French):

Hello,

Freedom of the Press Foundation is a non-profit organization that defends press freedom and provides tools for journalists. We are running into difficulties translating some software into Kurdish Kurmanji and are looking to get in touch with academics who could point us in the right direction.

Would you be available to discuss this?

Regards

dachary · February 20, 2018, 8:21am

Although they did not reply to my mail, someone from the Paris Kurde Institute kindly suggested over the phone that I talk to Mme Beau about this topic. She will be available this afternoon (106, rue La Fayette, F-75010 Paris, +33 (0)1 48 24 64 64) and I’ll try again.

dachary · February 20, 2018, 2:32pm

As it turns out, the Paris Kurde Institute employs two computer scientists, and they are interested in this topic. We have an appointment tomorrow afternoon with Mr Beau and the best part is … their offices are a 15-minute walk from my home. What are the odds ?

dachary · February 21, 2018, 1:40pm

Bonjour,

To summarize and quoting a relevant comment regarding the current contribution process:

The determination of when a locale moves from “seed” to “common” is done at the end of a release when we evaluate the amount of data received in any given seed locale vs. the Minimal Data Commitment as outlined in http://cldr.unicode.org/index/cldr-spec/minimaldata#TOC-Minimal-Data-Commitment

1). Make sure you and/or your vetter have a survey tool account[1], and are prepared to submit data via the survey tool during the data submission phase scheduled to open in early May.
See http://cldr.unicode.org/index/survey-tool/accounts

2). Use the survey tool to contribute data. If you are the only ones providing data for Northern Kurdish and have a “guest” account, sometimes we will allow your vetter to have regular vetter instead of “guest” status, which makes it easier to get confirmed data. We deal with these on a case by case basis during the submission process.

3). At the end of the release, if the locale has sufficient data, then it will be moved from seed to main per CLDR TC decision.

There is a yearly release schedule for improving the database; it starts roughly in May: http://cldr.unicode.org/index/survey-tool/guide.

[1] http://cldr.unicode.org/index/survey-tool/accounts suggests to go to Contact Form to request an account and select Request for CLDR Submitter ID.

Voila

P.S. the current state of the data collected for Kurdish Sorani and Kurdish Kurmanji

dachary · February 21, 2018, 3:39pm

Bonjour,

We had a very educational discussion about Kurdish languages and codes today at the Paris Kurde Institute. I’m a little embarrassed because it turns out to be clearly articulated in the Kurdish languages Wikipedia page. But Ridvan (computer scientist) and Gerard (librarian) were kind enough to explain that what I thought to be our target, ku, is actually a macrolanguage that includes three individual codes:

ckb Kurdish Sorani which is already in CLDR main
kmr Kurdish Kurmanji is missing
sdh Southern Kurdish is missing
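For anyone scripting against these codes, the relationship can be written down as a small table. This is only a sketch of the list above: the CLDR status flags reflect what was said in this discussion, and the names KURDISH and missing_from_cldr are made up for illustration.

```python
# Individual language codes under the "ku" macrolanguage, each with a
# display name and whether it is in CLDR main according to this thread.
KURDISH = {
    "ckb": ("Kurdish Sorani", True),
    "kmr": ("Kurdish Kurmanji", False),
    "sdh": ("Southern Kurdish", False),
}


def missing_from_cldr(languages=KURDISH):
    """Return the sorted codes that still need to reach CLDR main."""
    return sorted(
        code for code, (_name, in_main) in languages.items() if not in_main
    )


if __name__ == "__main__":
    for code in missing_from_cldr():
        print(code, KURDISH[code][0])
```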

But there is a lot more to it, and Gerard’s encyclopedic knowledge of all variants in existence and their respective specificities made me think we were embarking on a never-ending journey. They reassured me: with the above three individual codes nobody will be left behind.

It turns out participating in CLDR is likely to also interest working groups focused on linguistics that meet on a regular basis at the institute and publish a newsletter regarding terminology normalization in various languages. Ridvan agreed to be the liaison on this CLDR effort, with support from the institute.

It looks like this project is finally taking shape and deserves its own category to discuss the work that is ahead of us.

Cheers

Paris_Kurd_library · February 22, 2018, 4:21pm

Hi everyone,

I am Gerard from the Kurdish Institute, finally, Loic suggested I should register onto the
forum and post this data.

If you are going to add codes, I think it might be worth adding as well the ZZA (Zazaki)
macrolanguage, which is also present in ISO-639-3 and Unimarc (libraries standard), besides
CKB and KMR. Although not formally classified under Kurdish , it corresponds to the language
spoken in Dersim (Tunceli) Kurdish area of Turkish Kurdistan. Formally, ZZA gets divided into
two languages, DIQ (Dimîlî = Northern Zazakî), and KIU (Kurmanckî = Southern Zazakî), but I
am not sure this differentiation is necessary.

In my opinion, ZZA might have more currency than SDH (I am not sure what publications are
being done in Kurdish Southern dialects, but I know for sure there are journals in Zazakî).

Hope it helps,

Gerard

dachary · May 26, 2018, 6:09am

The Survey Tool is now open for general submission. Please start by reading the Information Hub for Linguists (Unicode CLDR - Information Hub for Linguists). Whenever you log-in, use this Information hub as your starting point; the information on this page will continuously be updated during the submission period, with new or changed material marked in color. Thank you for your participation in CLDR; your contributions are acknowledged and appreciated by CLDR users.

Now is the time to contribute