Kurdish missing from python babel / CLDR

Hey @brandones and @dachary. Sorry for the delay in response. I have reached out to several contacts and have been recommended two individuals in Kurdish media. Once I get more information from them I will share and maybe we will be able to pull them in to support the CLDR progress.

Hi @erinm!

Did you hear back from your contacts? I realize this is a difficult topic because the skills required to make progress are extremely rare, and I wonder how we can ensure progress. I remember that Localization Lab focuses on languages spoken by a relatively small number of people, so maybe you have suggestions?

I searched for "Kurdish Kurmanji Translations" and found Mariana Mellor, who also has a LinkedIn page and presumably a Facebook page. I thought I might send her a message explaining our problem, like so:

Dear Mariana Mellor,

We are a non-profit organization helping journalists communicate anonymously with their sources[1]. We are looking for volunteer translators to Kurmanji Kurmancî and wonder if you would be interested in helping.

If you’re too busy at the moment (we all have day jobs ;-), could you recommend people who are fluent in Kurmanji Kurmancî and may have spare time? Ideally such a person would also have a technical background to help improve the international standards, which are currently making very slow progress and have only rudimentary support for Kurmanji Kurmancî[2] compared to Sorani[3].

Sincerely

[1] https://securedrop.org
[2] Unicode CLDR - CLDR Change Requests
[3] Unicode CLDR - CLDR Change Requests

While at FOSDEM someone suggested contacting universities because they are likely to be the best place to find people with the right mixture of multidisciplinary skills required to make progress with CLDR. Finding a university professor is not that easy but … not too far from my home is the Paris Kurde Institute.

I will call them on Monday to discuss the idea and hopefully get a few useful contacts.

A message was sent via the contact form (translated from French):

Hello,

Freedom of the Press Foundation is a non-profit organization that defends press freedom and provides tools for journalists. We are having difficulty translating some software into Kurdish Kurmanji and are looking to get in touch with academics who could point us in the right direction.

Would you be available to discuss this?

Best regards

Although they did not reply to my mail, someone from the Paris Kurde Institute kindly suggested over the phone that I talk to Mme Beau about this topic. She will be available this afternoon (106, rue La Fayette, F-75010 Paris, +33 (0)1 48 24 64 64) and I’ll try again.

As it turns out, the Paris Kurde Institute employs two computer scientists, and they are interested in this topic. We have an appointment tomorrow afternoon with Mr Beau and the best part is … their offices are a 15-minute walk from my home. What are the odds? :stuck_out_tongue:

Bonjour,

To summarize, quoting a relevant comment regarding the current contribution process:

The determination of when a locale moves from “seed” to “common” is done at the end of a release when we evaluate the amount of data received in any given seed locale vs. the Minimal Data Commitment as outlined in http://cldr.unicode.org/index/cldr-spec/minimaldata#TOC-Minimal-Data-Commitment

1). Make sure you and/or your vetter have a survey tool account[1], and are prepared to submit data via the survey tool during the data submission phase scheduled to open in early May.
See http://cldr.unicode.org/index/survey-tool/accounts

2). Use the survey tool to contribute data. If you are the only ones providing data for Northern Kurdish and have a “guest” account, sometimes we will allow your vetter to have regular vetter instead of “guest” status, which makes it easier to get confirmed data. We deal with these on a case by case basis during the submission process.

3). At the end of the release, if the locale has sufficient data, then it will be moved from seed to main per CLDR TC decision.

There is a yearly release schedule for improving the database; it starts roughly in May, see http://cldr.unicode.org/index/survey-tool/guide.

[1] http://cldr.unicode.org/index/survey-tool/accounts suggests going to the Contact Form to request an account and selecting Request for CLDR Submitter ID.

Voila

P.S. the current state of the data collected for Kurdish Sorani and Kurdish Kurmanji

Bonjour,

We had a very educational discussion about Kurdish languages and codes today at the Paris Kurde Institute. I’m a little embarrassed because it turns out to be clearly articulated on the Kurdish languages Wikipedia page. But Ridvan (computer scientist) and Gerard (librarian) were kind enough to explain that what I thought was our target, ku, is actually a macrolanguage that includes three individual codes: ckb (Central Kurdish, Sorani), kmr (Northern Kurdish, Kurmanji) and sdh (Southern Kurdish).

But there is a lot more to it, and Gerard's encyclopedic knowledge of all the variants in existence and their respective specificities made me think we had embarked on a never-ending journey. They reassured me: with the above three individual codes nobody will be left behind.
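For anyone who wants to see the macrolanguage relationship spelled out, here is a minimal Python sketch. The table is written by hand from the ISO 639-3 codes discussed in this thread; it is not read from CLDR or Babel, and the names `MACROLANGUAGES`, `individual_codes` and `macrolanguage_of` are made up for illustration.

```python
# Illustrative only: a hand-written table, not data loaded from CLDR or Babel.
# "ku" is the macrolanguage code; the tuple lists its individual ISO 639-3 codes.
MACROLANGUAGES = {
    "ku": ("ckb", "kmr", "sdh"),
}

# Human-readable names for the individual codes (assumed labels for display).
NAMES = {
    "ckb": "Central Kurdish (Sorani)",
    "kmr": "Northern Kurdish (Kurmanji)",
    "sdh": "Southern Kurdish",
}


def individual_codes(code):
    """Return the individual codes covered by `code`.

    A macrolanguage expands to its members; any other code is returned as is.
    """
    code = code.lower()
    return MACROLANGUAGES.get(code, (code,))


def macrolanguage_of(code):
    """Return the macrolanguage containing `code`, or None if there is none."""
    code = code.lower()
    for macro, members in MACROLANGUAGES.items():
        if code in members:
            return macro
    return None


if __name__ == "__main__":
    for member in individual_codes("ku"):
        print(member, "-", NAMES[member])
```

The point of the sketch is only that a request for "ku" has to be resolved to one of the individual codes before looking for locale data, which is why the translation work has to target kmr or ckb rather than ku itself.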

It turns out participating in CLDR is likely to also interest working groups focused on linguistics that meet on a regular basis at the institute and publish a newsletter regarding terminology normalization in various languages. Ridvan agreed to be the liaison on this CLDR effort, with support from the institute.

It looks like this project is finally taking shape and deserves its own category to discuss the work that is ahead of us.

Cheers

Hi everyone,

I am Gerard from the Kurdish Institute. Finally, as Loic suggested, I have registered on the forum to post this data.

If you are going to add codes, I think it might be worth adding the ZZA (Zazaki) macrolanguage as well, which is also present in ISO 639-3 and Unimarc (the libraries standard), besides CKB and KMR. Although not formally classified under Kurdish, it corresponds to the language spoken in the Dersim (Tunceli) area of Turkish Kurdistan. Formally, ZZA is divided into two languages, DIQ (Dimîlî = Southern Zazakî) and KIU (Kurmanckî = Northern Zazakî), but I am not sure this differentiation is necessary.

In my opinion, ZZA might have more currency than SDH (I am not sure what is being published in the Southern Kurdish dialects, but I know for sure there are journals in Zazakî).

Hope it helps,

Gerard

The Survey Tool is now open for general submission. Please start by reading the Information Hub for Linguists (Unicode CLDR - Information Hub for Linguists). Whenever you log in, use this Information Hub as your starting point; the information on this page will be updated continuously during the submission period, with new or changed material marked in color. Thank you for your participation in CLDR; your contributions are acknowledged and appreciated by CLDR users.

Now is the time to contribute :tada: