Character cAsE filters and the Turkish alphabet

Post by DFH » Tue Mar 03, 2020 10:12 pm

The help pages for the various Character cAsE filters state the following:
This filter expects UTF-8 data and will handle foreign character sets.
This is not quite true, in that there are exceptions in some bicameral alphabets such as Turkish and Northern Azeri.
Both these alphabets include the following two letters:

U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE (İ)
U+0131 LATIN SMALL LETTER DOTLESS I (ı)

So, for example, pasting the following into the Trial Run area:

İı

and running the tOGGLE cASE filter makes no change.
On the other hand, it does change most accented Latin letters, e.g. Š to š.

Perhaps the sentence in the Help pages should be qualified:
This filter expects UTF-8 data and will handle some foreign character sets.
I am not sure how you might implement the proper case rules for the Turkish alphabet, etc.
These filters would first need the user to specify the writing system context.
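
To illustrate what I mean by specifying the context, here is a minimal Python sketch of a toggle-case routine that takes an optional language hint. The names toggle_case, TURKIC_LANGS and TURKIC_TOGGLE are my own inventions for illustration and have nothing to do with how TextPipe works internally. It only handles the four precomposed letters and ignores decomposed forms such as i followed by U+0307 COMBINING DOT ABOVE.

```python
# Hypothetical sketch: toggle case with an optional language hint.
# All names here are illustrative assumptions, not TextPipe features.

TURKIC_LANGS = {"tr", "az"}  # Turkish and Azeri share the dotted/dotless i rule

# In these alphabets the pairs are I/ı and İ/i, not I/i.
TURKIC_TOGGLE = {
    "I": "\u0131",  # I -> ı (dotless small i)
    "\u0131": "I",  # ı -> I
    "i": "\u0130",  # i -> İ (capital I with dot above)
    "\u0130": "i",  # İ -> i
}

def toggle_case(text, lang=None):
    """Swap the case of each character, applying Turkic rules if requested."""
    turkic = lang in TURKIC_LANGS
    out = []
    for ch in text:
        if turkic and ch in TURKIC_TOGGLE:
            out.append(TURKIC_TOGGLE[ch])
        else:
            # Fall back to the default (language-neutral) Unicode mapping.
            out.append(ch.swapcase())
    return "".join(out)

print(toggle_case("\u0130\u0131", lang="tr"))  # the İı sample from the Trial Run
print(toggle_case("\u0160"))                   # Š, no language hint needed
```

Without the language hint, the routine falls back to the default mapping, which pairs I with i and so cannot give the right answer for Turkish text.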

Furthermore, I would guess that you have not given any consideration to extending these Character cAsE filters to cover the Cherokee Supplement block of small letters, which was defined in Unicode 8.0 (June 2015).
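
For what it is worth, the Cherokee mapping is regular enough to express as simple offsets. The following Python sketch is only my own illustration (toggle_cherokee is a made-up name): capitals U+13A0..U+13EF pair with small letters U+AB70..U+ABBF in the Cherokee Supplement block, and capitals U+13F0..U+13F5 pair with small letters U+13F8..U+13FD in the main Cherokee block.

```python
# Hypothetical sketch: toggle case for Cherokee letters by code point offset.

def toggle_cherokee(ch):
    """Return the opposite-case Cherokee letter, or ch unchanged otherwise."""
    cp = ord(ch)
    if 0x13A0 <= cp <= 0x13EF:   # capital -> small (Cherokee Supplement)
        return chr(cp - 0x13A0 + 0xAB70)
    if 0xAB70 <= cp <= 0xABBF:   # small (Cherokee Supplement) -> capital
        return chr(cp - 0xAB70 + 0x13A0)
    if 0x13F0 <= cp <= 0x13F5:   # capital -> small (main Cherokee block)
        return chr(cp + 8)
    if 0x13F8 <= cp <= 0x13FD:   # small (main Cherokee block) -> capital
        return chr(cp - 8)
    return ch

sample = "\u13A0\uAB70\u13F0\u13F8"  # CHEROKEE LETTER A, SMALL A, YE, SMALL YE
print("".join(toggle_cherokee(c) for c in sample))
```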


Best regards,
David
