I recently encountered a bug in Normalization to NFC for text containing Myanmar characters.
The bug affected composite characters, each of which uses the same pair of combining characters:
့ MYANMAR SIGN DOT BELOW
် MYANMAR SIGN ASAT
I suspect that TextPipe uses out-of-date normalization algorithms.
Some background:
Software that includes normalization should be tested against the official Unicode Normalization Test http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt (2.2MB) for the version of Unicode it implements.
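As a rough sketch of what such a test involves, the Python below checks the NFC invariants that the test file's header defines (c2 == NFC(c1) == NFC(c2) == NFC(c3), and c4 == NFC(c4) == NFC(c5)) for a single data line. The names parse_field and check_nfc_line are my own, and the sample line is hand-built in the file's column layout rather than copied from the file. It exercises Python's unicodedata module, not TextPipe.

```python
import unicodedata

def parse_field(field):
    """Turn a field such as '1000 103A 1037' into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())

def check_nfc_line(line):
    """Check the NFC invariants for one data line of NormalizationTest.txt.

    Each data line has five semicolon-separated columns c1..c5,
    optionally followed by a '#' comment.
    """
    data = line.split("#", 1)[0].strip()
    c1, c2, c3, c4, c5 = (parse_field(f) for f in data.split(";")[:5])

    def nfc(s):
        return unicodedata.normalize("NFC", s)

    return (c2 == nfc(c1) == nfc(c2) == nfc(c3)) and (c4 == nfc(c4) == nfc(c5))

# Hand-built line in the file's layout (an assumption, not quoted from the file):
# KA + ASAT + DOT BELOW in c1, and the canonically ordered form in c2..c5.
sample = ("1000 103A 1037;1000 1037 103A;1000 1037 103A;"
          "1000 1037 103A;1000 1037 103A; # illustrative line")
nfc_ok = check_nfc_line(sample)
print(nfc_ok)
```

A full run would loop over the file, skip blank lines, comment lines starting with "#" and the "@Part" headers, and should use a copy of the file that matches unicodedata.unidata_version.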
The process of converting a string to NFC or NFD requires a stage called "canonical ordering", whereby each run of combining characters (those with a non-zero canonical combining class, ccc) is sorted into ascending order of ccc. See http://www.unicode.org/reports/tr15/?win#Description_Norm.
U+103A MYANMAR SIGN ASAT has ccc=9, whereas U+1037 MYANMAR SIGN DOT BELOW has ccc=7; therefore, when U+103A is immediately followed by U+1037, canonical ordering swaps them so that U+1037 comes first.
The bug is that TextPipe does not reorder these two codepoints.
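For comparison, here is a minimal sketch of the expected behaviour using Python's standard unicodedata module. The base letter U+1000 MYANMAR LETTER KA is an arbitrary choice of mine for illustration, and the helper name codepoints is hypothetical.

```python
import unicodedata

ASAT = "\u103a"       # MYANMAR SIGN ASAT
DOT_BELOW = "\u1037"  # MYANMAR SIGN DOT BELOW
KA = "\u1000"         # MYANMAR LETTER KA, arbitrary base letter for the demo

# Canonical combining classes as reported by Python's Unicode database
ccc_asat = unicodedata.combining(ASAT)
ccc_dot = unicodedata.combining(DOT_BELOW)

# Input with the higher-ccc mark first, which canonical ordering must swap
typed = KA + ASAT + DOT_BELOW
normalized = unicodedata.normalize("NFC", typed)

def codepoints(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

print(ccc_asat, ccc_dot)
print(codepoints(typed), "->", codepoints(normalized))
```

With a current Python this should print "9 7" and then "U+1000 U+103A U+1037 -> U+1000 U+1037 U+103A", which is the reordering that TextPipe leaves undone.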
David
