Unicode support beyond the Basic Multilingual Plane?

DFH
Posts: 919
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Post by DFH » Mon Dec 10, 2018 8:40 pm

It rather looks as though TextPipe does not support Unicode characters beyond the Basic Multilingual Plane.

cf. The Unicode code space extends to Plane 16 (U+100000..U+10FFFF, Supplementary Private Use Area-B); the supplementary planes long predate Unicode 11, the current version of the standard.
See https://www.unicode.org/versions/Unicode11.0.0/

I've just been testing the filter Convert Numeric HTML/XML Entities to text using the trial run area.

Code points beyond the BMP are converted incorrectly, e.g.

Code:

&#x112B0;
becomes

Code:

ኰ
which is U+12B0 ETHIOPIC SYLLABLE KWA
The correct conversion is U+112B0 KHUDAWADI LETTER A.

Thus any file containing numeric character references (NCRs) with more than four hex digits would be converted incorrectly.
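For comparison, here is a minimal Python sketch of NCR decoding over the full code space (U+0000..U+10FFFF). It also includes a hypothetical `decode_ncrs_16bit` that keeps only the low 16 bits of each value: that reproduces the reported U+112B0 → U+12B0 result (0x112B0 & 0xFFFF == 0x12B0), which suggests, but does not prove, a 16-bit truncation somewhere in the filter. Both function names are illustrative only; nothing here reflects TextPipe's actual code.

```python
import re

# Matches decimal (&#70320;) and hexadecimal (&#x112B0;) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def _value(m: re.Match) -> int:
    """Parse the matched reference as a hex or decimal integer."""
    return int(m.group(1), 16) if m.group(1) else int(m.group(2))


def decode_ncrs(text: str) -> str:
    """Replace every NCR with its character, supporting all 17 planes."""
    def repl(m: re.Match) -> str:
        cp = _value(m)
        # Surrogates and values above U+10FFFF are not valid scalar values.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return "\uFFFD"
        return chr(cp)
    return NCR.sub(repl, text)


def decode_ncrs_16bit(text: str) -> str:
    """Hypothetical buggy variant: keeps only the low 16 bits of each value."""
    def repl(m: re.Match) -> str:
        cp = _value(m) & 0xFFFF
        # A lone surrogate is not a usable character; show U+FFFD instead.
        if 0xD800 <= cp <= 0xDFFF:
            return "\uFFFD"
        return chr(cp)
    return NCR.sub(repl, text)


for ref in ("&#x112B0;", "&#70320;"):
    good, bad = decode_ncrs(ref), decode_ncrs_16bit(ref)
    print(f"{ref}: correct U+{ord(good):04X}, truncated U+{ord(bad):04X}")
```

Python's standard `html.unescape` gives the same correct result as `decode_ncrs` for these references.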

When will TextPipe become more fully compliant with the latest Unicode standard?

Best regards,

David

DataMystic Support
Site Admin
Posts: 2189
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support » Mon May 06, 2019 10:39 am

We are currently looking into what is required here.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

DFH

Post by DFH » Mon Mar 02, 2020 10:09 pm

What's New in TextPipe v11 – 12 December, 2019
==============================================
...
  • Upgraded Unicode support to Unicode 12.1.
...

Thanks!

David

DFH

Post by DFH » Mon Mar 02, 2020 10:14 pm

Bug alert!

Convert Numeric HTML/XML Entities to text converted

Code:

&#x112B0; &#xAA00;
to

Code:

; ;
NB. I have also just tried this same entity data in a UTF-8 input file as well as in the Trial Run area.
The output file contained only the semicolons, just like the Trial Run area.
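When checking an output file by hand, it may help to know what a correct conversion of the two test references (U+112B0 KHUDAWADI LETTER A and U+AA00 CHAM LETTER A) should contain byte for byte. This Python sketch assumes the two were separated by a single space and prints the UTF-8 and UTF-16LE encodings. The supplementary character needs four UTF-8 bytes and a UTF-16 surrogate pair (D804 DEB0), two places where BMP-only code commonly goes wrong. It is a reference aid only, not a diagnosis of what TextPipe does internally.

```python
# Expected output of a correct conversion of the two test references,
# assuming they were separated by a single space.
expected = "\U000112B0 \uAA00"

for cp in map(ord, expected):
    print(f"U+{cp:04X}")

utf8 = expected.encode("utf-8")
utf16 = expected.encode("utf-16-le")  # little-endian, no BOM

print("UTF-8   :", utf8.hex(" ").upper())
print("UTF-16LE:", utf16.hex(" ").upper())

# A supplementary-plane character is stored as a surrogate pair in UTF-16:
# subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.
hi_s, lo_s = divmod(ord(expected[0]) - 0x10000, 0x400)
print(f"surrogate pair: {0xD800 + hi_s:04X} {0xDC00 + lo_s:04X}")
```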

This has now become a very serious software bug!

Regards,

David
Last edited by DFH on Thu Mar 05, 2020 2:28 am, edited 3 times in total.

DFH

Post by DFH » Tue Mar 03, 2020 12:03 am

I have also retested the similar filter called Convert HTML/XML entities to text.

Please refer to https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

It's apparent from the Help page entitled Convert HTML/XML entities to text that this filter only supports HTML 4.0.

It would therefore be a further essential improvement to extend the supported entities to the larger set of named character references in HTML 5.0, complete with the alternative names that some of them have.

Furthermore, I have just tested the filter against all 252 covered entities.
Measured against the HTML 5.0 standard, two of them are now converted incorrectly by TextPipe 11.4:

Code:

Entity	Expected (HTML 5.0)	TextPipe 11.4	Match?
&lang;	⟨	〈	FALSE
&rang;	⟩	〉	FALSE

Code:

&lang; should be U+27E8 (moved to this code point in HTML 5.0; in HTML 4.0 it was mapped to U+2329, decimal 9001);
&rang; should be U+27E9 (moved to this code point in HTML 5.0; in HTML 4.0 it was mapped to U+232A, decimal 9002);
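Python's standard library happens to ship both mappings, which makes the HTML 4.0 vs HTML 5.0 difference easy to check: `html.entities.name2codepoint` follows HTML 4.01, while `html.entities.html5` holds the HTML 5 named character references, including alternative names such as `&langle;` and `&LeftAngleBracket;`. This is a minimal sketch for comparison only; it says nothing about how TextPipe stores its own entity table.

```python
from html.entities import html5, name2codepoint

for name in ("lang", "rang"):
    old = name2codepoint[name]      # HTML 4.01 mapping (an int code point)
    new = ord(html5[name + ";"])    # HTML 5 mapping (keys include the ';')
    print(f"&{name};  HTML 4: U+{old:04X} ({old})  HTML 5: U+{new:04X} ({new})")

# HTML 5 also defines alternative names that map to the same characters.
aliases = sorted(k for k, v in html5.items() if v in ("\u27e8", "\u27e9"))
print("names for U+27E8/U+27E9:", ", ".join("&" + k for k in aliases))
```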
Best regards,

David
Last edited by DFH on Thu Mar 05, 2020 2:24 am, edited 1 time in total.

DFH

Post by DFH » Wed Mar 04, 2020 2:15 am

See also http://www.datamystic.com/forums/viewtopic.php?f=17&t=2505

DFH

Post by DFH » Sat Mar 14, 2020 2:36 am

Hi Simon,

Anything to report on this critical issue and the related one?

Best regards,

David
