Pattern matching UTF-8 support

DFH
Posts: 644
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Pattern matching UTF-8 support

Postby DFH » Fri Oct 14, 2011 10:54 pm

The help page includes the following:
2. In a pattern, the escape sequence \x{...}, where the contents of the braces is a string of hexadecimal digits, is interpreted as a UTF-8 character whose code number is the given hexadecimal number, for example: \x{1234}. If a non-hexadecimal digit appears between the braces, the item is not recognized. This escape sequence can be used either as a literal, or within a character class.

Yet if, in a replace filter with UTF-8 enabled, I enter \x{1234} in the "Find pattern (perl style)" field, TextPipe reports that "the character value in the x{...} sequence is too large".

This makes it impossible to use as documented.

I have many UTF-8 files containing the single character U+FEFF ZERO WIDTH NO-BREAK SPACE.
I would like to remove this character. How can I do it? TextPipe does not allow \x{feff}.

For U+FEFF, the equivalent UTF-8 byte sequence is \xEF\xBB\xBF, yet TextPipe doesn't find this pattern either!
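
For anyone wanting to check the same thing outside TextPipe, here is a minimal sketch in Python. It is not TextPipe's own mechanism, and the helper names are made up for illustration. Note that Python's `re` module writes the code point as `\ufeff` rather than PCRE's `\x{feff}`. It confirms that U+FEFF encodes to the three bytes EF BB BF and shows the character being removed both from decoded text and from raw bytes.

```python
import re

# U+FEFF ZERO WIDTH NO-BREAK SPACE (also used as the byte order mark).
ZWNBSP = "\ufeff"

# Its UTF-8 encoding is the three-byte sequence EF BB BF.
ZWNBSP_UTF8 = ZWNBSP.encode("utf-8")


def strip_feff_text(text: str) -> str:
    """Remove every U+FEFF from already-decoded text (hypothetical helper)."""
    return re.sub("\ufeff", "", text)


def strip_feff_bytes(data: bytes) -> bytes:
    """Remove every EF BB BF sequence from raw UTF-8 bytes (hypothetical helper)."""
    return re.sub(rb"\xEF\xBB\xBF", b"", data)


if __name__ == "__main__":
    sample = "\ufeffHello\ufeff world"
    print(ZWNBSP_UTF8.hex())
    print(strip_feff_text(sample))
    print(strip_feff_bytes(sample.encode("utf-8")).decode("utf-8"))
```

Matching on the decoded text corresponds to the UTF-8-enabled mode, where the pattern names a code point. Matching on bytes corresponds to the non-UTF-8 mode, where the pattern has to spell out the encoded sequence.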

DataMystic Support
Site Admin
Posts: 2154
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Pattern matching UTF-8 support

Postby DataMystic Support » Tue Oct 18, 2011 8:49 am

Hi David,

I checked it out with the PCRE guys. You need to enable the UTF-8 flag for this to work.

Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

DFH
Posts: 644
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Pattern matching UTF-8 support

Postby DFH » Fri Oct 28, 2011 4:20 am

I had enabled UTF-8 support. That's the point - I did all the right things but it still won't work.

David

DataMystic Support
Site Admin
Posts: 2154
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Pattern matching UTF-8 support

Postby DataMystic Support » Fri Oct 28, 2011 10:20 am

Hi David,

Just checked, and you are right. This is a display validation issue only: the filter works properly, but the error message it gives is incorrect.

It will be fixed in the next release.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

DFH
Posts: 644
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Pattern matching UTF-8 support

Postby DFH » Sat Oct 29, 2011 1:01 am

Thanks for the response, Simon.

I was almost starting to imagine I was going mad, but you have allayed my fears!

David

