Replace list filter inserts spurious character Â

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Replace list filter inserts spurious character Â

Postby DFH » Thu Jul 21, 2011 8:38 pm

I'm using a replace list tab file as follows:

<<   «
>>   »
The replacement characters are U+00AB and U+00BB respectively.

However, TextPipe Standard v8.9.2 inserts a spurious character Â (U+00C2) in front of each replacement character.
By the way, the same thing happens when the replace list is a CSV file.

Where is this spurious character coming from?
I think there is a very serious bug in the latest version, possibly as a consequence of "Updated internal pattern matching libraries."
When I uninstalled v8.9.2 and reinstalled v8.8.2, the problem disappeared!

The XHTML file to be processed is encoded UTF-8 (without BOM), as is the replacement list file.
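
For reference, the expected behaviour can be sketched in Python (used purely for illustration; this is not how TextPipe is implemented). The sketch assumes a tab-delimited replace list and plain literal replacement, and the helper names are hypothetical. When both the list and the text are decoded as UTF-8, no U+00C2 should appear in the result:

```python
def parse_replace_list(text):
    """Split a tab-delimited replace list into (search, replacement) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        search, _, replacement = line.partition("\t")
        pairs.append((search, replacement))
    return pairs


def apply_replace_list(text, pairs):
    """Apply each literal replacement in list order."""
    for search, replacement in pairs:
        text = text.replace(search, replacement)
    return text


# Both inputs are UTF-8 without BOM, as in the report above.
list_bytes = "<<\t\u00ab\n>>\t\u00bb\n".encode("utf-8")
source_bytes = "He said <<bonjour>>.".encode("utf-8")

pairs = parse_replace_list(list_bytes.decode("utf-8"))
result = apply_replace_list(source_bytes.decode("utf-8"), pairs)

print(result.encode("utf-8"))  # b'He said \xc2\xabbonjour\xc2\xbb.'
```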

David

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Replace list filter inserts spurious character Â

Postby DataMystic Support » Fri Jul 22, 2011 10:41 am

The correct UTF-8 encoding of U+00AB is \xC2\xAB
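
As a quick check of those byte values, here is a minimal Python sketch (Python is used only for illustration and has nothing to do with TP itself) that encodes both replacement characters as UTF-8:

```python
# Encode the two replacement characters from the replace list as UTF-8.
left = "\u00ab"   # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
right = "\u00bb"  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

left_bytes = left.encode("utf-8")
right_bytes = right.encode("utf-8")

print(left_bytes)   # b'\xc2\xab'
print(right_bytes)  # b'\xc2\xbb'
```

So a leading \xC2 byte is a normal part of each character in a UTF-8 file; it only becomes visible as a separate Â when the two bytes are interpreted as two single-byte characters.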

The latest release of TP allows the search/replace lists (.tab and .csv) to be Unicode; however, most of TP's internals are not Unicode-aware. Each line from your search/replace list is converted from a UTF-16LE string to a UTF-8 string before being processed. The same also now applies to rows from the grid.

Previous versions of TP loaded the search/replace lists naively. If your search/replace list was saved as ANSI, you would not get any extra \xC2. If it was saved as UTF-8 (with or without BOM), the extra \xC2 will be in the file.
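
One way a visible Â could arise from such a conversion is sketched below in Python. This is only an assumption about the mechanism, not a description of TP's actual code: it supposes that a UTF-8 list without a BOM is read as ANSI (Windows-1252 here), held as a Unicode string, and then converted to UTF-8. The function names are hypothetical.

```python
def load_naive(raw):
    """Pass the bytes through untouched (assumed older behaviour)."""
    return raw


def load_as_ansi_then_utf8(raw):
    """Treat each byte as a Windows-1252 character, then re-encode as UTF-8.

    This is an assumed failure mode, shown only to illustrate double encoding.
    """
    return raw.decode("cp1252").encode("utf-8")


entry = "\u00ab".encode("utf-8")  # b'\xc2\xab', as stored in a UTF-8 list

naive = load_naive(entry)
converted = load_as_ansi_then_utf8(entry)

print(naive)      # b'\xc2\xab'
print(converted)  # b'\xc3\x82\xc2\xab'

# Read back as UTF-8, the converted bytes are two characters: U+00C2, U+00AB.
shown = converted.decode("utf-8")
print([hex(ord(ch)) for ch in shown])  # ['0xc2', '0xab']
```

Under that assumption, the naive path leaves a UTF-8 list intact, while the converting path turns each two-byte character into four bytes, which a UTF-8 viewer displays as Â followed by the intended character.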

Let me know which way you would like to go on this.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Re: Replace list filter inserts spurious character Â

Postby DFH » Fri Jul 22, 2011 6:04 pm

Hi Simon,

I have accumulated a considerable number of replace lists since I began using TextPipe.
Almost all of them are encoded as UTF-8 without BOM.

You're not telling me that I should have to change most of them and check/debug previously working filters, are you?
That would be a huge chore that I have not planned as part of my workload.

Many of them are used as Perl pattern replacements, with UTF-8 support ticked.
The latter is because the Files to Process are mostly UTF-8 themselves.
So if I don't use something with UTF-8 support, the files being processed get corrupted.

The phrase "backwards compatibility" springs to mind.
Surely, you owe it to your customers at least to provide an option that will not cause well-established filters to break?

Moreover, what about the longstanding descriptions in the pattern matching reference?
2. In a pattern, the escape sequence \x{...}, where the contents of the braces is a string of hexadecimal digits, is interpreted as a UTF-8 character whose code number is the given hexadecimal number, for example: \x{1234}. If a non-hexadecimal digit appears between the braces, the item is not recognized. This escape sequence can be used either as a literal, or within a character class.

3. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if the value is greater than 127.

According to this, as an alternative, I should have been able to use \x{00AB} and \x{00BB} in the replace list file. Yet in version 8.9.2 this doesn't work either.
Neither do \x{ab} and \x{bb}, so the help reference I've been using to solve issues with naive replacements is also "broken".
That was one of the first things that I tried before reverting to v8.8.2 to find out what on earth was happening.
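
To make the documented behaviour concrete, the \x{...} rule quoted above can be mimicked in Python by expanding each escape into the character with that code number. This is a hypothetical helper written only for illustration; it is neither TextPipe's nor the pattern library's implementation.

```python
import re

# Matches \x{...} only when the braces contain hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def expand_hex_escapes(s):
    """Replace each \\x{HHHH} escape with the character it names."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


expanded = expand_hex_escapes(r"\x{00AB}text\x{bb}")
unrecognised = expand_hex_escapes(r"\x{zz}")

print(expanded.encode("utf-8"))  # b'\xc2\xabtext\xc2\xbb'
print(unrecognised)              # \x{zz}
```

Both the four-digit and two-digit forms name the same characters, and an item with non-hexadecimal content between the braces is left unrecognised, as the reference describes.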

Best regards,
David


Re: Replace list filter inserts spurious character Â

Postby DataMystic Support » Mon Jul 25, 2011 10:54 pm

Ok! We have changed this back for 8.9.3.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

Re: Replace list filter inserts spurious character Â

Postby DFH » Wed Jul 27, 2011 1:08 am

When will 8.9.3 become available?

David

Re: Replace list filter inserts spurious character Â

Postby DataMystic Support » Wed Jul 27, 2011 1:11 am

In about one hour...
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

Re: Replace list filter inserts spurious character Â

Postby DFH » Wed Jul 27, 2011 4:26 am

8.9.3 just installed.

No more spurious character Â after running my example filter. Issue solved.

Thanks.

David

