Search/replace option Simultaneous search bug?

DFH
Posts: 658
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Postby DFH » Sun Mar 23, 2014 12:14 am

If the search type is Perl and the search pattern contains \x{hhh..} (the character with hex code hhh.., UTF-8 mode only),
then ticking the option Simultaneous search produces a popup dialog with this kind of error message:
"Error on line #: character value in \x{...} sequence is too large."
[Attachment: Screenshot 2014-03-22 14.07.42.png, popup screenshot]
I see no fundamental reason why this should be invalid, so I guess this must be a software bug.

Best regards,

David

DataMystic Support
Site Admin
Posts: 2164
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Search/replace option Simultaneous search bug?

Postby DataMystic Support » Mon Mar 24, 2014 12:21 pm

Hi David,

Here is what I found while searching through PCRE results:

Note that, despite its name, in UTF-8 mode \x{...} does not match UTF-8 byte sequences but rather "real" Unicode code points.

Code:

<?php
// The Cyrillic letter "ю" is code point U+044E.
// Its UTF-8 representation is the two bytes D1 8E.

$u = "Hello \xd1\x8e"; // this string is in UTF-8
echo $u, "<br>";
echo preg_replace('~\x{44e}~u', '*', $u); // with the u modifier, \x{44e} matches the code point
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments
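
The distinction drawn in the PHP snippet above, between a code point and its UTF-8 byte sequence, can also be shown with a small Python sketch. Python's `re` module is used here only as a stand-in; it is not PCRE and not what TextPipe runs, so this illustrates the concept rather than the engine's behaviour.

```python
import re

# U+044E is the Cyrillic letter "ю"; its UTF-8 encoding is the two bytes D1 8E.
text = "Hello \u044e"
utf8_bytes = text.encode("utf-8")

# The last two bytes are the UTF-8 sequence, which differs from the code point value 0x44E.
encoded_tail = utf8_bytes[-2:].hex()

# Matching on decoded text: the pattern names the code point,
# much as \x{44e} does in PCRE's UTF-8 mode.
by_codepoint = re.sub(r"\u044e", "*", text)

# Matching on raw bytes: the pattern has to spell out the encoded sequence instead.
by_bytes = re.sub(rb"\xd1\x8e", b"*", utf8_bytes)

print(encoded_tail)
print(by_codepoint)
print(by_bytes)
```

Both substitutions replace the same character; they differ only in whether the pattern addresses the code point or the encoded bytes.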

Postby DFH » Tue Mar 25, 2014 2:47 am

How does that answer my issue?

Postby DataMystic Support » Tue Mar 25, 2014 6:21 am

Are you specifying a code point, or a hex value?

What value are you specifying? It looks like a bug in the PCRE regex engine, but I can't confirm without this.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

Postby DFH » Fri Mar 28, 2014 1:56 am

Here's the sub-filter that fails when I tick Simultaneous search:

Code:

Perl pattern [\x{05BE}] with [#]
   [X] Match case
   [ ] Whole words only
   [ ] Case sensitive replace
   [ ] Prompt on replace
   [ ] Skip prompt if identical
   [ ] First only
   [ ] Extract matches
   Maximum text buffer size 4096
   [ ] Maximum match (greedy)
   [ ] Allow comments
   [ ] '.' matches newline
   [X] UTF-8 Support

 Further search/replace list phrases (CSV format):
 \x{05C0},#
 \x{05C3},#
 \x{05C6},#
 

Here's what it looks like when that option is not ticked:

Code:

Perl pattern [\x{05BE}] with [#]
   [X] Match case
   [ ] Whole words only
   [ ] Case sensitive replace
   [ ] Prompt on replace
   [ ] Skip prompt if identical
   [ ] First only
   [ ] Extract matches
   Maximum text buffer size 4096
   [ ] Maximum match (greedy)
   [ ] Allow comments
   [ ] '.' matches newline
   [X] UTF-8 Support

 Further search/replace list phrases (CSV format):
 \x{05C0},#
 \x{05C3},#
 \x{05C6},#
 

It would seem that the tick box option has no counterpart in the visual representation of the replace list filter.
This in itself is also a cause for concern.
[Attachment: Screenshot 2014-03-27 15.53.54.png, screenshot of my replace list filter]
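
For readers unfamiliar with the filter listing above, here is a minimal Python sketch of what replacing all four phrases in a single pass could look like. It assumes that "simultaneous" means one combined pattern applied once; that is an assumption about the concept, not a description of TextPipe's implementation, and the names `replacements` and `replace_all_at_once` are hypothetical.

```python
import re

# Search/replace pairs taken from the filter listing: each code point is replaced by "#".
replacements = {
    "\u05be": "#",
    "\u05c0": "#",
    "\u05c3": "#",
    "\u05c6": "#",
}

# Combine every search phrase into one alternation so the text is scanned once.
pattern = re.compile("|".join(re.escape(key) for key in replacements))


def replace_all_at_once(text: str) -> str:
    """Replace every listed phrase in a single pass over the text."""
    return pattern.sub(lambda match: replacements[match.group(0)], text)


sample = "a\u05beb\u05c0c\u05c3d\u05c6e"
result = replace_all_at_once(sample)
print(result)
```

Because Python strings are sequences of code points, no UTF-8 switch is needed here; in PCRE the equivalent combined pattern would have to be compiled in UTF-8 mode for \x{05BE} and the other values to be accepted.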

Postby DataMystic Support » Mon Jun 30, 2014 1:04 pm

Hi David,

For the next release, the Display text now also outputs the 'Simultaneous search' and 'Process longest strings first' options.

According to the PCRE spec:
\x{hhh..} - character with hex code hhh.. (non-JavaScript mode)

By default, after \x, from zero to two hexadecimal digits are read (letters can be in upper or lower case). Any number of hexadecimal digits may appear between \x{ and }, but the character code is constrained as follows:

8-bit non-UTF mode     less than 0x100
8-bit UTF-8 mode       less than 0x10ffff and a valid codepoint
16-bit non-UTF mode    less than 0x10000
16-bit UTF-16 mode     less than 0x10ffff and a valid codepoint
32-bit non-UTF mode    less than 0x80000000
32-bit UTF-32 mode     less than 0x10ffff and a valid codepoint

Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called "surrogate" codepoints), and 0xffef.

If characters other than hexadecimal digits appear between \x{ and }, or if there is no terminating }, this form of escape is not recognized. Instead, the initial \x will be interpreted as a basic hexadecimal escape, with no following digits, giving a character whose value is zero.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
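
One way to read the quoted limits against the values from this thread is sketched below in Python. It models only the two 8-bit rows of the table, and it makes two assumptions: 0x10ffff itself is treated as allowed, and only the surrogate range is rejected as invalid. The function name `hex_escape_allowed` is hypothetical. The sketch shows what the quoted rules imply for these values; it does not establish what TextPipe does internally when Simultaneous search is ticked.

```python
def hex_escape_allowed(value: int, utf8_mode: bool) -> bool:
    """Check a \\x{...} value against the 8-bit limits quoted from the PCRE text."""
    if not utf8_mode:
        # 8-bit non-UTF mode: the value must be less than 0x100.
        return value < 0x100
    if 0xD800 <= value <= 0xDFFF:
        # Surrogate code points are listed as invalid.
        return False
    # Assumption: 0x10ffff is the highest value accepted in UTF-8 mode.
    return value <= 0x10FFFF


# The four code points used in the replace list earlier in this thread.
thread_values = [0x05BE, 0x05C0, 0x05C3, 0x05C6]

with_utf8 = [hex_escape_allowed(v, utf8_mode=True) for v in thread_values]
without_utf8 = [hex_escape_allowed(v, utf8_mode=False) for v in thread_values]

print(with_utf8)
print(without_utf8)
```

Under these rules all four values are acceptable in UTF-8 mode and all four are too large in 8-bit non-UTF mode, so a "too large" error would be the expected result if a pattern containing them were ever compiled without UTF-8 mode.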

Postby DFH » Sun Jul 27, 2014 4:50 am

I will try again with the next release, though I suspect that the things you've fixed for filter display do not address the underlying issue.

David

