TextPipe 8.0 stops during a previously well-behaved filter

DFH
Posts: 658
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Postby DFH » Sun Dec 09, 2007 2:59 am

One of my filters, which I have used without a hitch since I bought TextPipe Standard 7.9.5 in August, stopped working after I upgraded to version 8.0. I installed the upgrade in accordance with the instructions.

The filter always stops after 98,304 bytes, and I have to click Cancel.

Since emailing Simon, I have narrowed the problem down to somewhere within the following lines:

|--Comment...
|  |  Convert the <div2> attribute values using a search & replace list
|  |
|  |--Restrict to attribute title="..."
|  |  |  [ ] Include quotes
|  |  |  [ ] Include text
|  |  |  [ ] Match case
|  |  | Max size: 65536
|  |  |
|  |  +--Perl pattern [(\w\w\w)] with [$1]
|  |     |  [ ] Match case
|  |     |  [ ] Whole words only
|  |     |  [ ] Case sensitive replace
|  |     |  [ ] Prompt on replace
|  |     |  [ ] Skip prompt if identical
|  |     |  [ ] First only
|  |     |  [ ] Extract matches
|  |     |  Maximum text buffer size 4096
|  |     |  [ ] Maximum match (greedy)
|  |     |  [ ] Allow comments
|  |     |  [X] '.' matches newline
|  |     |  [ ] UTF-8 Support
|  |     |
|  |     +--Replace list: C:\Program Files\TextPipe\My Filters\book_names.csv Replace
|  |           [X] Match case
|  |           [X] Whole words only
|  |           [ ] Case sensitive replace
|  |           [ ] Prompt on replace
|  |           [ ] Skip prompt if identical
|  |           [ ] First only
|  |           [ ] Extract matches
|  |         
|  +--Restrict to attribute n="..."
|     |  [ ] Include quotes
|     |  [ ] Include text
|     |  [ ] Match case
|     | Max size: 65536
|     |
|     +--Perl pattern [(\w\w\w)] with [$1]
|        |  [ ] Match case
|        |  [ ] Whole words only
|        |  [ ] Case sensitive replace
|        |  [ ] Prompt on replace
|        |  [ ] Skip prompt if identical
|        |  [ ] First only
|        |  [ ] Extract matches
|        |  Maximum text buffer size 4096
|        |  [ ] Maximum match (greedy)
|        |  [ ] Allow comments
|        |  [X] '.' matches newline
|        |  [ ] UTF-8 Support
|        |
|        +--Replace list: C:\Program Files\TextPipe\My Filters\book_numbers.csv Replace
|              [X] Match case
|              [X] Whole words only
|              [ ] Case sensitive replace
|              [ ] Prompt on replace
|              [ ] Skip prompt if identical
|              [ ] First only
|              [ ] Extract matches
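
Read as a pipeline, the listing above restricts processing to an attribute value, offers every three-character run in it to a subfilter, and lets a replace list rewrite that run. The sketch below is a rough Python analogue of the title branch only, not TextPipe's implementation. The table entries and the function name are hypothetical, since the contents of book_names.csv are not shown in this thread.

```python
import re

# Hypothetical stand-in for book_names.csv (real contents unknown).
BOOK_NAMES = {"Gen": "Genesis", "Exo": "Exodus"}

# Captures the opening title=", the attribute value, and the closing quote.
TITLE_VALUE = re.compile(r'(title=")([^"]*)(")')
THREE_WORD_CHARS = re.compile(r"\w\w\w")


def expand_titles(text, table=BOOK_NAMES):
    """Rewrite three-character runs inside title="..." using the table."""

    def fix_value(match):
        # Each three-character run is looked up exactly (case-sensitive),
        # roughly like a replace list with Match case and Whole words only.
        new_value = THREE_WORD_CHARS.sub(
            lambda run: table.get(run.group(0), run.group(0)),
            match.group(2),
        )
        return match.group(1) + new_value + match.group(3)

    return TITLE_VALUE.sub(fix_value, text)


print(expand_titles('<div2 title="Gen" n="1">'))
```

Note that nothing in this sketch checks which tag the attribute belongs to, so every title attribute in the input is scanned.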

Further evidence on the software bug in TextPipe 8.0

Postby DFH » Sun Dec 09, 2007 5:00 am

I have now isolated the problem to the search filter action "Send matching text to subfilter".
98,304 in decimal is 18000 in hexadecimal.
That this is a round number in hex strongly suggests that this is a software bug.
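
The conversion is easy to check. The sketch below also compares the figure with the two sizes that appear in the filter listing (Max size 65536 and text buffer size 4096). Whether either buffer is actually involved is an assumption; nothing in this thread establishes it.

```python
STOP_OFFSET = 98_304  # byte count at which the filter stops

print(hex(STOP_OFFSET))                 # hexadecimal form of the offset
print(STOP_OFFSET == 0x10000 + 0x8000)  # 65,536 + 32,768
print(divmod(STOP_OFFSET, 65_536))      # whole 64 KiB blocks, remainder
print(divmod(STOP_OFFSET, 4_096))       # whole 4 KiB blocks, remainder
```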

Changing both the external replace list files to imported replace lists made no difference to the bug, nor did changing the search pattern to an equivalent expression.

DataMystic Support
Site Admin
Posts: 2164
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Tue Dec 11, 2007 2:26 pm

Hi David - we'll be getting back to you on this shortly.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Further information

Postby DFH » Wed Dec 12, 2007 5:15 am

The bug is not unique to my own computer. The same thing happened to a contact of mine in Sweden who was using my filter.

Any progress to report on solving this?

Postby DFH » Thu Dec 13, 2007 10:55 pm

Simon,

Any progress towards solving this? Do you need any further detailed information from me?

Kind regards,
David

Postby DataMystic Support » Fri Dec 14, 2007 5:38 am

Stay tuned, David.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

Email received and sent

Postby DFH » Sat Dec 15, 2007 9:59 pm

Simon,

I found your email dated the 11th only last night. It had been diverted (by the ISP spam filter) to the bulk folder.

I replied with attachments today.

David

Further clues about the replace list bug

Postby DFH » Sun Dec 16, 2007 1:13 am

If you make a simple filter in which a comment has a replace list as a subfilter, and then export the filter to the clipboard, only the first row of the replace list is indented as part of the subfilter. The remaining rows are NOT indented; they appear at the same level as the comment.

|--Comment...
|  |  Miscellaneous punctuation corrections
|  |
|  +--Replace [..] with [.]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [!.] with [!]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [.,] with [,]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [,.] with [.]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches

This could be the real bug. In this example the parent is a comment, so the missing indentation has no effect. However, if the parent were a restrict filter, the effect would be very significant. The same bug might apply just as well to a replace list loaded from an external file as to an internal replace list. This observation matches the symptoms.

The above example has been like this for a long time

Postby DFH » Mon Dec 17, 2007 6:48 pm

I just tried something similar with the earlier version of TextPipe that is installed on my computer at work (version 7.1.7, which is a few years old).
The same issue affects a replace list that is a subfilter of, for example, a restrict filter.

|   
|--Comment...
|   
|--Restrict lines:Line 8 .. line 18
|  |
|  +--Perl pattern [A] with [Z]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
Perl pattern [B] with [Y]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
Perl pattern [C] with [X]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
|     
Every row in the replace list table should be indented below the restrict filter, rather than as shown above.

Either the filter itself is significantly in error, or the export to clipboard feature is.
If it turns out to be merely the latter, then please raise it as a separate issue.

Postby DataMystic Support » Wed Dec 19, 2007 10:33 pm

Hi David,

The key problem seems to be a horrendously inefficient search filter, which then has 86 search attempts made against it.

We can make this far more efficient by sending only characters that we know will match, and by enabling Match case to prevent false positives.

[1234ABCDEGHIJLMNOPRSTWZ][abCdehiJKmoprSTuxz][1abcdeghiJklmnoprstuvz]


This also turns out to be very slow.

We'll look into it some more in the morning.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
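
For readers wondering what the bracketed pattern in the post above does: each [...] is a character class, so the pattern matches three consecutive characters, each limited to a fixed set, whereas \w\w\w accepts any three word characters. The small Python sketch below shows the difference. Python's re module stands in for TextPipe's Perl pattern engine here, and the sample strings are made up for illustration.

```python
import re

# The original pattern from the filter: any three word characters.
BROAD = re.compile(r"\w\w\w")

# The pattern from the post above: one restricted class per position.
NARROW = re.compile(
    r"[1234ABCDEGHIJLMNOPRSTWZ][abCdehiJKmoprSTuxz][1abcdeghiJklmnoprstuvz]"
)


def accepted(pattern, token):
    """True if the whole token matches the pattern (case-sensitive)."""
    return pattern.fullmatch(token) is not None


for token in ("Gen", "Exo", "the"):
    print(token, accepted(BROAD, token), accepted(NARROW, token))

# Without a whole-word restriction, matches also occur inside longer words.
print(BROAD.findall("Genesis"), NARROW.findall("Genesis"))
```

The narrower pattern rejects ordinary lowercase words such as "the" before they ever reach the replace list, which is the saving the post describes.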

If so, then why did it work so smoothly in v7.9.5?

Postby DFH » Thu Dec 20, 2007 1:40 am

Dear Simon,

Your response does not explain why my filter worked so smoothly with TextPipe Standard v7.9.5 and just stops in version 8.0.

Processing a whole [Bible] VPL text file took less than 5 seconds before. If my filter were really that inefficient, why was it so fast?

Or have I misunderstood something you wrote?

NB. I did not understand the above code snippet at all. What was it intended to do?

Best regards,
David Haslam

Reverted to use version 7.9.5

Postby DFH » Fri Dec 21, 2007 4:11 am

Dear Simon,

While DataMystic are still working towards a solution, I have re-installed version 7.9.5 in place of version 8.0 - so that I can continue using the filter as before.

Best regards,
David H.

Any progress to report on solving this?

Postby DFH » Fri Jan 04, 2008 11:15 pm

Hi Simon,

Happy New Year!

Has there been any further progress towards solving the bug?

Best regards,
David Haslam

Postby DataMystic Support » Tue Jan 08, 2008 7:57 pm

It should be done this week.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

Postby DataMystic Support » Wed Jan 09, 2008 9:31 am

Status: This bug was found to be due to the inefficient use of filters. In particular, the title attribute of every tag (and not just the div2 tag) was being checked for \w\w\w (i.e. 3 word characters) without the whole word option, which led to a lot of backtracking, and each match was in turn searched against a replace list of 30 items.

A div2 tag restriction around the whole lot returned performance to previous levels.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
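
To illustrate the status note above, the Python sketch below counts how many three-character candidates reach the replace list with and without an outer div2 restriction. It is an illustration only: the sample markup and function names are hypothetical, and nothing here reflects how TextPipe is implemented internally.

```python
import re

# Hypothetical sample markup; the real input file is not shown in the thread.
SAMPLE = (
    '<div1 title="Old Testament">'
    '<div2 title="Gen" n="1"><p title="Creation">...</p></div2>'
    '<div2 title="Exo" n="2"><p title="Deliverance">...</p></div2>'
    "</div1>"
)

TITLE_ATTR = re.compile(r'title="([^"]*)"')
THREE_WORD_CHARS = re.compile(r"\w\w\w")
DIV2_TAG = re.compile(r"<div2\b[^>]*>")


def candidates_all_tags(text):
    """Three-character runs from the title attribute of every tag."""
    found = []
    for value in TITLE_ATTR.findall(text):
        found.extend(THREE_WORD_CHARS.findall(value))
    return found


def candidates_div2_only(text):
    """Same search, but only inside the opening tag of div2 elements."""
    found = []
    for tag in DIV2_TAG.findall(text):
        for value in TITLE_ATTR.findall(tag):
            found.extend(THREE_WORD_CHARS.findall(value))
    return found


print(len(candidates_all_tags(SAMPLE)), candidates_all_tags(SAMPLE))
print(len(candidates_div2_only(SAMPLE)), candidates_div2_only(SAMPLE))
```

Every candidate is then tested against each entry in the replace list, so cutting the candidates down to the div2 titles reduces the work proportionally.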