Some external replace list files no longer work when the search type is Perl

Posted: Wed Sep 12, 2018 3:50 am
by DFH
One of my very old filters no longer works properly.

It calls an external replace list (a tab-delimited file) in which most of the PCRE search terms are wrapped in \Q...\E

Has something changed recently in how PCRE handles \Q...\E?
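
For readers unfamiliar with the construct: in PCRE, everything between \Q and \E is treated as literal text, so characters such as \ and + lose their special meaning. Python's re module has no \Q...\E support, so the sketch below emulates it with re.escape. The helper name expand_quoted is hypothetical, it ignores edge cases such as an escaped backslash directly before Q, and it only illustrates the matching behaviour I expect, not how TextPipe implements it.

```python
import re

# Hypothetical helper: rewrite PCRE-style \Q...\E spans into explicitly
# escaped text, because Python's `re` does not understand \Q...\E itself.
_QUOTED = re.compile(r"\\Q(.*?)(?:\\E|$)", re.DOTALL)

def expand_quoted(pattern: str) -> str:
    """Return `pattern` with each \\Q...\\E span replaced by its escaped form."""
    return _QUOTED.sub(lambda m: re.escape(m.group(1)), pattern)

# A search term wrapped so that '\' and '+' are taken literally.
pcre_term = r"\Q\x+add\E"
regex = re.compile(expand_quoted(pcre_term))

# The quoted term matches the text exactly as written ...
print(bool(regex.fullmatch(r"\x+add")))   # True
# ... and does not match a lookalike that lacks the literal '+'.
print(bool(regex.fullmatch(r"\xxadd")))   # False
```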

I currently have TextPipe Standard 10.7.2 installed.
IIRC, it was working OK earlier this year when I still had version 10.6.2 installed.

Best regards,

David

Re: Has something changed for \Q...\E in PCRE ?

Posted: Thu Sep 13, 2018 4:28 am
by DFH
Hi Simon,

Something much worse must have changed since my longstanding filter was working OK.

Such external replace list files no longer work when the search type is Perl.
They only work when the search type is Exact.

NB. This bug seems to occur only when the replace list filter is a subfilter of a restrict filter.

Because of the bug, I had to radically alter my .tab file to get the filter working again.
e.g. Instead of

\Q\x+add\E
I had to use

\x5C+add
in the search field.
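
This kind of manual edit can be sketched as a small script. It is only an illustration under stated assumptions: that the replace list is a tab-delimited file with the search text in the first column, that the whole search term is wrapped in \Q...\E, and that the Exact search type reads \x5C as a literal backslash, as the workaround suggests. The function names and the sample entry \Q\+add\E (for the nested tag \+add) are hypothetical.

```python
def to_exact_term(search: str) -> str:
    r"""Convert a \Q...\E-wrapped search term to an Exact-style term.

    Assumption: Exact mode accepts \x5C as a backslash (inferred from
    the workaround in this post, not from TextPipe documentation).
    """
    if search.startswith("\\Q") and search.endswith("\\E"):
        literal = search[2:-2]                 # strip the \Q and \E wrappers
        return literal.replace("\\", "\\x5C")  # spell each backslash as \x5C
    return search                              # leave unwrapped terms alone

def convert_line(line: str) -> str:
    """Rewrite only the search column of one tab-delimited line."""
    search, sep, rest = line.partition("\t")
    return to_exact_term(search) + sep + rest

# Hypothetical replace-list entry: search term, tab, description.
print(convert_line("\\Q\\+add\\E\tsome description"))
```

Terms that are not fully wrapped are passed through unchanged, so any entry that relies on real regex syntax would still need to be reviewed by hand.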

NB. This is a very serious bug that could affect many existing filters, potentially for all users.

Best regards,

David

Re: External replace list files no longer work when the search type is Perl

Posted: Fri Sep 14, 2018 12:32 am
by DFH
Here's the original filter that used to work OK:

TextPipe Single User Edition 10.7
Purchased by: David Haslam, David Haslam

Filter Title: C:\Users\David\Documents\DataMystic\TextPipe\!David\USFM tag statistics.fll

Filter List
-----------
Filter options
|  [X] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|  [ ] Log comment filters
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     [ ] Process inside compressed files
|     Confirm each binary file
|       Sample size 100 characters
|   
|--Comment...
|  |  USFM tag statistics
|  |  
|  |  Outputs a 'count duplicate lines' list for the SFM tags.
|  |  
|  |  Start and close tags in a tag pair are listed separately.
|  |  
|  |   Excludes + and similar parameters used in footnotes & cross-references 
|  |   Excludes optional line break // and ~
|  |   Excludes delimiters | within \fig ...\fig*
|  |  
|  |  Filter improved to catch all the tags
|  |  Filter enhanced to add tag descriptions
|  |  Filter improved to include nested tags with \+
|  |
|  |--Comment...
|  |     Revision history:
|  |     
|  |     2017-09-15 Restrict to field 3 now sub-filters each field individually, etc.
|  |   
|  |--Comment...
|  |  |  Workaround for unseparated tag groups
|  |  |  
|  |  |   e.g. \global\cnum=14\relax
|  |  |        \null\null\eject
|  |  |
|  |  +--Perl pattern [(\w|\d)(\\)(\w)] with [$1 $2$$3]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [ ] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support

|  |        [ ] Process longest strings first
|  |        [ ] Simultaneous search
|  |        [ ] Log summary only

|  |      Further search/replace list phrases (CSV format):
|  |      >, >
|  |      
|  |--Comment...
|  |  |  Extract tags
|  |  |  
|  |  |   NB. Does it now catch \cnum=## ?
|  |  |
|  |  +--Perl pattern [\\(\w+|\+\w+)(\s|\*|=\d+)] with [\\$1$$2\r\n]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [X] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support
|  |      
|  |--Comment...
|  |  |  Remove blanks
|  |  |
|  |  |--Remove blanks from End of Line
|  |  |   
|  |  +--Remove blank lines
|  |      
|  |--Comment...
|  |  |  Counted list of tags
|  |  |
|  |  |--Count duplicate lines
|  |  |     [ ] Ignore case
|  |  |     Start column 1
|  |  |     Length 15
|  |  |     [X] Include One
|  |  |     format: %5.5d\t%s
|  |  |   
|  |  +--Add file header [Count\tSFM tag]
|  |      
|  |--Comment...
|  |  |  Add tag descriptions +
|  |  |  
|  |  |     From USFM Reference version 2.35 & later
|  |  |  
|  |  |   + The replacement table is incomplete
|  |  |     Tags that can have numbers may need further entries
|  |  |     Excludes: Peripherals & Study Bible Content
|  |  |     
|  |  |     Of these possibilities to name tag pairs
|  |  |       begin ... end     (story  metaphor) 
|  |  |       open  ... close   (gate   metaphor)
|  |  |       start ... finish  (race   metaphor)
|  |  |       start ... stop    (action metaphor)
|  |  |     I have chosen the first convention.
|  |  |     The USFM Reference is not 100% consistent.
|  |  |
|  |  |--Copy fields:2 copy to 3
|  |  |     Delimiter type: 1
|  |  |     Custom delimiter: 
|  |  |     Text qualifier : 2
|  |  |     Custom qualifier: 
|  |  |     [ ] Has Header
|  |  |   
|  |  +--Restrict fields:3
|  |     |  [X] Process fields individually
|  |     |    [X] Exclude delimiter
|  |     |      [ ] Exclude quotes (if present)
|  |     |  Delimiter type: 1
|  |     |  Custom delimiter: 
|  |     |  Text qualifier : 2
|  |     |  Custom qualifier: 
|  |     |  [ ] Has Header
|  |     |
|  |     +--Replace list: C:\Users\David\TextPipe Filters\USFM tag descriptions.tab Perl pattern
|  |           [X] Match case
|  |           [X] Whole words only
|  |           [ ] Case sensitive replace
|  |           [ ] Prompt on replace
|  |           [ ] Skip prompt if identical
|  |           [ ] First only
|  |           [ ] Extract matches
|  |               Maximum text buffer size 4096
|  |           [ ] Maximum match (greedy)
|  |           [ ] Allow comments
|  |           [ ] '.' matches newline
|  |           [X] UTF-8 Support

|  |           [X] Process longest strings first
|  |           [ ] Simultaneous search
|  |           [ ] Log summary only
|  |         
|  +--Comment...
|     |  Invalid tag descriptions
|     |  
|     |   - descriptions ending with an asterix are invalid
|     |     because the asterisk should have been replaced
|     |
|     +--Restrict fields:3
|        |  [X] Process fields individually
|        |    [X] Exclude delimiter
|        |      [ ] Exclude quotes (if present)
|        |  Delimiter type: 1
|        |  Custom delimiter: 
|        |  Text qualifier : 0
|        |  Custom qualifier: "
|        |  [ ] Has Header
|        |
|        +--Perl pattern [(.+)\*] with [### SYNTAX ERROR ###]
|              [X] Match case
|              [ ] Whole words only
|              [ ] Case sensitive replace
|              [ ] Prompt on replace
|              [ ] Skip prompt if identical
|              [ ] First only
|              [ ] Extract matches
|                  Maximum text buffer size 4096
|              [X] Maximum match (greedy)
|              [ ] Allow comments
|              [ ] '.' matches newline
|              [X] UTF-8 Support

|              [ ] Process longest strings first
|              [ ] Simultaneous search
|              [ ] Log summary only
|            
+--Output to file(s)
      [ ] Only update date on changed files
      [ ] Append mode
      [X] Change extension to: @inputExtension.tags.count.usfm
      [X] Open output file
      Only output modified files
      [ ] Remove empty output files    

Files List
----------
X:\merged.usfm

and here is my updated filter:

TextPipe Single User Edition 10.7
Purchased by: David Haslam, David Haslam

Filter Title: C:\Users\David\Documents\DataMystic\TextPipe\!David\USFM tag statistics.fll

Filter List
-----------
Filter options
|  [X] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|  [ ] Log comment filters
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     [ ] Process inside compressed files
|     Confirm each binary file
|       Sample size 100 characters
|   
|--Comment...
|  |  USFM tag statistics
|  |  
|  |  Outputs a 'count duplicate lines' list for the SFM tags.
|  |  
|  |  Start and close tags in a tag pair are listed separately.
|  |  
|  |   Excludes + and similar parameters used in footnotes & cross-references 
|  |   Excludes optional line break // and ~
|  |   Excludes delimiters | within \fig ...\fig*
|  |  
|  |  Filter improved to catch all the tags
|  |  Filter enhanced to add tag descriptions
|  |  Filter improved to include nested tags with \+
|  |
|  |--Comment...
|  |     Revision history:
|  |     
|  |     2017-09-15 Restrict to field 3 now sub-filters each field individually, etc.
|  |     2018-09-12 Changed search type to Exact for Add tag descriptions.
|  |                Now uses an edited copy of the replace list tab file.
|  |                Change required on account of a bug in TextPipe 10.7.2
|  |   
|  |--Comment...
|  |  |  Workaround for unseparated tag groups
|  |  |  
|  |  |   e.g. \global\cnum=14\relax
|  |  |        \null\null\eject
|  |  |
|  |  +--Perl pattern [(\w|\d)(\\)(\w)] with [$1 $2$$3]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [ ] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support

|  |        [ ] Process longest strings first
|  |        [ ] Simultaneous search
|  |        [ ] Log summary only

|  |      Further search/replace list phrases (CSV format):
|  |      >, >
|  |      
|  |--Comment...
|  |  |  Extract tags
|  |  |  
|  |  |   NB. Does it now catch \cnum=## ?
|  |  |
|  |  +--Perl pattern [\\(\w+|\+\w+)(\s|\*|=\d+)] with [\\$1$$2\r\n]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [X] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support
|  |      
|  |--Comment...
|  |  |  Remove blanks
|  |  |
|  |  |--Remove blanks from End of Line
|  |  |   
|  |  +--Remove blank lines
|  |      
|  |--Comment...
|  |  |  Counted list of tags
|  |  |
|  |  |--Count duplicate lines
|  |  |     [ ] Ignore case
|  |  |     Start column 1
|  |  |     Length 15
|  |  |     [X] Include One
|  |  |     format: %5.5d\t%s
|  |  |   
|  |  +--Add file header [Count\tSFM tag]
|  |      
|  |--Comment...
|  |  |  Add tag descriptions +
|  |  |  
|  |  |     From USFM Reference version 2.35 & later
|  |  |  
|  |  |   + The replacement table is incomplete
|  |  |     Tags that can have numbers may need further entries
|  |  |     Excludes: Peripherals & Study Bible Content
|  |  |     
|  |  |     Of these possibilities to name tag pairs
|  |  |       begin ... end     (story  metaphor) 
|  |  |       open  ... close   (gate   metaphor)
|  |  |       start ... finish  (race   metaphor)
|  |  |       start ... stop    (action metaphor)
|  |  |     I have chosen the first convention.
|  |  |     The USFM Reference is not 100% consistent.
|  |  |
|  |  |--Copy fields:2 copy to 3
|  |  |     Delimiter type: 1
|  |  |     Custom delimiter: 
|  |  |     Text qualifier : 2
|  |  |     Custom qualifier: 
|  |  |     [ ] Has Header
|  |  |   
|  |  +--Restrict fields:3
|  |     |  [X] Process fields individually
|  |     |    [X] Exclude delimiter
|  |     |      [ ] Exclude quotes (if present)
|  |     |  Delimiter type: 1
|  |     |  Custom delimiter: 
|  |     |  Text qualifier : 0
|  |     |  Custom qualifier: 
|  |     |  [ ] Has Header
|  |     |
|  |     +--Replace list: C:\Users\David\TextPipe Filters\USFM tag descriptions!.tab Replace
|  |           [X] Match case
|  |           [X] Whole words only
|  |           [ ] Case sensitive replace
|  |           [ ] Prompt on replace
|  |           [ ] Skip prompt if identical
|  |           [ ] First only
|  |           [ ] Extract matches

|  |           [X] Process longest strings first
|  |           [ ] Simultaneous search
|  |           [ ] Log summary only
|  |         
|  +--Comment...
|     |  Invalid tag descriptions
|     |  
|     |   - descriptions ending with an asterix are invalid
|     |     because the asterisk should have been replaced
|     |
|     +--Restrict fields:3
|        |  [X] Process fields individually
|        |    [X] Exclude delimiter
|        |      [ ] Exclude quotes (if present)
|        |  Delimiter type: 1
|        |  Custom delimiter: 
|        |  Text qualifier : 0
|        |  Custom qualifier: "
|        |  [ ] Has Header
|        |
|        +--Perl pattern [(.+)\*] with [### SYNTAX ERROR ###]
|              [X] Match case
|              [ ] Whole words only
|              [ ] Case sensitive replace
|              [ ] Prompt on replace
|              [ ] Skip prompt if identical
|              [ ] First only
|              [ ] Extract matches
|                  Maximum text buffer size 4096
|              [X] Maximum match (greedy)
|              [ ] Allow comments
|              [ ] '.' matches newline
|              [X] UTF-8 Support

|              [ ] Process longest strings first
|              [ ] Simultaneous search
|              [ ] Log summary only
|            
+--Output to file(s)
      [ ] Only update date on changed files
      [ ] Append mode
      [X] Change extension to: @inputExtension.tags.count.usfm
      [X] Open output file
      Only output modified files
      [ ] Remove empty output files    

Files List
----------
X:\merged.usfm
In both cases, the filter that stopped working and had to be changed is the one under the comment "Add tag descriptions +".
The updated filter uses an edited copy of the replace list (.tab file).

Re: Some external replace list files no longer work when the search type is Perl

Posted: Mon May 06, 2019 1:41 pm
by DataMystic Support
I suspect that the issue here is that the external files are now parsed for macros.

Given that the file is loaded from disk, there is an argument that it should not be parsed at all, since any macro expansion could be done by another process beforehand.

Re: Some external replace list files no longer work when the search type is Perl

Posted: Tue May 07, 2019 6:13 pm
by DFH
Please elaborate on "parsed for macros"!

What was this introduced for?

David

Re: Some external replace list files no longer work when the search type is Perl

Posted: Tue May 07, 2019 10:31 pm
by DataMystic Support
Macros such as @inputfilename@, and environment variables such as %current_user% can be parsed.

I can't tell precisely when this was added without digging into the weeds, but could this be an issue?
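
One way to check whether macro parsing could plausibly affect a given replace list is to scan it for text shaped like the two examples above. The sketch below is not part of TextPipe; it assumes macros look like @name@ (or @name without the trailing @) and environment variables like %name%. The exact syntax the parser recognises is not stated in this thread, so the pattern is a guess.

```python
import re

# Guessed token shapes, based only on the examples quoted in this thread.
MACRO_LIKE = re.compile(r"@\w+@?|%\w+%")

def find_macro_like(lines):
    """Yield (line_number, token) for each macro-looking token found."""
    for number, line in enumerate(lines, start=1):
        for match in MACRO_LIKE.finditer(line):
            yield number, match.group(0)

# Hypothetical replace-list lines: search term, tab, replacement.
sample = [
    "\\Q\\x+add\\E\tsome description",
    "@inputfilename@\tfile name macro",
    "%current_user%\tenvironment variable",
]
for number, token in find_macro_like(sample):
    print(number, token)
```

A replace list that produces no hits would suggest looking elsewhere for the cause.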

Re: Some external replace list files no longer work when the search type is Perl

Posted: Tue May 21, 2019 4:05 am
by DFH
It may well be an issue.

Having had to change one replace list file, I may well encounter others that require a similar tweak.

Regards,

David

Re: Some external replace list files no longer work when the search type is Perl

Posted: Mon Mar 02, 2020 9:54 pm
by DFH
Has this issue been investigated in more detail yet?

It was rather alarming when I initially reported it.

David

Re: Some external replace list files no longer work when the search type is Perl

Posted: Fri Mar 06, 2020 6:36 am
by DataMystic Support
Not as yet, but I have transferred it to the backlog for investigation.

Re: Some external replace list files no longer work when the search type is Perl

Posted: Sun Apr 26, 2020 12:35 am
by DFH
Hi Simon,

This has now become a very critical issue!

A very complex filter that was working perfectly in March 2017 no longer works properly!

Debugging the filter to apply a workaround for every external replace list is going to be a very tedious task.
I had hoped just to spend a small amount of time simply catching up with this earlier text development project before moving on to the next improvement that I had in mind.
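
To narrow down which external replace lists might need attention, a rough first pass is to list the .tab files that contain \Q at all. This is a minimal sketch, assuming the replace lists use the .tab extension and are readable as UTF-8; the function name is hypothetical, and a hit only means the file is worth reviewing, not that it is broken.

```python
from pathlib import Path

def lists_needing_review(folder):
    """Return the .tab files under `folder` whose text contains \\Q."""
    hits = []
    for path in sorted(Path(folder).rglob("*.tab")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        if "\\Q" in text:
            hits.append(path)
    return hits
```

Calling lists_needing_review on the folder that holds the replace lists returns the candidate files in sorted order.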

Please give priority to fixing this. It shouldn't have been left in the backlog so long.

David