Some external replace list files no longer work when the search type is Perl

DFH
Posts: 805
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Some external replace list files no longer work when the search type is Perl

Post by DFH » Wed Sep 12, 2018 3:50 am

One of my very old filters no longer works properly.

It calls an external replace list (a tab-delimited file) in which most of the PCRE search items are wrapped in \Q...\E.
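
For readers unfamiliar with this kind of replace list, here is a minimal Python sketch of the idea. Python's `re` module has no `\Q...\E`, so the hypothetical helper `expand_quotes` rewrites those spans with `re.escape` first. The function names and the two sample rows (including their description text) are made up for illustration; this is not how TextPipe itself is implemented.

```python
import re


def expand_quotes(pattern):
    """Rewrite PCRE-style \\Q...\\E spans as escaped literals for Python's re.

    Simplification: a "\\Q" that follows an escaped backslash is not
    treated specially here.
    """
    out, pos = [], 0
    while True:
        start = pattern.find("\\Q", pos)
        if start == -1:
            out.append(pattern[pos:])
            return "".join(out)
        out.append(pattern[pos:start])
        end = pattern.find("\\E", start + 2)
        if end == -1:
            # In PCRE an unterminated \Q quotes to the end of the pattern.
            out.append(re.escape(pattern[start + 2:]))
            return "".join(out)
        out.append(re.escape(pattern[start + 2:end]))
        pos = end + 2


def load_replace_list(text):
    """Parse tab-delimited 'search<TAB>replacement' rows into compiled rules."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        search, _, replacement = line.partition("\t")
        rules.append((re.compile(expand_quotes(search)), replacement))
    return rules


def apply_rules(rules, field):
    """Apply each rule in order; replacements are inserted literally."""
    for regex, replacement in rules:
        field = regex.sub(lambda _m, r=replacement: r, field)
    return field


# Hypothetical sample rows, longest search text first.
SAMPLE_TAB = "\n".join([
    "\\Q\\add*\\E\tend of addition (sample text)",
    "\\Q\\add\\E\tstart of addition (sample text)",
])

if __name__ == "__main__":
    rules = load_replace_list(SAMPLE_TAB)
    print(apply_rules(rules, "\\add*"))
    print(apply_rules(rules, "\\add"))
```

The point of the quoting is that characters such as `\`, `+` and `*` in the tag names are matched literally rather than being read as regex syntax.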

Has something changed recently in how PCRE handles \Q...\E?

I currently have TextPipe Standard 10.7.2 installed.
IIRC, it was working OK earlier this year when I still had version 10.6.2 installed.

Best regards,

David
Last edited by DFH on Fri Sep 14, 2018 1:35 am, edited 3 times in total.

Re: Has something changed for \Q...\E in PCRE ?

Post by DFH » Thu Sep 13, 2018 4:28 am

Hi Simon,

Something more serious must have changed since my long-standing filter last worked.

Such external replace list files no longer work when the search type is Perl.
They only work when the search type is Exact.

NB. This bug seems to occur only when the replace list filter is a subfilter of a restrict filter.

Because of the bug, I had to radically alter my .tab file to get the filter working again.
For example, instead of

Code:

\Q\x+add\E
I had to use

Code:

\x5C+add
in the search field.
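
As a side note, the two search strings above mean different things when both are read as regular expressions. The Python sketch below shows this, with `re.escape` standing in for `\Q...\E` (Python's `re` does not support that syntax, so this is an approximation of PCRE behaviour, not TextPipe's own code). The helper name `matches` is hypothetical.

```python
import re

# Stand-in for \Q\x+add\E: every character is taken literally.
quoted = re.compile(re.escape(r"\x+add"))

# \x5C is a hex escape for a backslash, so as a regex this reads
# "one or more backslashes, then 'add'".
hex_form = re.compile(r"\x5C+add")


def matches(regex, text):
    """True if the whole of text matches regex."""
    return regex.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in (r"\x+add", r"\add", r"\\add", r"\+add"):
        print(repr(sample), matches(quoted, sample), matches(hex_form, sample))
```

If the Exact search type decodes `\x5C` as a backslash and treats `+` as an ordinary character (an assumption; the thread does not say), then `\x5C+add` would stand for the literal text `\+add`, which is what a nested USFM tag looks like.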

NB. This is a very serious bug that could affect many existing filters, potentially for all users.

Best regards,

David
Last edited by DFH on Fri Sep 14, 2018 1:36 am, edited 1 time in total.

Re: External replace list files no longer work when the search type is Perl

Post by DFH » Fri Sep 14, 2018 12:32 am

Here's the original filter that used to work OK:

Code:

TextPipe Single User Edition 10.7
Purchased by: David Haslam, David Haslam

Filter Title: C:\Users\David\Documents\DataMystic\TextPipe\!David\USFM tag statistics.fll

Filter List
-----------
Filter options
|  [X] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|  [ ] Log comment filters
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     [ ] Process inside compressed files
|     Confirm each binary file
|       Sample size 100 characters
|   
|--Comment...
|  |  USFM tag statistics
|  |  
|  |  Outputs a 'count duplicate lines' list for the SFM tags.
|  |  
|  |  Start and close tags in a tag pair are listed separately.
|  |  
|  |   Excludes + and similar parameters used in footnotes & cross-references 
|  |   Excludes optional line break // and ~
|  |   Excludes delimiters | within \fig ...\fig*
|  |  
|  |  Filter improved to catch all the tags
|  |  Filter enhanced to add tag descriptions
|  |  Filter improved to include nested tags with \+
|  |
|  |--Comment...
|  |     Revision history:
|  |     
|  |     2017-09-15 Restrict to field 3 now sub-filters each field individually, etc.
|  |   
|  |--Comment...
|  |  |  Workaround for unseparated tag groups
|  |  |  
|  |  |   e.g. \global\cnum=14\relax
|  |  |        \null\null\eject
|  |  |
|  |  +--Perl pattern [(\w|\d)(\\)(\w)] with [$1 $2$$3]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [ ] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support

|  |        [ ] Process longest strings first
|  |        [ ] Simultaneous search
|  |        [ ] Log summary only

|  |      Further search/replace list phrases (CSV format):
|  |      >, >
|  |      
|  |--Comment...
|  |  |  Extract tags
|  |  |  
|  |  |   NB. Does it now catch \cnum=## ?
|  |  |
|  |  +--Perl pattern [\\(\w+|\+\w+)(\s|\*|=\d+)] with [\\$1$$2\r\n]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [X] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support
|  |      
|  |--Comment...
|  |  |  Remove blanks
|  |  |
|  |  |--Remove blanks from End of Line
|  |  |   
|  |  +--Remove blank lines
|  |      
|  |--Comment...
|  |  |  Counted list of tags
|  |  |
|  |  |--Count duplicate lines
|  |  |     [ ] Ignore case
|  |  |     Start column 1
|  |  |     Length 15
|  |  |     [X] Include One
|  |  |     format: %5.5d\t%s
|  |  |   
|  |  +--Add file header [Count\tSFM tag]
|  |      
|  |--Comment...
|  |  |  Add tag descriptions +
|  |  |  
|  |  |     From USFM Reference version 2.35 & later
|  |  |  
|  |  |   + The replacement table is incomplete
|  |  |     Tags that can have numbers may need further entries
|  |  |     Excludes: Peripherals & Study Bible Content
|  |  |     
|  |  |     Of these possibilities to name tag pairs
|  |  |       begin ... end     (story  metaphor) 
|  |  |       open  ... close   (gate   metaphor)
|  |  |       start ... finish  (race   metaphor)
|  |  |       start ... stop    (action metaphor)
|  |  |     I have chosen the first convention.
|  |  |     The USFM Reference is not 100% consistent.
|  |  |
|  |  |--Copy fields:2 copy to 3
|  |  |     Delimiter type: 1
|  |  |     Custom delimiter: 
|  |  |     Text qualifier : 2
|  |  |     Custom qualifier: 
|  |  |     [ ] Has Header
|  |  |   
|  |  +--Restrict fields:3
|  |     |  [X] Process fields individually
|  |     |    [X] Exclude delimiter
|  |     |      [ ] Exclude quotes (if present)
|  |     |  Delimiter type: 1
|  |     |  Custom delimiter: 
|  |     |  Text qualifier : 2
|  |     |  Custom qualifier: 
|  |     |  [ ] Has Header
|  |     |
|  |     +--Replace list: C:\Users\David\TextPipe Filters\USFM tag descriptions.tab Perl pattern
|  |           [X] Match case
|  |           [X] Whole words only
|  |           [ ] Case sensitive replace
|  |           [ ] Prompt on replace
|  |           [ ] Skip prompt if identical
|  |           [ ] First only
|  |           [ ] Extract matches
|  |               Maximum text buffer size 4096
|  |           [ ] Maximum match (greedy)
|  |           [ ] Allow comments
|  |           [ ] '.' matches newline
|  |           [X] UTF-8 Support

|  |           [X] Process longest strings first
|  |           [ ] Simultaneous search
|  |           [ ] Log summary only
|  |         
|  +--Comment...
|     |  Invalid tag descriptions
|     |  
|     |   - descriptions ending with an asterisk are invalid
|     |     because the asterisk should have been replaced
|     |
|     +--Restrict fields:3
|        |  [X] Process fields individually
|        |    [X] Exclude delimiter
|        |      [ ] Exclude quotes (if present)
|        |  Delimiter type: 1
|        |  Custom delimiter: 
|        |  Text qualifier : 0
|        |  Custom qualifier: "
|        |  [ ] Has Header
|        |
|        +--Perl pattern [(.+)\*] with [### SYNTAX ERROR ###]
|              [X] Match case
|              [ ] Whole words only
|              [ ] Case sensitive replace
|              [ ] Prompt on replace
|              [ ] Skip prompt if identical
|              [ ] First only
|              [ ] Extract matches
|                  Maximum text buffer size 4096
|              [X] Maximum match (greedy)
|              [ ] Allow comments
|              [ ] '.' matches newline
|              [X] UTF-8 Support

|              [ ] Process longest strings first
|              [ ] Simultaneous search
|              [ ] Log summary only
|            
+--Output to file(s)
      [ ] Only update date on changed files
      [ ] Append mode
      [X] Change extension to: @inputExtension.tags.count.usfm
      [X] Open output file
      Only output modified files
      [ ] Remove empty output files    

Files List
----------
X:\merged.usfm
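
For anyone who does not have TextPipe to hand, the first half of the listing above (separate run-together tags, extract the tags, count duplicates, add a header) can be approximated in Python as follows. This is a rough sketch under assumptions: the function names are hypothetical, the replacement `$1 $2$$3` is read as "insert a space before the backslash", tags are sorted for a stable report, and Python quantifiers are greedy by default whereas the filter has "Maximum match (greedy)" unticked, so results for forms such as `\cnum=14` might differ in TextPipe.

```python
import re
from collections import Counter

# (\w|\d) in the filter is equivalent to \w, since \d is a subset of \w.
SEPARATE = re.compile(r"(\w)(\\)(\w)")
EXTRACT = re.compile(r"\\(\w+|\+\w+)(\s|\*|=\d+)")


def tag_counts(text):
    """Count SFM tags, listing start and close tags separately."""
    # Workaround for unseparated tag groups such as \null\null\eject.
    spaced = SEPARATE.sub(r"\1 \2\3", text)
    tags = [
        # rstrip() plays the role of "Remove blanks from End of Line".
        "\\" + m.group(1) + m.group(2).rstrip()
        for m in EXTRACT.finditer(spaced)
    ]
    return Counter(tags)


def report(counts):
    """Format counts like the filter's '%5.5d\\t%s' lines under a header."""
    lines = ["Count\tSFM tag"]
    for tag in sorted(counts):
        lines.append("%5.5d\t%s" % (counts[tag], tag))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = r"\p \v 1 text \add more\add* \v 2 text \add x\add* "
    print(report(tag_counts(sample)))
```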

And here is my updated filter:

Code:

TextPipe Single User Edition 10.7
Purchased by: David Haslam, David Haslam

Filter Title: C:\Users\David\Documents\DataMystic\TextPipe\!David\USFM tag statistics.fll

Filter List
-----------
Filter options
|  [X] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|  [ ] Log comment filters
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     [ ] Process inside compressed files
|     Confirm each binary file
|       Sample size 100 characters
|   
|--Comment...
|  |  USFM tag statistics
|  |  
|  |  Outputs a 'count duplicate lines' list for the SFM tags.
|  |  
|  |  Start and close tags in a tag pair are listed separately.
|  |  
|  |   Excludes + and similar parameters used in footnotes & cross-references 
|  |   Excludes optional line break // and ~
|  |   Excludes delimiters | within \fig ...\fig*
|  |  
|  |  Filter improved to catch all the tags
|  |  Filter enhanced to add tag descriptions
|  |  Filter improved to include nested tags with \+
|  |
|  |--Comment...
|  |     Revision history:
|  |     
|  |     2017-09-15 Restrict to field 3 now sub-filters each field individually, etc.
|  |     2018-09-12 Changed search type to Exact for Add tag descriptions.
|  |                Now uses an edited copy of the replace list tab file.
|  |                Change required on account of a bug in TextPipe 10.7.2
|  |   
|  |--Comment...
|  |  |  Workaround for unseparated tag groups
|  |  |  
|  |  |   e.g. \global\cnum=14\relax
|  |  |        \null\null\eject
|  |  |
|  |  +--Perl pattern [(\w|\d)(\\)(\w)] with [$1 $2$$3]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [ ] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support

|  |        [ ] Process longest strings first
|  |        [ ] Simultaneous search
|  |        [ ] Log summary only

|  |      Further search/replace list phrases (CSV format):
|  |      >, >
|  |      
|  |--Comment...
|  |  |  Extract tags
|  |  |  
|  |  |   NB. Does it now catch \cnum=## ?
|  |  |
|  |  +--Perl pattern [\\(\w+|\+\w+)(\s|\*|=\d+)] with [\\$1$$2\r\n]
|  |        [X] Match case
|  |        [ ] Whole words only
|  |        [ ] Case sensitive replace
|  |        [ ] Prompt on replace
|  |        [ ] Skip prompt if identical
|  |        [ ] First only
|  |        [X] Extract matches
|  |            Maximum text buffer size 4096
|  |        [ ] Maximum match (greedy)
|  |        [ ] Allow comments
|  |        [ ] '.' matches newline
|  |        [X] UTF-8 Support
|  |      
|  |--Comment...
|  |  |  Remove blanks
|  |  |
|  |  |--Remove blanks from End of Line
|  |  |   
|  |  +--Remove blank lines
|  |      
|  |--Comment...
|  |  |  Counted list of tags
|  |  |
|  |  |--Count duplicate lines
|  |  |     [ ] Ignore case
|  |  |     Start column 1
|  |  |     Length 15
|  |  |     [X] Include One
|  |  |     format: %5.5d\t%s
|  |  |   
|  |  +--Add file header [Count\tSFM tag]
|  |      
|  |--Comment...
|  |  |  Add tag descriptions +
|  |  |  
|  |  |     From USFM Reference version 2.35 & later
|  |  |  
|  |  |   + The replacement table is incomplete
|  |  |     Tags that can have numbers may need further entries
|  |  |     Excludes: Peripherals & Study Bible Content
|  |  |     
|  |  |     Of these possibilities to name tag pairs
|  |  |       begin ... end     (story  metaphor) 
|  |  |       open  ... close   (gate   metaphor)
|  |  |       start ... finish  (race   metaphor)
|  |  |       start ... stop    (action metaphor)
|  |  |     I have chosen the first convention.
|  |  |     The USFM Reference is not 100% consistent.
|  |  |
|  |  |--Copy fields:2 copy to 3
|  |  |     Delimiter type: 1
|  |  |     Custom delimiter: 
|  |  |     Text qualifier : 2
|  |  |     Custom qualifier: 
|  |  |     [ ] Has Header
|  |  |   
|  |  +--Restrict fields:3
|  |     |  [X] Process fields individually
|  |     |    [X] Exclude delimiter
|  |     |      [ ] Exclude quotes (if present)
|  |     |  Delimiter type: 1
|  |     |  Custom delimiter: 
|  |     |  Text qualifier : 0
|  |     |  Custom qualifier: 
|  |     |  [ ] Has Header
|  |     |
|  |     +--Replace list: C:\Users\David\TextPipe Filters\USFM tag descriptions!.tab Replace
|  |           [X] Match case
|  |           [X] Whole words only
|  |           [ ] Case sensitive replace
|  |           [ ] Prompt on replace
|  |           [ ] Skip prompt if identical
|  |           [ ] First only
|  |           [ ] Extract matches

|  |           [X] Process longest strings first
|  |           [ ] Simultaneous search
|  |           [ ] Log summary only
|  |         
|  +--Comment...
|     |  Invalid tag descriptions
|     |  
|     |   - descriptions ending with an asterisk are invalid
|     |     because the asterisk should have been replaced
|     |
|     +--Restrict fields:3
|        |  [X] Process fields individually
|        |    [X] Exclude delimiter
|        |      [ ] Exclude quotes (if present)
|        |  Delimiter type: 1
|        |  Custom delimiter: 
|        |  Text qualifier : 0
|        |  Custom qualifier: "
|        |  [ ] Has Header
|        |
|        +--Perl pattern [(.+)\*] with [### SYNTAX ERROR ###]
|              [X] Match case
|              [ ] Whole words only
|              [ ] Case sensitive replace
|              [ ] Prompt on replace
|              [ ] Skip prompt if identical
|              [ ] First only
|              [ ] Extract matches
|                  Maximum text buffer size 4096
|              [X] Maximum match (greedy)
|              [ ] Allow comments
|              [ ] '.' matches newline
|              [X] UTF-8 Support

|              [ ] Process longest strings first
|              [ ] Simultaneous search
|              [ ] Log summary only
|            
+--Output to file(s)
      [ ] Only update date on changed files
      [ ] Append mode
      [X] Change extension to: @inputExtension.tags.count.usfm
      [X] Open output file
      Only output modified files
      [ ] Remove empty output files    

Files List
----------
X:\merged.usfm

In both listings, the filter that stopped working and had to be changed is the one under the comment "Add tag descriptions +".
The updated filter uses an edited copy of the replace list (.tab file).
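
To show what the "Add tag descriptions" and "Invalid tag descriptions" steps are meant to achieve, here is a small Python sketch. It assumes tab-delimited rows of count and tag, copies the tag into a third field, replaces that field with a description, and flags anything still ending in an asterisk. The function name, the dictionary lookup (used in place of a real replace list) and the sample descriptions are all hypothetical.

```python
import re

# Same pattern and marker as the last Perl pattern filter in the listing.
INVALID = re.compile(r"(.+)\*")
MARKER = "### SYNTAX ERROR ###"


def add_descriptions(rows, descriptions):
    """Append a description field to each 'count<TAB>tag' row."""
    result = []
    for row in rows:
        count, tag = row.split("\t")
        # Only the new third field is touched, as with "Restrict fields:3".
        description = descriptions.get(tag, tag)  # unchanged if no entry
        # A field still ending in '*' means the close tag was not replaced.
        description = INVALID.sub(MARKER, description)
        result.append("\t".join([count, tag, description]))
    return result


if __name__ == "__main__":
    sample_descriptions = {
        "\\add": "start of addition (sample text)",
        "\\add*": "end of addition (sample text)",
    }
    sample_rows = ["00002\t\\add", "00002\t\\add*", "00001\t\\zz*"]
    for line in add_descriptions(sample_rows, sample_descriptions):
        print(line)
```

With the bug described in this thread, the replace list leaves field 3 untouched when the search type is Perl, so every close tag would end up flagged by the second step.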
