How are subfilters handled?


dynalt
Posts: 5
Joined: Thu Jan 01, 2015 7:52 am

How are subfilters handled?

Post by dynalt » Thu Jan 01, 2015 8:06 am

I want to take multiple passes over a file or over the patterns recognized in a file.

I am processing HTML / XML and want to

1) Convert &xxx; entities to the appropriate characters
2) Parse a containing HTML pair (<meta_data>...</meta_data>)
3) Parse the metadata string for other HTML pairs and put the entire result on a single line
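
For comparison outside TextPipe, the three steps above can be sketched in Python. This is only an illustration of the intended data flow, not a description of how TextPipe runs subfilters. The sample record, the `field` and `metadata_lines` helpers, and the choice to decode entities after extraction (so that a decoded `<` cannot interfere with tag matching) are all assumptions.

```python
import html
import re

# Hypothetical sample record; the real file's layout is an assumption.
SAMPLE = (
    "<meta_data><ASIN>B000TEST01</ASIN>"
    "<title>Cats &amp; Dogs</title>"
    "<authors>A. Writer</authors>"
    "<publishers>Example Press</publishers>"
    "<publication_date>2015-01-01T00:00:00</publication_date>"
    "<other></other></meta_data>"
)

# Step 2: find each containing <meta_data>...</meta_data> pair.
BLOCK = re.compile(r"<meta_data>(.*?)</meta_data>", re.DOTALL)
FIELDS = ("ASIN", "title", "authors", "publishers", "publication_date")


def field(block, tag):
    """Return the text between <tag> and </tag>, or '' if the tag is absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
    return m.group(1) if m else ""


def metadata_lines(text):
    """Return one tab-separated line per <meta_data> block."""
    lines = []
    for block in BLOCK.findall(text):
        # Step 3: pull the inner pairs out of the block.
        values = [field(block, tag) for tag in FIELDS]
        # Step 1 (done last here): decode entities, then collapse
        # whitespace so each record stays on a single line.
        values = [" ".join(html.unescape(v).split()) for v in values]
        values[4] = values[4][:10]  # keep only YYYY-MM-DD
        lines.append("\t".join(values))
    return lines


for line in metadata_lines(SAMPLE):
    print(line)
```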

I can't find out how a subfilter differs from a top level filter, and none of the sequences I try has worked.

Trial #1:
1) Extract metadata pair to a line
<meta_data><ASIN>(.*)</ASIN><title>(.*)</title><authors>(.*)</authors><publishers>(.*)</publishers><publication_date>(\d{4}-\d{2}-\d{2}).*</publication_date>.*></meta_data>

$1\t$2\t$3\t$4\t$5

This much appears to work

When I add subfilters to convert the &xxx; patterns, nothing happens to them. If I make them top level patterns either before or after the metadata pattern, they don't appear to have any effect.

It seems that there is something I don't understand about how passes are made over the file and how subfilters play into the data processing.

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How are subfilters handled?

Post by DataMystic Support » Fri Jan 02, 2015 11:56 am

Are the patterns for handling the HTML entities (&xxx;) subfilters, i.e. are they inside the filter that identifies the following?
<meta_data><ASIN>(.*)</ASIN><title>(.*)</

To add as a subfilter, drag and drop the html entity filters on top of the pattern match. If this doesn't work, please paste an extract from File Menu\Export.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

dynalt
Posts: 5
Joined: Thu Jan 01, 2015 7:52 am

Re: How are subfilters handled?

Post by dynalt » Fri Jan 02, 2015 11:19 pm

I have tried the entity filters as subfilters of the metadata filter, and as top level filters before and after the metadata filter, and none of them seems to work.

I get the metadata line extracted, but the HTML entities remain.

Here is the export.

TextPipe Single User Edition
Purchased by: DYNAMIC Alternatives, DYNAMIC Alternatives

Filter Title: J:\DYNALT\Amazon\Kindle.fll

Filter List
-----------
Filter options
| [ ] Log to file
| [ ] Append to logfile
| Log filename: %USERPROFILE%\textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Skip binary files
| Sample size 100 characters
|
|--Perl pattern [<meta_data><ASIN>(.*)</ASIN><title>(.*)</title><authors>(.*)</authors><publishers>(.*)</publishers><publication_date>(\d{4}-\d{2}-\d{2}).*</publication_date>.*></meta_data>] with [$1\t$2\t$3\t$4\t$5]
| | [ ] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| | Maximum text buffer size 4096
| | [ ] Maximum match (greedy)
| | [ ] Allow comments
| | [X] '.' matches newline
| | [ ] UTF-8 Support
| |
| |--Replace [&amp;] with [&]
| | [ ] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| |
| |--Replace [&gt;] with [>]
| | [ ] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| |
| |--Replace [&lt;] with [<]
| | [ ] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| |
| +--Replace [&quot;] with ["]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
|
+--Output to file(s)
[ ] Only update date on changed files
[ ] Append mode
[ ] Change extension to: .txt
[ ] Open output file
Only output modified files
Backup mode
[ ] Remove empty output files

Files List
----------
J:\DYNALT\Amazon\KindleSyncMetadataCache.xml
Use the line below to remove common non-text files from website processing
.[ 'gif' or 'png' or 'jpg' or 'bmp' or 'avi' or 'ico' or 'mp3', lineEnd ]
Use the line below to remove common non-text folders from website processing
_vti
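
A side note on the order of the four entity replacements in the export above, which may or may not be related to the problem reported here: replacing `&amp;` first means that doubly escaped text such as `&amp;lt;` gets decoded twice. The Python sketch below only illustrates that ordering effect with plain string replacement; it does not model TextPipe itself, and the names `EXPORT_ORDER`, `AMP_LAST` and `replace_in_order` are made up for the example.

```python
import html

# Same order as the export: &amp; first, then the others.
EXPORT_ORDER = [("&amp;", "&"), ("&gt;", ">"), ("&lt;", "<"), ("&quot;", '"')]
# Alternative order with &amp; moved to the end.
AMP_LAST = EXPORT_ORDER[1:] + EXPORT_ORDER[:1]


def replace_in_order(text, pairs):
    """Apply each (old, new) replacement in sequence, like chained filters."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


escaped = "use &amp;lt; for a less-than sign"

print(replace_in_order(escaped, EXPORT_ORDER))  # use < for a less-than sign
print(replace_in_order(escaped, AMP_LAST))      # use &lt; for a less-than sign
print(html.unescape(escaped))                   # use &lt; for a less-than sign
```

If the input never contains doubly escaped entities, both orders give the same result.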

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How are subfilters handled?

Post by DataMystic Support » Sat Jan 03, 2015 7:54 am

Works fine here. Turn on "Prompt on replace" for every search filter so that you can see where the issue is.

I used test text of:

Code:

<meta_data><ASIN>dsfsdj &amp;   </ASIN><title>dsfsdj &amp;   </title><authors>

dsfsdj &amp;   

</authors><publishers>

dsfsdj &amp;   

</publishers><publication_date>2015-01-01
</publication_date>

other guff

<other></other></meta_data>
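
As a rough cross-check outside TextPipe, the pattern from the export can be run against this test text in Python. Several things here are assumptions rather than facts about TextPipe: the unchecked "Maximum match (greedy)" option is approximated with non-greedy `.*?`, the checked "'.' matches newline" option with `re.DOTALL`, the unchecked "Match case" option with `re.IGNORECASE`, and the four Replace subfilters are simply applied to the replaced text. Python's `re` module is not TextPipe's engine, so this only shows that the pattern and test text are compatible.

```python
import re

# The test text from the post, built line by line to keep trailing spaces.
TEST_TEXT = (
    "<meta_data><ASIN>dsfsdj &amp;   </ASIN><title>dsfsdj &amp;   </title><authors>\n"
    "\n"
    "dsfsdj &amp;   \n"
    "\n"
    "</authors><publishers>\n"
    "\n"
    "dsfsdj &amp;   \n"
    "\n"
    "</publishers><publication_date>2015-01-01\n"
    "</publication_date>\n"
    "\n"
    "other guff\n"
    "\n"
    "<other></other></meta_data>"
)

# Pattern from the export, with .* written as .*? (assumed equivalent of
# leaving "Maximum match (greedy)" unchecked).
PATTERN = re.compile(
    r"<meta_data><ASIN>(.*?)</ASIN><title>(.*?)</title>"
    r"<authors>(.*?)</authors><publishers>(.*?)</publishers>"
    r"<publication_date>(\d{4}-\d{2}-\d{2}).*?</publication_date>.*?></meta_data>",
    re.DOTALL | re.IGNORECASE,
)


def extract(text):
    """Replace each match with its five groups joined by tabs ($1\\t...\\t$5)."""
    return PATTERN.sub(lambda m: "\t".join(m.groups()), text)


def decode(text):
    """Apply the four Replace filters from the export, in the same order."""
    for old, new in (("&amp;", "&"), ("&gt;", ">"), ("&lt;", "<"), ("&quot;", '"')):
        text = text.replace(old, new)
    return text


result = decode(extract(TEST_TEXT))
print(repr(result))
```

In this sketch the authors and publishers captures keep the newlines that surround them in the test text, so the output is tab-separated but not yet on a single line; a further step would be needed to strip those line breaks.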

