Intelligent XML processing?

lamdur
Posts: 2
Joined: Wed Mar 16, 2005 1:15 pm

Intelligent XML processing?

Post by lamdur » Fri Mar 18, 2005 4:30 am

This question concerns replace filters, but it generalizes to other types of XML-based operations.

Is it possible to process XML data so that everything inside tags and processing instructions, e.g., <tag attribute="foo">, is skipped and only the PCDATA between tags is matched? The restrict filter does not seem quite intelligent enough for this sort of operation (it may be possible with a regular expression, but I cannot figure out how to write it properly).

Example:

Let's say I want to find the quote character and replace it with a tilde (but only in PCDATA):

Input:

<tag1 attribute1="foo" attribute2="bar">Hello<phrase attribute="planet">"World!"<?proc: "foo"?></phrase></tag1>


Output:

<tag1 attribute1="foo" attribute2="bar">Hello<phrase attribute="planet">~World!~<?proc: "foo"?></phrase></tag1>
Lou Amdur

DataMystic Support
Site Admin
Posts: 2154
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support » Wed Apr 06, 2005 11:17 am

Why doesn't the HTML/XML restriction work for this?

Just set it to ignore the tag contents, and only process the data between the tags.
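
For readers who want to see the idea outside TextPipe, here is a minimal Python sketch of "ignore the tag contents, process only the data between the tags". It is not how TextPipe implements its HTML/XML restriction; the function name `replace_in_text_only` and the `MARKUP` pattern are made up for illustration, and the pattern assumes the input has no comments, no CDATA sections, and no literal `>` inside attribute values.

```python
import re

# Hypothetical pattern: a processing instruction (<?...?>) or any tag (<...>).
# Assumes no comments, no CDATA, and no '>' inside attribute values.
MARKUP = re.compile(r"(<\?.*?\?>|<[^>]*>)", re.DOTALL)


def replace_in_text_only(xml, old, new):
    """Replace `old` with `new` only in the text between markup."""
    # With one capturing group, re.split keeps the markup in the result:
    # even indexes hold text, odd indexes hold the matched markup.
    parts = MARKUP.split(xml)
    return "".join(
        part if index % 2 else part.replace(old, new)
        for index, part in enumerate(parts)
    )


sample = (
    '<tag1 attribute1="foo" attribute2="bar">Hello'
    '<phrase attribute="planet">"World!"<?proc: "foo"?></phrase></tag1>'
)
print(replace_in_text_only(sample, '"', "~"))
```

On the sample from the first post, the quotes in the attributes and in the processing instruction are left alone, and only the quotes around World! are changed.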
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Why?

Post by lamdur » Wed Apr 06, 2005 11:30 am

Because (a) it doesn't seem to like attributes within elements, and (b) it doesn't seem capable of properly handling inline elements. Try it on the samples I posted and let me know how the restrict to XML element filter works out for you.
Lou Amdur

Post by DataMystic Support » Wed Apr 06, 2005 11:51 am

To avoid changing the quotes inside the

<?proc: "foo"?>

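you'll have to use a three-pass replace: first turn every quote into a marker, then restore the quotes inside the processing instruction, and finally turn the remaining markers into tildes.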

This filter works nicely:

Restrict to between tags <tag1>...</tag1>
|  [ ] Include text
|  [ ] Match case
|
+--Restrict to between tags <phrase>...</phrase>
   |  [ ] Include text
   |  [ ] Match case
   |
   |--Perl pattern ["] with [###MARKER###]
   |     [ ] Match case
   |     [ ] Whole words only
   |     [ ] Case sensitive replace
   |     [ ] Prompt on replace
   |     [ ] Skip prompt if identical
   |     [ ] First only
   |     [ ] Extract matches
   |     Maximum text buffer size 4096
   |     [ ] Maximum match (greedy)
   |     [ ] Allow comments
   |     [X] '.' matches newline
   |     [ ] UTF-8 Support
   |   
   |--Perl pattern [<\?proc[^>]*>] with [$0]
   |  |  [ ] Match case
   |  |  [ ] Whole words only
   |  |  [ ] Case sensitive replace
   |  |  [ ] Prompt on replace
   |  |  [ ] Skip prompt if identical
   |  |  [ ] First only
   |  |  [ ] Extract matches
   |  |  Maximum text buffer size 4096
   |  |  [ ] Maximum match (greedy)
   |  |  [ ] Allow comments
   |  |  [X] '.' matches newline
   |  |  [ ] UTF-8 Support
   |  |
   |  +--Perl pattern [###MARKER###] with ["]
   |        [ ] Match case
   |        [ ] Whole words only
   |        [ ] Case sensitive replace
   |        [ ] Prompt on replace
   |        [ ] Skip prompt if identical
   |        [ ] First only
   |        [ ] Extract matches
   |        Maximum text buffer size 4096
   |        [ ] Maximum match (greedy)
   |        [ ] Allow comments
   |        [X] '.' matches newline
   |        [ ] UTF-8 Support
   |     
   +--Perl pattern [###MARKER###] with [~]
         [ ] Match case
         [ ] Whole words only
         [ ] Case sensitive replace
         [ ] Prompt on replace
         [ ] Skip prompt if identical
         [ ] First only
         [ ] Extract matches
         Maximum text buffer size 4096
         [ ] Maximum match (greedy)
         [ ] Allow comments
         [X] '.' matches newline
         [ ] UTF-8 Support
       


Let me know if you want me to email this to you.
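
If it helps to see the logic outside TextPipe, the three replace passes in the filter above can be sketched in Python. This is only an approximation under stated assumptions: the function name `three_pass_replace` is made up, the two "Restrict to between tags" steps are not reproduced (the function expects the text that is already inside <phrase>...</phrase>), and the marker string is assumed not to occur in the input.

```python
import re

# Placeholder string taken from the filter; assumed absent from the input.
MARKER = "###MARKER###"


def three_pass_replace(text, old='"', new="~"):
    """Replace `old` with `new`, except inside <?proc ...> instructions."""
    # Pass 1: turn every quote into the marker.
    marked = text.replace(old, MARKER)
    # Pass 2: inside each <?proc ...> instruction, turn markers back into quotes.
    marked = re.sub(
        r"<\?proc[^>]*>",
        lambda match: match.group(0).replace(MARKER, old),
        marked,
    )
    # Pass 3: whatever markers remain were outside the instruction.
    return marked.replace(MARKER, new)


print(three_pass_replace('"World!"<?proc: "foo"?>'))
```

The marker is needed because pass 2 has to tell apart quotes that were inside the processing instruction from quotes that were not, and a plain one-step replace would lose that distinction.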
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

