How do I Extract 2 Separate Tags/Lines from HTML?


pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Postby pheagila » Sun Sep 07, 2008 12:19 am

I am trying to extract 2 separate lines from HTML.

If I try with only 1 line it works, but if I try both I get 0-byte output.

The 2 lines that require extraction are:

<h1 id="Results">..DATA...</h1>
<span id="Data">..DATA...</span>

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1> <span id="Data">(.*)</span>] with [$1\r\n $2\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline

Cheers

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Mon Sep 08, 2008 7:37 am

Naturally after you extract the first line type, there is no text left to match the second type.

You need to combine the patterns like this:

Code: Select all

<h1 id="Results">.*</h1>|<span id="Data">.*</span>
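
As a rough illustration outside TextPipe, the same alternation can be sketched with Python's `re` module. The sample HTML and the names `html`, `pattern` and `matches` are made up for this example, since the real source page was never posted. The non-greedy `.*?` is an assumption added here so that each match stops at the first closing tag.

```python
import re

# Hypothetical sample input; the real page layout is unknown.
html = (
    '<h1 id="Results">Results ABC</h1>\n'
    '<p>ignored</p>\n'
    '<span id="Data">Data 123</span>\n'
    '<h1 id="Results">Results DEF</h1>\n'
    '<span id="Data">Data 456</span>\n'
)

# One pattern with two alternatives: each match is EITHER an h1 OR a span.
# re.DOTALL plays the role of the "'.' matches newline" option.
pattern = re.compile(
    r'<h1 id="Results">.*?</h1>|<span id="Data">.*?</span>',
    re.DOTALL,
)

# Collect every match in document order, whichever alternative produced it.
matches = [m.group(0) for m in pattern.finditer(html)]
for text in matches:
    print(text)
```

Because both tag types are found in a single pass, neither one is thrown away before the other has had a chance to match.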

Postby pheagila » Mon Sep 08, 2008 11:00 pm

DataMystic Support wrote: Naturally after you extract the first line type, there is no text left to match the second type.
You need to combine the patterns like this:

Code: Select all

<h1 id="Results">.*</h1>|<span id="Data">.*</span>

Thank you, Support. That worked perfectly! :)

1) I assume the | acts as an AND?
2) Is there any difference between .* and (.*)?
3) What is the difference between $0 and $1$2, given that they produce different outputs?

The final issue I need to solve is outputting the data with the correct new lines.

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$0\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline

The code above outputs:

Results ABC
Data 123
Results DEF
Data 456

However, the output I need is:

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---

Cheers

Postby DataMystic Support » Tue Sep 09, 2008 10:59 pm

1) | means OR, not AND!
2) & 3) (.*) captures the part in parentheses so that it can be used as $1, $2, etc. in the output. Plain .* matches the same text but captures nothing.
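
To make the difference concrete, here is a small Python sketch (Python's `re`, not TextPipe itself). In Python, `group(0)` corresponds to `$0`, the whole match including the tags, while `group(1)` and `group(2)` correspond to `$1` and `$2`. The helper `describe` and the sample lines are hypothetical. How TextPipe expands a group on the branch that did not match is an assumption not confirmed in this thread; Python reports it as `None`.

```python
import re

# Same alternation as above, with one capture group per branch.
pattern = re.compile(
    r'<h1 id="Results">(.*?)</h1>|<span id="Data">(.*?)</span>'
)

def describe(line):
    """Return (whole match, group 1, group 2) for the first match in line."""
    m = pattern.search(line)
    # group(0): everything the pattern matched, tags included.
    # group(1) / group(2): only the text inside each pair of parentheses.
    # Only one branch of the | can match, so the other group is None.
    return m.group(0), m.group(1), m.group(2)

h1 = describe('<h1 id="Results">Results ABC</h1>')
span = describe('<span id="Data">Data 123</span>')
print(h1)
print(span)
```

Since `|` means OR, exactly one of the two groups is filled for any given match, which is why `$1$2` yields just the inner text of whichever tag was found.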

Why don't you replace the word 'Results' with '\r\nResults' to get the new line?
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Postby pheagila » Tue Sep 09, 2008 11:53 pm

DataMystic Support wrote: Why don't you replace the word 'Results' with '\r\nResults' to get the new line?

Hi Support,

The following examples all produce the wrong output:

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$1$2]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$0]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$1$2\r\n]
Outputs:
Results ABC
Data 123
Results DEF
Data 456

Can you provide a code example for the "Replace with" field that produces the following?

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---

Postby DataMystic Support » Wed Sep 10, 2008 10:40 pm

Do a Perl pattern search for

Code: Select all

Data (\d+)

and replace with

Code: Select all

$0\r\n
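
For comparison, the same search and replace can be sketched in Python as a second pass over the already extracted text. The sample string `extracted` is hypothetical and assumes CRLF line endings and purely numeric data values, as in the pattern above. In Python's replacement syntax `\g<0>` stands in for `$0`.

```python
import re

# Hypothetical extracted text: one Results line and one Data line per record.
extracted = "Results ABC\r\nData 123\r\nResults DEF\r\nData 456\r\n"

# Re-emit each "Data <digits>" match followed by an extra CRLF,
# which leaves a blank line after every record.
spaced = re.sub(r'Data (\d+)', r'\g<0>\r\n', extracted)
print(spaced)
```

This only works if every record ends with a `Data` line made of digits; other data shapes would need a different pattern.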


Sorry - without seeing a sample of source data it is hard to help!

