How do I Extract Text Between Two Fields from HTML files?


pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Postby pheagila » Mon Aug 18, 2008 9:14 pm

Hi all,

I would like to extract the text between two markers from many HTML files,

i.e. all the text between:

<!-- Start Results Section -->
.... Data .....
<!-- End Results Section -->

I would like all the extracted text combined and written to abc.txt.

Below are my current settings, but they are NOT working: the output also includes a lot of data 'outside' the markers.

Can anyone tell me what I am doing wrong? (Yes, I am new to TextPipe.)

Code:

Restrict to between tags <<!-- Start Results Section -->>...<<!-- End Results Section -->>
|  [X] Include text
|  [X] Match case
| Max size: 65536
|
+--Merge output to file C:\1\abc.txt 

Fixer
Posts: 22
Joined: Thu Jul 31, 2008 6:39 am
Location: European Union > Poland

Postby Fixer » Fri Aug 22, 2008 11:08 am

I almost always use a Perl pattern for this:

Code:

Filter List
-----------
Filter options
|  [ ] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     Process binary files
|   
|--Perl pattern [<\!-- Start Results Section -->\r\n(.*)\r\n<\!-- End Results Section -->\r\n] with [$1\r\n]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [ ] Extract matches
|     Maximum text buffer size 99999
|     [ ] Maximum match (greedy)
|     [ ] Allow comments
|     [X] '.' matches newline
|     [X] UTF-8 Support
|   
|--Remove blank lines
|   
+--Merge output to file c:\mergefilename.txt
   

Files List
----------

pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Postby pheagila » Sat Aug 23, 2008 5:20 pm

Fixer wrote: I almost always use perl pattern

Thanks, Fixer.

How do I import what you have posted above directly into TextPipe Pro?

Cheers

DataMystic Support
Site Admin
Posts: 2154
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Mon Aug 25, 2008 9:42 am

What is shown above is just a clipboard export and can't be imported directly. We will soon have an XML export/import facility.

The key problem is that you are using a restriction, which is not what restrictions are intended for. Please read the help on restrictions.

You just need a search/replace filter with the 'Extract matches' option enabled.
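
To illustrate the difference between a plain replace and an extract, here is a minimal sketch in Python. It uses Python's `re` module as a stand-in and is not TextPipe itself; the sample document is made up, and only the two marker comments come from this thread.

```python
import re

# Hypothetical sample document; only the marker comments come from the thread.
html = (
    "<html><body>header\r\n"
    "<!-- Start Results Section -->\r\n"
    "row 1\r\nrow 2\r\n"
    "<!-- End Results Section -->\r\n"
    "footer</body></html>\r\n"
)

# re.DOTALL lets '.' match newlines; '.*?' stops at the first end marker.
pattern = re.compile(
    r"<!-- Start Results Section -->(.*?)<!-- End Results Section -->",
    re.DOTALL,
)

# Plain replace: only the matched span is rewritten, so everything
# outside the markers ("header", "footer") stays in the output.
replaced = pattern.sub(lambda m: m.group(1), html)

# Extract: only the captured text of each match is kept.
extracted = [m.group(1) for m in pattern.finditer(html)]

print(repr(replaced))
print(extracted)
```

In this sketch the replaced text still contains the header and footer, while the extracted list holds only the text between the markers.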
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Postby pheagila » Mon Aug 25, 2008 6:37 pm

Thanks, DataMystic Support.

Can you give me a clipboard export example, like Fixer's, for "use a search/replace filter with the 'Extract matches' option"?

DataMystic Support
Site Admin
Posts: 2154
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Mon Aug 25, 2008 11:05 pm

Sure:

Code:

|--Perl pattern [<!-- Start Results Section -->(.*)<!-- End Results Section -->] with [$1\r\n]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [X] Extract matches
|     Maximum text buffer size 64096
|     [ ] Maximum match (greedy)
|     [ ] Allow comments
|     [X] '.' matches newline
|     [ ] UTF-8 Support
|   
+--Merge output to file C:\1\abc.txt
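
For readers without TextPipe at hand, the filter above can be approximated in Python. This is a rough sketch, not the TextPipe implementation. It assumes that an unchecked "Match case" means a case-insensitive match, that an unchecked "Maximum match (greedy)" means a non-greedy `.*?`, and that "'.' matches newline" corresponds to `re.DOTALL`. The function names and the `*.html` glob are hypothetical.

```python
import re
from pathlib import Path

# Assumed mapping of the filter options (see the note above):
# case-insensitive, non-greedy, '.' matches newline.
PATTERN = re.compile(
    r"<!-- Start Results Section -->(.*?)<!-- End Results Section -->",
    re.DOTALL | re.IGNORECASE,
)


def extract_sections(text):
    """Return the text captured between each pair of markers."""
    return [m.group(1) for m in PATTERN.finditer(text)]


def merge_sections(input_dir, output_file):
    """Append every extracted section from *.html files to one output file.

    Returns the number of sections written.
    """
    count = 0
    # newline="" keeps the "\r\n" we write from being translated.
    with open(output_file, "w", encoding="utf-8", newline="") as out:
        for path in sorted(Path(input_dir).glob("*.html")):
            # Read bytes so existing line endings are preserved as-is.
            text = path.read_bytes().decode("utf-8", errors="replace")
            for section in extract_sections(text):
                out.write(section + "\r\n")  # mirrors the [$1\r\n] replacement
                count += 1
    return count
```

A call such as `merge_sections(r"C:\1", r"C:\1\abc.txt")` would write all sections to the output file named in the thread; the input folder here is only a guess.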
   

pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Postby pheagila » Sun Aug 31, 2008 2:08 pm

Thanks, Support, but your Perl pattern filter returns 0 bytes:

Code:

|--Perl pattern [<!-- Start Results Section -->(.*)<!-- End Results Section -->] with [$1\r\n]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [X] Extract matches
|     Maximum text buffer size 99999
|     [ ] Maximum match (greedy)
|     [ ] Allow comments
|     [X] '.' matches newline
|     [ ] UTF-8 Support

This filter seems to work:

Code:

|--Extract [<!-- Start Auction Results Section -->(.*)<!-- End Auction Results Section -->]
|     [ ] Include line numbers
|     [ ] Include filename
|     [X] Match case
|     [ ] Count matches
|     Pattern type: 0

What do I need to change to get your Perl pattern filter to work?
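
One thing worth checking: the two filters above name different markers ("Results Section" in the Perl pattern, "Auction Results Section" in the Extract filter). Whether that is the actual cause of the 0-byte output is not established in this thread. The Python sketch below only shows a quick way to test which markers a file really contains; the sample text and the helper name `missing_markers` are hypothetical.

```python
def missing_markers(text, markers):
    """Return the markers that do not occur in text (case-insensitive)."""
    lowered = text.lower()
    return [m for m in markers if m.lower() not in lowered]


# Hypothetical file content using the markers from the working Extract filter.
sample = (
    "<!-- Start Auction Results Section -->\r\n"
    "data\r\n"
    "<!-- End Auction Results Section -->"
)

perl_markers = [
    "<!-- Start Results Section -->",
    "<!-- End Results Section -->",
]
extract_markers = [
    "<!-- Start Auction Results Section -->",
    "<!-- End Auction Results Section -->",
]

# Markers from the Perl pattern are absent from this sample,
# so a pattern built on them would match nothing.
print(missing_markers(sample, perl_markers))

# Markers from the Extract filter are all present.
print(missing_markers(sample, extract_markers))
```

If the first call reports missing markers for a real input file, the pattern text needs to match the comments exactly as they appear in the HTML.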

Cheers

DataMystic Support
Site Admin
Posts: 2154
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Mon Sep 01, 2008 3:51 pm

Please send us an email referencing this discussion and we can send you a filter.