Extract text between tags


Post by canis » Sun Mar 15, 2009 12:47 am

I have a lot of HTML files with a similar structure:
<td align=right>numbers</td>

How can I extract the text between each pair of tags into separate files?
At the moment I only know how to extract the numbers between the first pair of tags:

Extract lines matching [<td>47:28]
Remove HTML and XML
Remove blanks from Start of Line
Remove blanks from End of Line

I can't work out how to extract the text between the other pairs. Can anybody help?


Re: Extract text between tags

Post by DataMystic Support » Mon Mar 16, 2009 9:38 pm

Use a Perl pattern search/replace filter. Find a pattern that captures the text between one pair of tags as variable 1.

Action: Send var 1 to subfilter.

As a subfilter, add a Special\Secondary Output filter that directs the output to the file you need.

Repeat these two steps for each section you need.
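
For anyone who wants to try the same idea outside TextPipe, here is a rough Python sketch of the capture-and-redirect approach. It is not the TextPipe filter itself: the pattern, the function names `extract_cells` and `write_cells`, and the `cell_<n>.txt` file naming are all assumptions made for illustration. It assumes each value sits in a simple `<td ...>...</td>` cell with no nested markup.

```python
import re
from pathlib import Path

# Assumed pattern (not from the original post): match any <td ...>...</td>
# cell and capture the text between the tags as group 1.
TD_PATTERN = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)


def extract_cells(html):
    """Return the whitespace-stripped text of every <td> cell, in order."""
    return [text.strip() for text in TD_PATTERN.findall(html)]


def write_cells(html, out_dir, stem="cell"):
    """Write the n-th cell to <out_dir>/<stem>_<n>.txt and return the paths.

    The file naming scheme is hypothetical; adjust it to your own layout.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, text in enumerate(extract_cells(html), start=1):
        path = out / f"{stem}_{n}.txt"
        path.write_text(text + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_cells(Path("page.html").read_text(), "out")` on one of the HTML files would then produce one output file per pair of tags. A regular expression is only adequate for flat, regular markup like the sample above; for messier HTML a real parser is the safer choice.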

Simon Carter, http://DataMystic.com/forums/index.php