Extract certain HTML tags



JimC


Post by JimC » Tue Oct 02, 2012 6:01 am

I am trying to extract some data from web pages. All of the content I need is contained within two <div class="someclass">content</div> tags.
What is the best filter to extract just these two tags from a file and then continue with further processing?

Something like an "Extract HTML/XML pair" filter would be perfect, but I don't see that as an option.

DataMystic Support
Site Admin
Location: Melbourne, Australia

Re: Extract certain HTML tags

Post by DataMystic Support » Tue Oct 02, 2012 9:11 am

Hi Jim,

Just use a Perl pattern:

Code:

<div class="someclass">(.*?)</div>
Replace it with:

Code:

$1
Then check 'Extract' so that only the matched text is output.
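If you want to see what the pattern does before setting it up, here is a rough sketch of the same match-and-extract step using Python's re module, which accepts this pattern with the same meaning. It is not how TextPipe works internally; the names GREEDY, LAZY, extract and the sample page are made up for the illustration. It compares a greedy (.*) group with a lazy (.*?) one on a page where both divs sit on one line:

```python
import re

# Greedy group: (.*) runs as far as it can, then backs up to the LAST </div>
# on the line. Without re.DOTALL, '.' does not match newlines.
GREEDY = re.compile(r'<div class="someclass">(.*)</div>')

# Lazy group: (.*?) stops at the FIRST </div> after each opening tag.
# re.DOTALL lets '.' match newlines, so multi-line content is captured too.
LAZY = re.compile(r'<div class="someclass">(.*?)</div>', re.DOTALL)


def extract(pattern, html):
    """Return group 1 ($1) of every non-overlapping match, in order.

    Roughly what "replace with $1 and keep only the matches" produces;
    this is an illustration, not TextPipe's own behaviour.
    """
    return pattern.findall(html)


# Hypothetical sample page with both target divs on a single line.
page = ('<div class="someclass">first</div>'
        '<p>other</p>'
        '<div class="someclass">second</div>')

print(extract(GREEDY, page))
# ['first</div><p>other</p><div class="someclass">second']
print(extract(LAZY, page))
# ['first', 'second']
```

The greedy group swallows everything from the first opening tag to the last closing tag, so the two divs come back as one match; the lazy group returns each div separately. Neither version handles a <div> nested inside the target div, because the lazy group stops at the first inner </div>; for pages like that an HTML parser is safer than a regex.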
Regards,

Simon Carter, https://www.DataMystic.com
