Please help me with this text extraction


tiler
Posts: 3
Joined: Sat Jun 11, 2011 2:06 am

Please help me with this text extraction

Postby tiler » Sun Jun 12, 2011 10:00 am

Hi

I am really new to all this TextPipe stuff. It is far too clever for me, and I was wondering if someone could lend a hand.

I have worked out how to extract individual items from some files I have converted from HTML to text (the conversion was not done in TextPipe), but I wondered whether the following is possible.

I have a large number of pages, all with the same structure, and the detail I need is within the top 50 or so lines of each page. Below is a cut and paste, with the details I need in bold.

I can do this using individual filters for telephone and website, but is there a way to combine these filters to extract the text I want?



Code:

Tru and Grand - Uk- For all your Ashes


,




,,,,,,,,,,,



,,

,,
,



Tru and Grand


Ashes specialists Wales and Australia





Company,[b]Tru and Grand [/b],

Click For Website

Contact,[b]Mr Ash[/b](
Address,[b]Unit 1/
Top Road
On a hill
Wales
UK
ABC 123[/b] (MAP)

Telephone,[b]12345 456 234[/b]
Fax,[b]321 8566 999[/b]
Email,[b].......[/b]
Website,[b]wwww.oops.com[/b]

Tru and Grand  was founded in 1650 and is very sorry but a specialist in Ashes


The email is not visible because it is written by a script in the HTML. I have Offline Explorer here, so maybe I can get it by mining the pages that way, unless there is a better idea. The (MAP) reference is a Google Maps link.


Your help would be gratefully received.

Tiler
Last edited by tiler on Mon Jun 13, 2011 10:11 pm, edited 1 time in total.


Re: Please help me with this text extraction

Postby tiler » Mon Jun 13, 2011 10:08 pm

Hi

I am still struggling with the above. I have got this far:

website,[ 1 + chars ] > website,wwww.oops.com (I would like just the web address, but I can remove the label in Excel, I guess)

Telephone,[ 1 + digits ] > Telephone,12345 456 234 (as above, really)

Address,[ 1 + chars ] > Address,Unit 1/ (I can't get this at all; it stops on the first line of the address)
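
For anyone following along, the address stopping at its first line is typical of a pattern whose "any character" match does not cross line breaks. Below is a minimal sketch of one way around that, written in Python rather than TextPipe, so the function name `extract_fields` and the flags used are assumptions for illustration and not TextPipe settings. It assumes the [b] markers have already been removed and that every address ends at the (MAP) marker.

```python
import re

# Sample taken from the first post, with the [b] markers already removed.
SAMPLE = """Company,Tru and Grand ,

Click For Website

Contact,Mr Ash(
Address,Unit 1/
Top Road
On a hill
Wales
UK
ABC 123 (MAP)

Telephone,12345 456 234
Fax,321 8566 999
Email,.......
Website,wwww.oops.com
"""

def extract_fields(text):
    """Return telephone, website and the full multi-line address."""
    # Single-line fields: capture only the value after the label.
    phone = re.search(r"^Telephone,(.+)$", text, re.MULTILINE)
    site = re.search(r"^Website,(.+)$", text, re.MULTILINE)
    # re.DOTALL lets '.' cross line breaks; the lazy '.+?' stops at the
    # first "(MAP)" marker instead of at the end of the first line.
    addr = re.search(r"^Address,(.+?)\s*\(MAP\)", text,
                     re.MULTILINE | re.DOTALL)
    address = None
    if addr:
        # Join the address lines into one comma-separated value.
        parts = [line.strip() for line in addr.group(1).splitlines()]
        address = ", ".join(part for part in parts if part)
    return {
        "telephone": phone.group(1).strip() if phone else None,
        "website": site.group(1).strip() if site else None,
        "address": address,
    }

fields = extract_fields(SAMPLE)
```

Because the label sits outside the parentheses, each value comes back without "Telephone," or "Website," in front of it, which avoids the clean-up step in Excel.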


The email is a problem, as it does not show up in the text page. In the HTML source, however, it shows up as follows (at least, I think this is it):

Code:

 {
      s=s + t.charAt(l-i);
  }
 
  document.write('<A href=\'mailto:' + s + '?subject=Enquiry from ashes.co.uk\'>' + s + '</a>');
}
</SCRIPT>
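
The visible part of that script appears to build the address by appending characters of `t` from the end backwards, which would mean the page stores the address reversed. The start of the script is not shown, so this is only a guess: the sketch below assumes `l` is the length of `t` and that the loop counter `i` runs from 1 to `l`. The function name `unreverse` and the sample value are hypothetical and are not taken from the real page.

```python
def unreverse(obfuscated):
    """Rebuild a string that was stored back to front."""
    # Mirrors the loop s = s + t.charAt(l - i), assuming i runs 1..l
    # and l is the length of t (neither is shown in the fragment).
    return obfuscated[::-1]

# Hypothetical value for illustration only; not taken from the real page.
decoded = unreverse("moc.elpmaxe@ofni")
```

If that assumption holds, the remaining work would be pulling the value assigned to `t` out of each saved HTML page before reversing it.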



Can anyone help me move on with any of the above, please?

Tiler

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Please help me with this text extraction

Postby DataMystic Support » Tue Jun 14, 2011 4:43 pm

You can't easily extract email addresses that are generated by a script.

You would have to attach some script of your own to each web page that runs at the end of page rendering and then extracts the address.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments


Re: Please help me with this text extraction

Postby tiler » Tue Jun 14, 2011 10:12 pm

Hi

Thank you for that. We can safely say, then, that I won't be getting the email addresses, as I don't have a clue about that.


I have managed to get the website out using:

Code:

(?:website)(?:.+)(?:
)



I can get all the details out together as below,

Company,Tru and Grand ,
Contact,Mr Ash(
Address,Unit 1/
Top Road
On a hill
Wales
UK
ABC 123
(MAP)

Telephone,12345 456 234
Fax,321 8566 999
Email,.......
Website,wwww.oops.com


Using:

Code:

(?:company)(?:.+)(?:
)(?:website)(?:.+)(?:
)


I have also made some adjustments using other filters.


What I cannot do, and maybe you would be willing to help with, is:

Get just the company and website out together?

When I get the above into Excel, the items list vertically; I want them to list horizontally across the page.


It has taken me 3 days to get this far, as I know nothing of code whatsoever, and I am truly stuck now.

Thank you

Tiler


Re: Please help me with this text extraction

Postby DataMystic Support » Wed Jun 15, 2011 9:13 am

In the 'replace with' box, put:

Code:

$1,$2,$3,$4


or

Code:

@company,@website,@etc
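
In general regex terms, references such as $1 and $2 pick up what capturing groups (plain parentheses) matched, whereas the (?:...) groups in the patterns earlier in this thread match without being numbered. The sketch below shows the idea in Python rather than TextPipe, so the names `PATTERN` and `to_row` and the flags are assumptions for illustration. It assumes each record has a Company line followed later by a Website line.

```python
import csv
import io
import re

# One record, shortened from the sample posted above.
RECORD = """Company,Tru and Grand ,
Contact,Mr Ash(
Telephone,12345 456 234
Website,wwww.oops.com
"""

# Capturing groups (plain parentheses) are what $1 and $2 refer to;
# (?:...) groups match but are not numbered.
PATTERN = re.compile(
    r"^Company,(.+?)\s*,?\s*$"   # group 1: company name
    r".*?"                       # skip everything in between
    r"^Website,(\S+)",           # group 2: web address
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)

def to_row(text):
    """Return [company, website] for one record, or None if no match."""
    match = PATTERN.search(text)
    return [match.group(1), match.group(2)] if match else None

# Write the pair as one CSV line so Excel shows it across a row.
buffer = io.StringIO()
csv.writer(buffer).writerow(to_row(RECORD))
row_text = buffer.getvalue().strip()
```

Each record then becomes a single comma-separated line, which Excel opens as one row with the company and website in adjacent columns.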

