Extract Question/Problem



Post by Fodor » Fri Dec 19, 2003 4:14 am

Hello, all!

I'm evaluating TextPipe Pro to data-mine a web site. The site uses nested tables, but rather than removing the unneeded tables by hand, I've found a way to get at the data with an extract filter and a regular expression. First, I convert the UNIX line endings to DOS line endings and remove all leading and trailing whitespace. Then I try the following pattern:

^(<p align=left>).*.$

The problem is that it doesn't work. This regular expression should match any entire line that starts with "<p align=left>", but when I run the filter, TextPipe finds the first "<p align=left>" and returns it along with the whole rest of the file.

It looks as though the ".*" is also matching the EOL characters instead of stopping at the end of the line, as the ".$" is meant to make it do. Is that the problem? If so, is my regular expression poorly formed?
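A minimal sketch of this hypothesis in Python, with Python's re module standing in for TextPipe's regex engine (an assumption: the two engines and their default settings may differ) and a made-up sample. With re.DOTALL on, '.' also matches newlines and the greedy ".*" runs on to the last line end in the input; with it off, each match stops at the end of its own line.

```python
import re

# Hypothetical sample input, already converted to DOS (CRLF) line endings.
text = (
    "<p align=left>First item<br>\r\n"
    "<td>unwanted</td>\r\n"
    "<p align=left>Second item<br>\r\n"
    "<td>more unwanted</td>\r\n"
)

pattern = r"^(<p align=left>).*.$"

# '.' matches newlines (DOTALL) and '^'/'$' work per line (MULTILINE):
# the greedy '.*' swallows everything up to the last line end.
greedy = [m.group(0) for m in re.finditer(pattern, text, re.MULTILINE | re.DOTALL)]

# Without DOTALL, '.' stops before '\n', so each match stays on one line.
per_line = [m.group(0) for m in re.finditer(pattern, text, re.MULTILINE)]

print(len(greedy))    # 1 (the match runs from the first <p> to the end of the text)
print(len(per_line))  # 2 (one match per "<p align=left>" line)
```

Even the per-line matches keep a trailing '\r', because in Python '.' matches a carriage return and '$' only matches before '\n'.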

What am I doing wrong?



Extract Question/Problem Follow-up

Post by Fodor » Fri Dec 19, 2003 4:26 am

I've changed my regular expression to:

^(<p align=left>).*(<br>)$

(That is, it should match any entire line that begins with "<p align=left>" and ends with "<br>".)

This matches 0 items in my input file, although there are 4 such lines in the file.

Any ideas?
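One possible cause, sketched in Python under the assumption that TextPipe's '$' behaves like Python's (in multiline mode it matches at a line end only when the next character is '\n'): after the UNIX-to-DOS conversion every line ends in "\r\n", so a '\r' sits between "<br>" and the point where '$' can match. The sample text is hypothetical.

```python
import re

# Hypothetical sample lines after UNIX-to-DOS conversion (CRLF endings).
text = "<p align=left>First item<br>\r\n<p align=left>Second item<br>\r\n"

# The pattern from the post: "<br>" must be immediately followed by the line end.
strict = r"^(<p align=left>).*(<br>)$"
# Same pattern, but allowing an optional carriage return before the line end.
tolerant = r"^(<p align=left>).*(<br>)\r?$"

# In MULTILINE mode '$' matches only before '\n' (or at the end of the text),
# so the '\r' after "<br>" makes every attempt with the strict pattern fail.
print(len(re.findall(strict, text, re.MULTILINE)))    # 0
print(len(re.findall(tolerant, text, re.MULTILINE)))  # 2
```

Turning on re.DOTALL does not change the strict result in this sketch, since the '\r' still blocks every candidate "<br>".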



Post by DataMystic Support » Fri Dec 19, 2003 8:20 am

By default, '.' matches newlines; check the pattern settings.

You could use [^\r\n] instead of '.' to prevent this.
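Applied to the patterns from this thread, the suggestion might look like the Python sketch below (Python's re module standing in for TextPipe, sample text made up). re.DOTALL is switched on deliberately to mimic the default described above: a "[^\r\n]*" run still cannot cross a line break. The "(?=\r?$)" lookahead in the second pattern is an addition of this sketch, not part of the reply; it keeps the "ends with <br>" requirement from the follow-up post while tolerating a '\r' before the line end.

```python
import re

# Hypothetical sample input with DOS (CRLF) line endings.
text = (
    "<p align=left>First item<br>\r\n"
    "<td>unwanted</td>\r\n"
    "<p align=left>Second item<br>\r\n"
)

# DOTALL mimics a '.' that matches newlines; it has no effect on [^\r\n].
flags = re.MULTILINE | re.DOTALL

# Whole line starting with "<p align=left>", without its CR/LF.
starts_with = r"^<p align=left>[^\r\n]*"
# Same, but the line must also end with "<br>" (optionally followed by '\r').
starts_and_ends = r"^<p align=left>[^\r\n]*<br>(?=\r?$)"

for pattern in (starts_with, starts_and_ends):
    for line in re.findall(pattern, text, flags):
        print(line)
```

Both patterns return the two "<p align=left>" lines without their line endings and skip the "<td>" line.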

Simon Carter, http://DataMystic.com/forums/index.php
