Multiple patterns simultaneously

Posted: Fri Sep 09, 2005 10:58 pm
by Stevod
I need to scan HTML files in order to extract URLs.

I can create individual easypatterns that each recognise and extract one particular form of URL. However, how can I pick up several forms without rescanning the file? For example, I want those which are part of an <a href> statement, but also those starting with "www." in plain text.

Can I combine the two forms in a single easypattern?

Posted: Wed Sep 14, 2005 3:06 pm
by DataMystic Support
The best approach for two or more extractions is described in our whitepaper:

Essentially, you output each piece of extracted text on a new line prefixed with a marker, then you discard all lines that do not carry the marker.
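
To illustrate the marker idea outside of TextPipe, here is a rough sketch in Python. It is not easypattern syntax and not taken from the whitepaper. The marker string, the function names and the two regular expressions are all assumptions made for the example, and the URL patterns are deliberately simple (for instance, a bare "www." address followed directly by a full stop would keep the full stop).

```python
import re

# Hypothetical marker: any string that cannot occur in the input will do.
MARKER = "@@URL@@"

# One alternation covers both forms, so the text is scanned only once.
URL_PATTERN = re.compile(
    r'<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)'  # form 1: href value inside an <a> tag
    r'|\b(www\.[^\s<>"\']+)',                    # form 2: bare "www." address in text
    re.IGNORECASE,
)


def mark_urls(html):
    """Step 1: put every extracted URL on its own line, prefixed with MARKER."""
    def replacement(match):
        url = match.group(1) or match.group(2)
        return "\n" + MARKER + url + "\n"
    return URL_PATTERN.sub(replacement, html)


def keep_marked(text):
    """Step 2: discard every line that does not start with MARKER."""
    return [
        line[len(MARKER):]
        for line in text.splitlines()
        if line.startswith(MARKER)
    ]


def extract_urls(html):
    """Run both steps: mark in one pass, then keep only the marked lines."""
    return keep_marked(mark_urls(html))


if __name__ == "__main__":
    sample = (
        '<p>See <a href="http://example.com/page">this page</a> '
        "or visit www.example.org today.</p>"
    )
    for url in extract_urls(sample):
        print(url)
```

Because both forms sit in one alternation, each position in the file is matched against both at once, which is what avoids the second scan. The marker step then makes the final filter trivial: anything not on a marked line is thrown away.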