WebPipe downloads partial or entire web sites to your hard disk for data mining with TextPipe Pro. WebPipe is a custom version of Offline Explorer Pro with extensions for data mining work and for working with TextPipe Pro.
Together with TextPipe Pro, WebPipe lets you data mine content from part or all of any web site on a scheduled basis.
WebPipe allows you to download your favorite Web and FTP sites for later offline viewing, editing or browsing. Then use TextPipe Pro to data mine content or keywords from your competitors' web sites!
TextPipe Pro is a data extraction and text manipulation application that updates your web site, extracts data from databases, reformats and standardizes your electronic text and program source code, data mines unstructured text reports and your competitors' web sites, cleanses data in legacy databases, and converts between a variety of mainframe and PC data formats. The possibilities are simply endless.
The Data Mining Dialog (found under Tools\Data Mining when a TextPipe Pro installation is detected)
In order to effectively data mine content from web pages, you first have to remove all the extraneous information such as color and formatting, extra spaces, graphics, forms, comments, styles, advertising and embedded frames. To perform this step, we link to a predefined TextPipe filter in web site mining\data mine.fll. To use it, in the File Menu, choose Link to Filter, and then select the filter file. This includes the filter without modifying it.
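For readers who want to see what this kind of cleanup involves, here is a minimal Python sketch. It is not the contents of the data mine.fll filter, which are not shown here; the function name strip_extraneous and the sample page are made up for illustration, and only comments, script/style blocks and extra spaces are handled.

```python
import re

def strip_extraneous(html: str) -> str:
    """Remove comments, script/style blocks and redundant spaces (illustrative only)."""
    # Drop HTML comments, including ones that span several lines.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Drop <script> and <style> elements together with their contents.
    html = re.sub(r"<(script|style)\b[^>]*>.*?</\1\s*>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Collapse runs of spaces and tabs into a single space.
    html = re.sub(r"[ \t]+", " ", html)
    return html.strip()

sample = "<style>p {color: red}</style><!-- ad --><p>Price:   42</p>"
print(strip_extraneous(sample))  # prints: <p>Price: 42</p>
```

A real cleanup pass would also need rules for forms, graphics, frames and advertising, which depend heavily on the site being mined.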
Next we need to simplify the html tags to change "<table border="3" padding="3" etc>" into just "<table>". This will make it much easier to search and replace later on. To do this, we use a filter called web site mining\simplify tags.fll. Again, to use it, in the File Menu, choose Link to Filter, and then select the filter file.
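The effect of tag simplification can be sketched in a few lines of Python. This is an assumption about what such a step does, not the actual logic of simplify tags.fll; the name simplify_tags is hypothetical, and the pattern will misfire on attribute values that contain a '>' character.

```python
import re

# An opening tag: '<', a tag name, then any attributes up to the closing '>'.
# Closing tags such as </table> are not matched because '/' is not a letter.
_OPENING_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>")

def simplify_tags(html: str) -> str:
    """Replace every opening tag with the bare tag name, dropping attributes."""
    return _OPENING_TAG.sub(r"<\1>", html)

sample = '<table border="3" padding="3"><tr><td class="x">1</td></tr></table>'
print(simplify_tags(sample))  # prints: <table><tr><td>1</td></tr></table>
```

With the attributes gone, a later search only has to look for the literal text "<table>" instead of allowing for every possible attribute combination.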
Finally, to convert data from html table format to a CSV (comma-separated value) format that we can easily import into Excel, we use the filter web site mining\data mine html tables.fll. Again, to use it, in the File Menu, choose Link to Filter, and then select the filter file.
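To illustrate the table-to-CSV idea outside TextPipe, the following Python sketch uses only the standard library. It is a stand-in under simple assumptions (flat, non-nested tables with tr, td and th cells) and not a description of how data mine html tables.fll works; the names TableToRows and tables_to_csv are invented for this example.

```python
import csv
import io
from html.parser import HTMLParser

class TableToRows(HTMLParser):
    """Collect the text of each td/th cell, grouped by tr row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the pieces and normalize internal whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def tables_to_csv(html: str) -> str:
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()

sample = ("<table><tr><th>Item</th><th>Price</th></tr>"
          "<tr><td>Widget, large</td><td>9.99</td></tr></table>")
print(tables_to_csv(sample))
# Item,Price
# "Widget, large",9.99
```

The csv module quotes any cell that contains a comma, so the result can be opened directly in Excel.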
Once this is done, just drag and drop the file onto TextPipe's window, set the Output Filter to save the result file somewhere like the Desktop, and then click 'Go'.
It's worth noting that you may need to remove other html tables from headers and footers near your data. This must be done manually, because there is no way the software can determine what is junk data and what is not. To remove a table, in the Special Menu, choose Find and Replace (Find Pattern). A new search and replace filter is added; ensure it has a find type of Pattern (perl). Then add text like '<table>.*</table>'. This will find a start and end table tag with anything in between.
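One thing to watch with a pattern like this is greediness. The sketch below uses Python's re module rather than TextPipe, so how TextPipe treats '.' and line breaks is not shown and may differ; the page text and the helper drop_tables_containing are hypothetical. It shows that a greedy '.*' can swallow everything from the first table to the last, while a non-greedy '.*?' combined with a marker word removes only the tables you name.

```python
import re

page = (
    "<table>header links</table>"
    "<table>DATA</table>"
    "<table>footer links</table>"
)

# Greedy: '.*' runs from the first <table> to the LAST </table>,
# so the data table in the middle is removed as well.
greedy = re.sub(r"<table>.*</table>", "", page, flags=re.DOTALL)
print(repr(greedy))  # prints: ''

def drop_tables_containing(html: str, marker: str) -> str:
    """Remove each <table>...</table> block whose text contains marker."""
    def replace(match):
        return "" if marker in match.group(0) else match.group(0)
    # Non-greedy: '.*?' stops at the first </table> it reaches.
    return re.sub(r"<table>.*?</table>", replace, html, flags=re.DOTALL)

cleaned = drop_tables_containing(drop_tables_containing(page, "header"), "footer")
print(cleaned)  # prints: <table>DATA</table>
```

Neither form copes with tables nested inside other tables, which is one more reason this step needs a manual check of the result.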
You can use WebPipe to download all or part of web sites on a scheduled basis and then feed them into TextPipe Pro automatically.
If you have trouble data mining yourself, why not pay one of our data mining consultants to do it for you?
You will need to download both programs to benefit from the combined power of TextPipe Pro and WebPipe. TextPipe Pro can also be used on its own to extract data from databases, or to manipulate text such as web pages, program source code, FrameMaker files and more.
WebPipe is a specially customized version of Offline Explorer Pro. In addition, it contains