Need help extracting text

A discussion of how to use EasyPatterns, EasyPattern Helper and the EasyPattern library.

Moderator: DataMystic Support

boris

Post by boris » Thu Jun 13, 2013 6:12 am

Hi

I need to extract the text from a document with lots of lines like this:

   vesico-umbilicalis                              1
   vesico-ureteraler                              1
   vorkehren                                          1
   Wilsonsche                                       1


The lines always begin with whitespace, then there is 1 word, with or without 1 or more hyphens, and the line always ends with a number.
All I need is the word.

I need TextPipe to extract the words and save them to a text file.
Each word is on its own line in the document and should get its own line in the text file.
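For comparison, here is a minimal Python sketch of the same task outside TextPipe. It assumes UTF-8 input and that every wanted line has exactly two whitespace-separated fields (the word, then the number). It uses plain string splitting instead of a regular expression, and the function names are hypothetical.

```python
from pathlib import Path


def words_from_lines(text: str) -> list[str]:
    """Return the word from each line shaped like '<whitespace>word<whitespace>number'."""
    words = []
    for line in text.splitlines():
        fields = line.split()  # split() with no argument also drops the leading whitespace
        # Keep only lines with exactly two fields where the second one is a number.
        if len(fields) == 2 and fields[1].isdigit():
            words.append(fields[0])
    return words


def save_words(words: list[str], path: str) -> None:
    """Write the words to a text file, one word per line."""
    Path(path).write_text("".join(w + "\n" for w in words), encoding="utf-8")


sample = """   vesico-umbilicalis                              1
   vesico-ureteraler                              1
   vorkehren                                          1
   Wilsonsche                                       1
"""
print(words_from_lines(sample))
# ['vesico-umbilicalis', 'vesico-ureteraler', 'vorkehren', 'Wilsonsche']
```

To write the result to disk, something like `save_words(words_from_lines(Path("input.txt").read_text(encoding="utf-8")), "words.txt")` would do; both file names are placeholders.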

Please give me a starting point. It cannot be that difficult, but I just don't get it.

Thanks a lot!
Boris

tumtum

Re: Need help extracting text

Post by tumtum » Fri Jun 14, 2013 3:02 am

Hi Boris,

First, you should decide exactly what output you want.

If you want only the word from each line, there are three steps:

1. Use the Convert EOL filter and set auto-detect to DOS.
2. Use a Replace filter to reduce each line to just the word.
3. Extract only the output lines that match the one-word-per-line pattern.

Example

1. Use the Convert EOL filter and set auto-detect to DOS, so that the end of each line in the file being processed is recognized reliably.
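Step 1 is a TextPipe filter. Purely to illustrate what normalizing line endings means (this is not a description of how TextPipe implements it), here is a minimal Python sketch that converts every line ending to DOS-style CRLF. The function name `to_dos_eol` is hypothetical.

```python
import re


def to_dos_eol(text: str) -> str:
    """Normalize CRLF, lone LF and lone CR line endings to CRLF."""
    # \r\n is listed first so an existing CRLF is treated as one line ending, not two.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


print(repr(to_dos_eol("a 1\nb 2\rc 3\r\n")))
# 'a 1\r\nb 2\r\nc 3\r\n'
```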

2. Use the "Replace -> find pattern (perl style)" filter to keep only the word on each line.

You can use the regular expression below to find the pattern in your data:

^ *([a-zA-Z\-]+) *([0-9]+) *$


and replace it with:

$1


With this filter, each matching line is reduced to just its word.
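A minimal Python sketch of this replace step, using the same regular expression. Python's `re` module writes the backreference as `\1` in the replacement where the Perl-style filter uses `$1`. The sketch works line by line, so it does not depend on which line-ending style the file uses. Lines that do not match are left unchanged, which is why step 3 is useful. The function name `replace_step` is hypothetical.

```python
import re

# The pattern from the reply: optional spaces, a word made of letters and hyphens,
# optional spaces, a number, optional trailing spaces.
LINE_PATTERN = re.compile(r"^ *([a-zA-Z\-]+) *([0-9]+) *$")


def replace_step(text: str) -> str:
    """Replace each matching line with its first capture group; keep other lines as-is."""
    out = []
    for line in text.splitlines():
        # r"\1" in Python plays the role of $1 in the Perl-style replacement.
        out.append(LINE_PATTERN.sub(r"\1", line))
    return "\n".join(out)


sample = "   vorkehren      1\n   Wilsonsche     1\nIndex page 3"
print(replace_step(sample))
# vorkehren
# Wilsonsche
# Index page 3
```

Note that the pattern's ` *` matches spaces only; if the file indents lines with tabs, `[ \t]*` would be needed instead.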

If you are not sure that every line follows the pattern in your example, but you still need clean output, you can extract only the lines that match the pattern you want.

3. Use the "Extract -> Extract lines matching (grep)" filter.

Use the regular expression below to extract the matching lines:

^([a-zA-Z\-]+)$


This filter makes sure the output contains only lines that match the expected pattern.
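A minimal Python sketch of this grep step, using the same pattern: only lines made entirely of letters and hyphens are kept. The function name `grep_words` is hypothetical. One caveat that applies to this pattern and to the one in step 2: `[a-zA-Z]` matches ASCII letters only, so a German word containing ä, ö, ü or ß would not match and would be dropped.

```python
import re

# The grep pattern from the reply: the whole line must be letters and hyphens.
WORD_ONLY = re.compile(r"^([a-zA-Z\-]+)$")


def grep_words(text: str) -> list[str]:
    """Keep only the lines that consist of a single word (ASCII letters and hyphens)."""
    return [line for line in text.splitlines() if WORD_ONLY.match(line)]


print(grep_words("vorkehren\nWilsonsche\nIndex page 3"))
# ['vorkehren', 'Wilsonsche']
```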

If you need any other help with TextPipe, please feel free to ask me.

Panupong Sanprasit

boris

Re: Need help extracting text

Post by boris » Sun Jun 23, 2013 5:31 pm

Panupong,

Thank you, I will try it. Sorry for not getting back sooner; I was, and still am, busy and have to postpone this experiment for a few weeks. I will get back to you as soon as I can.

Boris

