Delete non-word characters


canis
Posts: 9
Joined: Fri Jan 18, 2008 5:45 pm

Delete non-word characters

Post by canis » Wed Jan 23, 2008 9:51 pm

Is it possible to delete non-word characters from the start and the end of a line?
Thank you.

DFH
Posts: 654
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Deleting non-word characters

Post by DFH » Thu Jan 24, 2008 2:40 am

Use a Perl pattern replace list

^(\W+)     by nothing
(\W+)$     by nothing
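For readers outside TextPipe, the same two replacements can be sketched with Python's `re` module, which uses Perl-style syntax. This is a minimal sketch, assuming each line is processed on its own; the helper names `trim_non_word` and `trim_lines` are hypothetical. Note that Python's `\W` is Unicode-aware for str patterns, so Cyrillic letters count as word characters here, which may differ from TextPipe working on windows-1251 bytes.

```python
import re

def trim_non_word(line: str) -> str:
    """Remove non-word characters from the start and end of one line."""
    line = re.sub(r"^\W+", "", line)  # leading run of non-word characters
    return re.sub(r"\W+$", "", line)  # trailing run of non-word characters

def trim_lines(text: str) -> str:
    # Work line by line so that \W (which also matches "\n")
    # cannot swallow the line breaks between lines.
    return "\n".join(trim_non_word(line) for line in text.splitlines())

print(trim_non_word(":: . , TEXT ; . . , ."))  # TEXT
```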

canis
Posts: 9
Joined: Fri Jan 18, 2008 5:45 pm

Post by canis » Thu Jan 24, 2008 4:22 pm

It doesn't work :( It removes only the first or the last non-word character.

For example, I have

:: . , TEXT ; . . , .

and I need only

TEXT

DFH
Posts: 654
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Perl patterns - greedy or not greedy

Post by DFH » Thu Jan 24, 2008 9:55 pm

Click on the button next to the Perl pattern (labelled with 3 dots).
Ensure that greedy matching is ticked.
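The effect of the greedy setting can be illustrated with Python's `re`, where `+` is greedy by default and `+?` asks for the shortest possible match. This is only an analogy: it assumes TextPipe's unticked "Maximum match (greedy)" option behaves like a lazy quantifier, which fits the symptom described above but is not stated in the thread.

```python
import re

sample = ":: . , TEXT ; . . , ."

greedy = re.sub(r"^\W+", "", sample)  # longest non-word run at the start
lazy = re.sub(r"^\W+?", "", sample)   # shortest possible run: one character

print(greedy)  # TEXT ; . . , .
print(lazy)    # : . , TEXT ; . . , .
```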

Filter List
-----------
Filter options
|  [ ] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     Process binary files
|   
|--Comment...
|     Remove non Word characters from start and end of lines
|   
|--Perl pattern [^(\W+)] with []
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [ ] Extract matches
|     Maximum text buffer size 4096
|     [X] Maximum match (greedy)
|     [ ] Allow comments
|     [ ] '.' matches newline
|     [ ] UTF-8 Support
|   
|--Perl pattern [(\W+)$] with []
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [ ] Extract matches
|     Maximum text buffer size 4096
|     [X] Maximum match (greedy)
|     [ ] Allow comments
|     [ ] '.' matches newline
|     [ ] UTF-8 Support
|   
+--Output to file(s)
      [ ] Only update date on changed files
      [X] Keep original file's date and time
      [ ] Append mode
      [ ] Change extension to: .txt
    Backup mode   

Files List
----------
This works with your example.

canis
Posts: 9
Joined: Fri Jan 18, 2008 5:45 pm

Post by canis » Thu Jan 24, 2008 11:30 pm

Thanks, it works!!!

I have one more question: can I use an IF condition?

For example, I have a line with Latin and Cyrillic characters:

some text, <cyrillic1> some text <cyrillic2> <cyrillic3> some text <cyrillic4>...

If there are more than X characters between two Cyrillic words, I need to insert a carriage return after the last Cyrillic word before that gap.
For example, in this case I need:

some text, <cyrillic1>
some text <cyrillic2> <cyrillic3>
some text <cyrillic4>...


I know how to identify Cyrillic words: [а-я].
Is it possible to use an IF condition, or is there another way to make this transformation?
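One possible approach needs no explicit IF: a lookahead with a counted quantifier only matches when the gap is long enough, so the count itself acts as the condition. Below is a Python sketch of that idea, not a tested TextPipe recipe. The function name `break_long_gaps`, the `max_gap` parameter (standing in for X) and treating U+0400 to U+04FF as "Cyrillic" are assumptions; the text is assumed to be decoded from windows-1251 first (e.g. `open(path, encoding="cp1251")`), and whether TextPipe's Perl pattern engine accepts the same lookbehind has not been checked.

```python
import re

# Assumption: "Cyrillic" means any character in the block U+0400..U+04FF.
CYR = "\u0400-\u04FF"

def break_long_gaps(line: str, max_gap: int) -> str:
    """Insert a newline after a Cyrillic word when more than max_gap
    non-Cyrillic characters separate it from the next Cyrillic word."""
    pattern = re.compile(
        rf"(?<=[{CYR}])"                   # position just after a Cyrillic letter
        rf"(?=[^{CYR}]{{{max_gap + 1},}}"  # followed by more than max_gap non-Cyrillic chars
        rf"[{CYR}])"                       # and then by another Cyrillic letter
    )
    broken = pattern.sub("\n", line)
    # Drop the spaces that now begin each continuation line.
    return re.sub(r"\n[ \t]+", "\n", broken)

print(break_long_gaps("some text, привет some text мир да some text пока...", 3))
```

With `max_gap=3`, the sample line splits into the three lines shown in the question: no break is inserted between "мир" and "да" because only one character separates them.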

DFH
Posts: 654
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Cyrillic and Latin text in the same line

Post by DFH » Fri Jan 25, 2008 12:19 am

Coping with Cyrillic and Latin in the same line is much more difficult. The Cyrillic text could be encoded as Unicode, as Codepage 1251 (MS-Windows ANSI), or in various other encodings such as KOI8-R (common on Unix systems) or the Mac OS Cyrillic codepage.

Since you are processing text found in emails, you presumably can't be sure which platform it originated from.
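To see why the encoding matters for a byte-oriented pattern such as [а-я], here is a small Python check showing one Cyrillic letter encoded three ways, and one byte read under two codepages. The codec names are Python's own; nothing here is TextPipe-specific.

```python
# The Cyrillic letter "я" (U+044F) as raw bytes in three encodings.
letter = "я"

for codec in ("cp1251", "koi8_r", "utf-8"):
    print(codec, letter.encode(codec).hex())
# cp1251 ff
# koi8_r d1
# utf-8 d18f

# The same single byte means different letters in different codepages:
same_byte = bytes([0xD1])
print(same_byte.decode("cp1251"))  # Cyrillic capital Es
print(same_byte.decode("koi8_r"))  # Cyrillic small ya
```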

Do you already know how the Cyrillic text is encoded?
Do you anticipate coping with any other scripts apart from Latin and Cyrillic?

BTW, TextPipe Standard can process Unicode files, but the special filters needed to convert between Unicode text encodings are in TextPipe Pro.

canis
Posts: 9
Joined: Fri Jan 18, 2008 5:45 pm

Post by canis » Fri Jan 25, 2008 12:33 am

All text is in windows-1251.

But is it really important? I think it is possible to identify Cyrillic words using the Perl pattern [а-я].

2. This script will be the last in a sequence of 3-5 scripts. I use TextPipe with Cyrillic text often, and I have no need to convert text that mixes Latin and Cyrillic in the same lines.
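On the [а-я] point: both in Unicode and in windows-1251 the range а-я covers 32 contiguous lowercase letters, but ё lies outside it and capitals are not included. A Python sketch of a fuller Russian-letter class, assuming the file has been decoded from windows-1251 first; the names `RUSSIAN_WORD` and `cyrillic_words` are hypothetical.

```python
import re
from typing import List

# а-я and А-Я are contiguous ranges, but ё/Ё sit outside them.
RUSSIAN_WORD = re.compile(r"[а-яА-ЯёЁ]+")

def cyrillic_words(raw: bytes) -> List[str]:
    text = raw.decode("cp1251")  # windows-1251, as stated in the thread
    return RUSSIAN_WORD.findall(text)

sample = "Ёлка and ещё some text".encode("cp1251")
print(cyrillic_words(sample))        # ['Ёлка', 'ещё']
print(re.findall(r"[а-я]+", "ещё"))  # ['ещ'] because ё is not in а-я
```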


