User-named character classes?

Posted: Sat Dec 31, 2011 4:09 am
by DFH
Suppose I wish to match a pattern that consists of, for example, any UTF-8 character in the Czech alphabet (in either case).

Excluding the "Ch" digraph, a Perl pattern that does this would be as follows:

Code: Select all

This is equivalent to the shorter pattern

Code: Select all

The latter will not work when entered as a simple Perl pattern in TextPipe, so one has to use the more complicated one with all the hexadecimal codes.
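
As a minimal sketch of the equivalence described above, the following Python snippet builds one character class from literal Czech letters and another from hexadecimal escapes, then checks that both find the same matches. The letter list and the names `CZECH_LOWER`, `literal_re`, `escaped_re` and `same_matches` are assumptions made for illustration, not the patterns from this post, and Python's `re` module stands in for TextPipe's Perl engine.

```python
import re

# Czech letters that carry diacritics (illustrative list, not the poster's pattern).
CZECH_LOWER = "áčďéěíňóřšťúůýž"
CZECH_UPPER = CZECH_LOWER.upper()
CZECH_LETTERS = CZECH_LOWER + CZECH_UPPER

# Short form: the letters are typed literally inside the class.
literal_class = "[A-Za-z" + CZECH_LETTERS + "]"

# Long form: every accented letter is written as a \uXXXX hexadecimal escape.
escaped_class = "[A-Za-z" + "".join("\\u%04X" % ord(c) for c in CZECH_LETTERS) + "]"

literal_re = re.compile(literal_class)
escaped_re = re.compile(escaped_class)


def same_matches(text):
    """Return True when both classes find exactly the same characters in text."""
    return literal_re.findall(text) == escaped_re.findall(text)


print(same_matches("Příliš žluťoučký kůň úpěl ďábelské ódy"))
```

The escaped form is longer but contains only ASCII, which is why it survives tools that mishandle non-ASCII input in the pattern field.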

It would be much simpler if there were a facility to define user-named character classes, so that a much shorter pattern name could be used, perhaps by extending the POSIX notation such that

Code: Select all

would be equivalent to the above pattern.
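
One way such a facility could work, sketched in Python under stated assumptions: a small preprocessing step expands user-defined names into their characters before the pattern is compiled. The class name `czech`, the `[:name:]` spelling, and the helpers `USER_CLASSES` and `expand_user_classes` are all hypothetical; this is not an existing TextPipe or Perl feature.

```python
import re

# Hypothetical user-defined classes: name -> characters to place inside [...].
USER_CLASSES = {
    "czech": "A-Za-záčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ",
}


def expand_user_classes(pattern, classes=USER_CLASSES):
    """Replace each [:name:] token whose name is user-defined with its characters."""
    def substitute(match):
        name = match.group(1)
        # Unknown names (for example real POSIX classes) are left untouched.
        return classes.get(name, match.group(0))
    return re.sub(r"\[:(\w+):\]", substitute, pattern)


# The short, named pattern is expanded once, then compiled as usual.
expanded = expand_user_classes("[[:czech:]]+")
word_re = re.compile(expanded)

print(word_re.findall("kůň 123 ódy"))
```

Expanding names up front keeps the regex engine itself unchanged; the cost is that the names only work in tools that run the expansion step.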

I can't use captured text and store it in a global variable, as the files to be processed will not contain it.

Am I forced to resort to VBScript, or is there a simpler, more open method?


Re: User-named character classes?

Posted: Sun Jan 22, 2012 10:28 pm
by DataMystic Support
Hi David,

A proposed solution: in Perl search/replace mode, when UTF-8 support is checked, the Unicode data entered is converted to UTF-8 before being passed to the Perl module.

This allows your simpler pattern to pass through without any problems, and results in the same output as the more complex sample.

You can see this trial in action in - available in an hour or so.

- let me know if it meets your needs, and also if there are any side-effects.

Re: User-named character classes?

Posted: Wed Jan 25, 2012 6:22 pm
by DFH
Hi Simon,

I was away when you posted that - if I get time today, I'll give it a try.