User-named character classes?

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Postby DFH » Sat Dec 31, 2011 4:09 am

Suppose I wish to match a pattern that consists, for example, of any UTF-8 character in the Czech alphabet (in either case).
See http://en.wikipedia.org/wiki/Czech_alphabet

Excluding the "Ch" digraph, a Perl pattern that does this would be as follows:

Code: Select all

[A-Za-z\x{00C1}\x{00C9}\x{00CD}\x{00D3}\x{00DA}\x{00DD}\x{00E1}\x{00E9}\x{00ED}\x{00F3}\x{00FA}\x{00FD}\x{010C}\x{010D}\x{010E}\x{010F}\x{011A}\x{011B}\x{0147}\x{0148}\x{0158}\x{0159}\x{0160}\x{0161}\x{0164}\x{0165}\x{016E}\x{016F}\x{017D}\x{017E}]
This is equivalent to the shorter pattern

Code: Select all

[A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž]
The latter will not work when entered as a simple Perl pattern in TextPipe, so one has to use the more complicated one with all the hexadecimal codes.
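
As a stopgap, the hexadecimal form can be generated from the readable one rather than typed by hand. The following is a minimal sketch in Python (not a TextPipe feature); the helper name to_perl_hex_class is made up for illustration, and it assumes that every non-ASCII character should become a Perl-style \x{XXXX} escape while ASCII characters and ranges are left alone.

```python
import unicodedata


def to_perl_hex_class(chars: str) -> str:
    """Build a Perl-style character class from literal characters.

    Hypothetical helper: ASCII characters (including ranges such as A-Z)
    are copied unchanged; every other character becomes \\x{XXXX}.
    """
    # Normalise to precomposed (NFC) form so each accented letter is
    # a single code point rather than a base letter plus combining mark.
    chars = unicodedata.normalize("NFC", chars)
    parts = []
    for ch in chars:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            parts.append("\\x{%04X}" % ord(ch))
    return "[" + "".join(parts) + "]"


CZECH_EXTRA = "ÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž"
pattern = to_perl_hex_class("A-Za-z" + CZECH_EXTRA)
print(pattern)
```

The printed class can then be pasted into the Perl pattern field. The helper does not escape characters that are special inside a character class (such as ] or \), so it is only suitable for simple letter lists like this one.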

It would be much simpler if there were a facility to define user-named character classes, so that a much shorter pattern name could be used, perhaps by extending the POSIX notation such that

Code: Select all

[:czech:]
would be equivalent to the above pattern.

I can't use captured text and store it in a global variable, as the files to be processed will not contain it.

Am I forced to resort to VBScript, or is there a simpler, more open method?

David

DataMystic Support
Site Admin
Posts: 2136
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: User-named character classes?

Postby DataMystic Support » Sun Jan 22, 2012 10:28 pm

Hi David,

A proposed solution: in the Perl search/replace mode, when UTF-8 support is checked, the Unicode data entered is converted to UTF-8 before being passed to the Perl module.

This allows your simpler pattern to pass through without any problems, and results in the same output as the more complex sample.
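
To illustrate why the encoding of the pattern matters, here is a small Python sketch (not TextPipe code). It assumes, purely for illustration, that a pattern character might otherwise reach the regex engine in a legacy single-byte encoding such as cp1250; how TextPipe stored the pattern before this change is not stated here.

```python
# The file being searched is UTF-8, so "Č" is stored as two bytes.
data = "Čech".encode("utf-8")

# The same character in UTF-8 and in an assumed legacy encoding (cp1250).
pattern_utf8 = "Č".encode("utf-8")      # two bytes: C4 8C
pattern_legacy = "Č".encode("cp1250")   # one byte:  C8

# Only the UTF-8 form of the pattern can be found in UTF-8 data.
found_utf8 = pattern_utf8 in data
found_legacy = pattern_legacy in data

print(pattern_utf8.hex(), found_utf8)
print(pattern_legacy.hex(), found_legacy)
```

The point of the sketch is only that pattern and data must agree on an encoding at the byte level; converting the entered pattern to UTF-8 puts both on the same footing.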

You can see this trial in action in
http://www.datamystic.com/textpipestandard2.exe - available in an hour or so.

Let me know if it meets your needs, and also if there are any side effects.

Regards,

Simon Carter, http://DataMystic.com/forums/index.php

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: User-named character classes?

Postby DFH » Wed Jan 25, 2012 6:22 pm

Hi Simon,

I was away when you posted that - if I get time today, I'll give it a try.

David

