Find words starting with a capital letter

Post by gerd » Thu Jan 05, 2012 5:08 pm

Hi,
I am struggling with a filter that should perform the following:
Find all words in a text which start with a capital letter and consist of at least 6 characters and extract them to a csv file.
Example Text:
You can also perform Partial Trial Runs by right-clicking on filters in the Filter list.

Target of Extraction:
Partial Filter

because those two words consist of at least 6 characters.

I have been trying ([A-Z](\w{6,})[a-z]) and other variants by trial and error, without any success. Any ideas?
Thanks, gerd

Re: Find words starting with a capital letter

Post by DataMystic Support » Sat Jan 07, 2012 3:22 pm

Hi Gerd,

Try:

Find (match case turned on)
[A-Z][a-z]{5,}?
Replace with
$0\r\n
Extract option on.
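
For readers outside TextPipe, the same extraction can be sketched in Python. This is only an illustration under stated assumptions: the function names `extract_capitalised_words` and `words_to_csv` are made up for this example, the added `\b` word boundaries are not part of the Find pattern above, and Python's `re` module is greedy by default, so the trailing `?` is left out.

```python
import csv
import io
import re

# Python's `re` is greedy by default, so no trailing `?` is needed here.
# The \b anchors (an addition for this sketch) keep the match to whole words.
CAPITALISED_WORD = re.compile(r"\b[A-Z][a-z]{5,}\b")


def extract_capitalised_words(text):
    """Return ASCII words made of one capital followed by 5+ lowercase letters."""
    return CAPITALISED_WORD.findall(text)


def words_to_csv(words):
    """Write one word per CSV row and return the resulting CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\r\n")
    for word in words:
        writer.writerow([word])
    return buffer.getvalue()


sample = ("You can also perform Partial Trial Runs by right-clicking "
          "on filters in the Filter list.")
words = extract_capitalised_words(sample)
print(words)
print(words_to_csv(words), end="")
```

On the example sentence this picks out Partial and Filter; "Trial" and "Runs" are too short, and "filters" does not start with a capital.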
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Re: Find words starting with a capital letter

Post by gerd » Sat Jan 07, 2012 10:43 pm

Thanks a lot,

that's the result I was looking for. I think I had tried [A-Z][a-z]{5,} at some point, but without the trailing question mark. So far I have only used ? to mean "at most one match". I should take the time to go through pages 77 - 106 of your manual carefully. Or can you point me to where this use of ? is explained?
Anyhow, your hint is a great help to me.
gerd

Re: Find words starting with a capital letter

Post by DFH » Fri Jan 13, 2012 4:10 am

Caveat!

Any case operations or case patterns are highly dependent on the alphabet for the language of the text being processed.

For languages with diacritics, the whole topic becomes much more complex.

And for some languages, there are further pitfalls to catch out the unwary. See
http://en.wikipedia.org/wiki/Dotless_i

which is a feature of Turkish, and a few other languages.
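
To make the caveat concrete, here is a small Python sketch (not TextPipe) comparing the ASCII-only pattern with a Unicode-aware check. The function name `capitalised_words_unicode` and the sample sentence are assumptions made up for this illustration.

```python
import re

ASCII_PATTERN = re.compile(r"\b[A-Z][a-z]{5,}\b")


def capitalised_words_unicode(text, min_len=6):
    """Return words of min_len+ letters whose first letter is uppercase in Unicode."""
    # [^\W\d_]+ matches runs of letters only (word characters minus digits and _).
    words = re.findall(r"[^\W\d_]+", text)
    return [w for w in words if len(w) >= min_len and w[0].isupper()]


sample = "Die Änderung betrifft Österreich und İstanbul."

ascii_hits = ASCII_PATTERN.findall(sample)        # misses every accented capital
unicode_hits = capitalised_words_unicode(sample)  # uses Unicode case properties
print(ascii_hits)
print(unicode_hits)
```

The ASCII classes find nothing in this sentence, while the Unicode-aware version finds all three capitalised words. Case conversion has its own pitfalls: Python's `"I".lower()` always gives `"i"`, which is not the dotless ı a Turkish reader would expect.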

David

Re: Find words starting with a capital letter

Post by DataMystic Support » Sat Jan 14, 2012 1:12 pm

Hi David,

You could try using the Perl regex '\w' to match word characters in a locale-specific way.

The ? at the end of a +, * or {} repetition reverses the normal greediness.

In TextPipe, the default is to be non-greedy, so [a-z]{5,} matches only 5 chars if it can, whereas
[a-z]{5,}? matches as many characters as it can.

You can toggle the default greediness using the pattern options button [...] for each pattern.
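
For comparison, most other regex engines default the opposite way. A minimal Python sketch, assuming the standard `re` module, where a bare quantifier is greedy and the trailing ? makes it lazy:

```python
import re

word = "clicking"

# In Python's `re`, a bare quantifier is greedy: it takes as much as it can.
greedy = re.match(r"[a-z]{5,}", word).group()

# A trailing ? makes it lazy: it stops at the minimum of 5 characters.
lazy = re.match(r"[a-z]{5,}?", word).group()

# The same effect on the capital-letter pattern: the lazy form cuts the word short.
truncated = re.search(r"[A-Z][a-z]{5,}?", "Partial").group()

print(greedy, lazy, truncated)
```

So a pattern copied between TextPipe and another tool may need its ? added or removed to keep matching whole words.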
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

Re: Find words starting with a capital letter

Post by DFH » Sat Jan 14, 2012 9:44 pm

Simon,

Although TextPipe is locale sensitive, I keep the English locale settings (region and language) on my PC, even though I work on any number of foreign-language text files.

David

