Delete words with less than X characters in a HTML Tag

gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Delete words with less than X characters in a HTML Tag

Postby gerd » Fri Mar 08, 2013 12:40 am

I need to delete all strings and words within an HTML tag such as H1 that are shorter than 6 characters or do not start with a capital letter.

Example: <h1>I need to delete all strings and words within a HTML Tag like H1 which are less than 6 Characters and do not start with a Capital Letter</h1>
Requested result: <h1>Characters Capital Letter</h1>

I have been playing with [A-Z]{6,}? but have no idea how to proceed. What are the filter commands for this task of deleting strings within a certain HTML tag?
Thanks in advance
gerd

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Delete words with less than X characters in a HTML Tag

Postby DataMystic Support » Tue Mar 12, 2013 11:05 am

Try:

Code: Select all

<h1>^[A-Z][^<]{5,}</h1>
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Re: Delete words with less than X characters in a HTML Tag

Postby gerd » Tue Mar 12, 2013 7:30 pm

Simon,

<h1>^[A-Z][^<]{6,}</h1>
This does not work. Example: <h1>here some words BBQ Änderung Österreich</h1>
The requested result should be
<h1>Änderung Österreich</h1>

In other words: extract only whole words that start with a capital letter and consist of at least X characters.

I assume that the replacement would be $0 with the Extract option activated.

Thanks
gerd

Re: Delete words with less than X characters in a HTML Tag

Postby DataMystic Support » Wed Mar 13, 2013 10:17 am

Ah, that is much clearer.

Ok, first add a regex

Code: Select all

<h1>([^<]+)</h1>


and set the Replace Action to Send variable 1 to subfilter

Then add a second regex inside this, with a regex pattern of

Code: Select all

\b[A-Z]\w+?\b


Do not use the Extract option; just set the replacement to blank and ensure Match Case is ON. Note: you will also have to extend the definition of A-Z to include any special letters (such as Ä and Ö in your example).

You will also have to add a Filters\Remove\Blanks from start of line filter after the 2nd regex (inside the first regex).
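
For readers without TextPipe, the same two-stage idea (first isolate the text inside the tag, then work on the words in it) can be sketched in Python. This is only a rough illustration and not TextPipe's own mechanism: the function name `keep_long_capitalised_words` and the `min_len` parameter are made up for this example, and it keeps the wanted words directly instead of using a subfilter. It checks the first character with `str.isupper()` so that letters such as Ä and Ö count as capitals.

```python
import re

def keep_long_capitalised_words(html, tag="h1", min_len=6):
    """Inside each <tag>...</tag>, keep only words that start with an
    uppercase letter and have at least min_len characters."""
    # Stage 1: capture the text between the opening and closing tag.
    outer = re.compile(r"<{0}>([^<]+)</{0}>".format(re.escape(tag)))

    def filter_words(match):
        # Stage 2: split the captured text into words and filter them.
        words = re.findall(r"\w+", match.group(1))
        kept = [w for w in words if len(w) >= min_len and w[0].isupper()]
        return "<{0}>{1}</{0}>".format(tag, " ".join(kept))

    return outer.sub(filter_words, html)

print(keep_long_capitalised_words(
    "<h1>here some words BBQ Änderung Österreich</h1>"))
# prints: <h1>Änderung Österreich</h1>
```

Text outside the chosen tag is left untouched, because only the matched `<h1>...</h1>` spans are rewritten.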

Re: Delete words with less than X characters in a HTML Tag

Postby gerd » Wed Mar 13, 2013 8:53 pm

Thanks Simon,

That regex is very close and works. The only point still missing from the above example is how to restrict it to words that start with a capital letter and consist of at least X characters.

How do I put e.g. {5,}? or something else into your regex above?

Thanks
gerd

Re: Delete words with less than X characters in a HTML Tag

Postby DataMystic Support » Thu Mar 14, 2013 6:16 am

Sorry - I missed that, here it is.

Code: Select all

\b[A-Z]\w{5,}?\b
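
To see what the quantifier does, the pattern can be tried outside TextPipe. The snippet below is a small check using Python's `re` module, which is an assumption of this example and may differ from TextPipe's regex engine in details. `[A-Z]` matches one capital and `\w{5,}` requires at least five more word characters, so a match has at least six characters in total. The second pattern extends the character class with Ä, Ö and Ü as one possible way to include special letters.

```python
import re

text = "here some words BBQ Änderung Österreich Characters Tag"

# Plain A-Z: umlaut capitals are not in the class, so those words are missed.
ascii_only = re.findall(r"\b[A-Z]\w{5,}?\b", text)

# Extended class: add the special capitals you need (Ä, Ö, Ü chosen here).
with_umlauts = re.findall(r"\b[A-ZÄÖÜ]\w{5,}?\b", text)

print(ascii_only)    # ['Characters']
print(with_umlauts)  # ['Änderung', 'Österreich', 'Characters']
```

Because the pattern ends in `\b`, the lazy `{5,}?` still runs to the end of the word, so it matches the same text as the greedy `{5,}` here.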

