Restrict according to line length?

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Restrict according to line length?

Postby DFH » Fri Dec 17, 2010 7:50 pm

I recently encountered a need to restrict a sub-filter according to line length.

The text is UTF-8 (it's actually Arabic script), so length must be based on the number of Unicode characters in the line, rather than the number of bytes.

For one sub-filter, I'd like to restrict to lines shorter than a specified length.
For another sub-filter, I'd like to restrict to lines longer than a specified length.

Any suggestions?

David
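
For readers who want to see the requirement stated precisely, here is a minimal Python sketch rather than a TextPipe filter. It assumes the text has already been decoded from UTF-8, so `len()` counts Unicode code points, not bytes. The function name `split_by_length` and the threshold are hypothetical, and combining marks such as Arabic diacritics would each count as one code point.

```python
def split_by_length(text: str, limit: int) -> tuple[list[str], list[str]]:
    """Return (shorter, longer) lines relative to limit, measured in code points.

    Hypothetical helper for illustration; not part of TextPipe.
    """
    shorter, longer = [], []
    for line in text.splitlines():
        n = len(line)  # code points, because `line` is a decoded str
        if n < limit:
            shorter.append(line)
        elif n > limit:
            longer.append(line)
        # lines of exactly `limit` characters fall into neither group
    return shorter, longer


raw = "سلام\nمرحبا بالعالم".encode("utf-8")  # bytes as they would sit in the file
shorter, longer = split_by_length(raw.decode("utf-8"), 10)
```

Note that empty lines land in the "shorter" group here; a real filter might want to skip them.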

Re: Restrict according to line length?

Postby DFH » Fri Dec 17, 2010 9:59 pm

The following idea already occurred to me.

This filter inserts "£££" at the start of text lines shorter than 30 characters:

Comment...
|  Restrict on line length shorter than specified value
|
|--Add right margin [###]
|   
|--Restrict columns:Column 1 .. column 33
|  |
|  +--Perl pattern [^(.+)###$] with []
|     |  [X] Match case
|     |  [ ] Whole words only
|     |  [ ] Case sensitive replace
|     |  [ ] Prompt on replace
|     |  [ ] Skip prompt if identical
|     |  [ ] First only
|     |  [ ] Extract matches
|     |  Maximum text buffer size 4096
|     |  [X] Maximum match (greedy)
|     |  [ ] Allow comments
|     |  [ ] '.' matches newline
|     |  [X] UTF-8 Support
|     |
|     +--Add left margin [£££]
|         
+--Perl pattern [###$] with []
      [X] Match case
      [ ] Whole words only
      [ ] Case sensitive replace
      [ ] Prompt on replace
      [ ] Skip prompt if identical
      [ ] First only
      [ ] Extract matches
      Maximum text buffer size 4096
      [ ] Maximum match (greedy)
      [ ] Allow comments
      [ ] '.' matches newline
      [X] UTF-8 Support
   
It works on ANSI text in the trial area.

£££You can type sample text in
£££the Trial Run Input Area to
test if your filter is working
properly. Click the [Trial Run]
button below to start the test.

You can also perform Partial Trial
Runs by right-clicking on filters
£££in the Filter list.

To clear this text, just right
click it and select 'Clear Entire
£££Field' from the menu. Most
£££of TextPipe's fields have
£££similar helpful menus.

I have not yet tried it with an Arabic input file.

Re: Restrict according to line length?

Postby DFH » Fri Dec 17, 2010 10:04 pm

With UTF-8 text, the problem would be that a restrict filter based on columns would in effect be counting bytes rather than characters.

This is the real difficulty I am seeking to solve.

So still seeking suggestions.

David
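
The gap between byte columns and characters described in this post can be illustrated with a short Python sketch. It is not a statement about how TextPipe counts columns internally; it only shows why a byte-based limit misjudges Arabic text. The helper name `lengths` is hypothetical.

```python
def lengths(line: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for one line; hypothetical helper."""
    return len(line), len(line.encode("utf-8"))


print(lengths("سلام"))   # → (4, 8): Arabic letters take 2 bytes each in UTF-8
print(lengths("salam"))  # → (5, 5): ASCII letters take 1 byte each
```

A 33-column limit applied to bytes would therefore cut off a line of Arabic letters at roughly half the intended character count.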

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Restrict according to line length?

Postby DataMystic Support » Mon Dec 20, 2010 5:05 pm

Can you use the UTF-8 mode of the Perl regex filter to capture the multi-byte characters?

Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments
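
One reading of the suggestion above is a bounded quantifier such as `.{1,29}` applied to the whole line, so that the regex engine does the counting. Whether TextPipe's Perl pattern filter counts `.` per character when UTF-8 Support is ticked is an assumption that has not been verified here. The sketch below uses Python's `re` module instead, where patterns applied to `str` count code points; the names and the threshold of 30 are hypothetical.

```python
import re

# Assumed threshold of 30 characters, mirroring the filter earlier in the thread.
SHORTER_THAN_30 = re.compile(r".{1,29}")  # 1 to 29 characters, non-empty
LONGER_THAN_30 = re.compile(r".{31,}")    # 31 or more characters


def classify(line: str) -> str:
    """Classify one line (without its newline) by length in code points."""
    if SHORTER_THAN_30.fullmatch(line):
        return "short"
    if LONGER_THAN_30.fullmatch(line):
        return "long"
    return "other"  # empty lines and lines of exactly 30 characters
```

Because the quantifier counts characters rather than bytes, a 29-letter Arabic line is classified as "short" even though it occupies 58 bytes in UTF-8.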

Re: Restrict according to line length?

Postby DFH » Wed Dec 22, 2010 4:48 am

Hello Simon,

Actually I gave up on this idea, and resorted to designing a filter to detect and mark "text styles" within the RTF file used at an earlier stage in my file preprocessing.

This was more reliable than using line length within a UTF-8 file as a differentiator between two kinds of text line.

Thanks anyway.

David

