Improve the Sort filter

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Improve the Sort filter

Post by DFH » Sun Jan 10, 2016 2:53 am

The Sort filter is described as:
The sort type controls the method by which items are sorted. The available options are:
· ANSI sort (case insensitive)
· ANSI sort (case sensitive) - faster than case insensitive as no case-mapping is performed
· ASCII sort (case insensitive)
· ASCII sort (case sensitive) - faster than case insensitive as no case-mapping is performed
· Numeric sort
· Sort by length of line

It doesn't support sorting of UTF-8 text.

On the other hand, I regularly use the Count Duplicate Lines filter and find that it handles UTF-8 text quite happily, and that its output is sorted.

So why not improve the Sort filter using the code underlying the Count Duplicate Lines filter?

Please!

David

DataMystic Support
Site Admin
Posts: 2136
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Improve the Sort filter

Post by DataMystic Support » Tue Jan 19, 2016 4:39 pm

Hi David,

That is really strange, because they both use the same underlying list to do comparisons.

Do you have a set of test files and filters that you could share with me?

Thanks,

Simon
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Re: Improve the Sort filter

Post by DFH » Thu Jan 21, 2016 10:12 pm

I'll get back to you with some test files.

Remind me in a week if I forget, please.

David

Re: Improve the Sort filter

Post by DataMystic Support » Thu Jan 21, 2016 11:44 pm

Will do!

Re: Improve the Sort filter

Post by DFH » Fri Jan 22, 2016 3:21 am

The Sort filter (with ANSI case sensitive selected) and the Count Duplicate Lines filter give identical results
(once the count column has been removed).

However, neither filter is good at sorting Unicode text files.

Worse than that, the Count Duplicate Lines filter's Help page doesn't tell users about its sort limitations.
At least the Sort filter shows its nine available options in the drop-down selector.

What's really needed (IMHO) is a Sort filter that provides the following further options:
    UCA = Unicode collation algorithm
    CLDR = Common Locale Data Repository
    EOR = European Ordering Rules
In addition, it would be very useful to provide a custom sort method for some scripts, such as Unicode Hebrew with accents and points.
For another slant on this in particular, see https://github.com/ninjaaron/ivsort.py
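
To illustrate why a plain code-point sort falls short, here is a rough Python sketch. It is not UCA, CLDR or EOR, and it says nothing about how TextPipe works internally. The helper names (codepoint_sort, approx_collation_key, approx_collated_sort) are made up for this example, and the "collation" is only a crude stand-in that ignores accents and case.

```python
import unicodedata


def codepoint_sort(lines):
    # Plain sort: Python compares strings code point by code point.
    return sorted(lines)


def approx_collation_key(s):
    # Crude stand-in for a real collation key (NOT the UCA):
    # decompose to NFD, drop combining marks, then casefold.
    # The original string is kept as a tie-breaker.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


def approx_collated_sort(lines):
    return sorted(lines, key=approx_collation_key)


words = ["zebra", "école", "apple", "Éclair", "eagle"]

# Accented words land after "zebra" because É and é have
# higher code points than any ASCII letter.
print(codepoint_sort(words))

# Accents and case are ignored at the first level, so the
# accented words sort next to "eagle".
print(approx_collated_sort(words))
```

A real implementation would need proper collation tables (for example from CLDR) rather than accent stripping, which is language-blind and does nothing useful for scripts such as pointed Hebrew.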

Unicode text sorts should be applicable to both UTF-8 and UTF-16LE input data,
i.e. one shouldn't have to convert UTF-8 to UTF-16 before doing the sort.
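
As a sketch of what I mean (again in Python, with hypothetical helper names decode_text and sort_lines, and assuming input without a byte order mark is UTF-8), the filter could decode either encoding to the same internal text and then sort that:

```python
import codecs


def decode_text(data):
    # Pick the decoder from the byte order mark, if there is one.
    # Assumption: input with no BOM is treated as UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return data.decode("utf-8")


def sort_lines(data):
    # Sorting happens on decoded text, so the result does not
    # depend on which encoding the input file used.
    return sorted(decode_text(data).splitlines())


text = "שלום\nabc\nÉcole\n"
utf8_input = text.encode("utf-8")
utf16_input = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(sort_lines(utf8_input) == sort_lines(utf16_input))
```

The sort here is still a plain code-point sort; the point is only that the encoding is handled before the comparison step, so any collation method could be plugged in afterwards.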

Best regards,

David

