TextPipe and Unicode (UTF-16LE) files

niccolo
Posts: 8
Joined: Mon Jul 17, 2006 3:20 pm

TextPipe and Unicode (UTF-16LE) files

Postby niccolo » Tue Jul 18, 2006 4:24 am

I've heard a lot about TextPipe and decided to try it. I downloaded 7.63 t&b and tried to do simple things with Unicode files, and couldn't. It seems it doesn't fully understand them. I tried to remove trailing spaces: nothing. I tried the same with \t+\n (the trailing characters are mostly tabs): nothing again. No other Perl pattern works here either, although they all worked without any problem in Uedit and Emeditor. Why? Please don't suggest converting the files to ANSI, because they contain characters from three character sets: non-standard Western, Cyrillic and Greek.
The help, where it covers working with Unicode files, is worse than very bad.
Maybe a separate UnicodePipe is needed?

DataMystic Support
Site Admin
Posts: 2164
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Wed Jul 19, 2006 10:08 am

Hi there,

TextPipe has specific filters to deal with Unicode (UTF-16LE) data, such as the Unicode search/replace and Unicode pattern filters. For backward compatibility, the original ANSI/ASCII-based filters have not been modified.

So, if you'd like to use the Remove Trailing Spaces filter (which is ASCII), first convert the file to UTF-8, apply the filters, then convert it back.
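The convert, filter, convert-back round trip described above can be sketched outside TextPipe as well. This is a minimal Python illustration, not TextPipe's implementation; the function name is hypothetical. It works because in UTF-8 a space or tab is always the single byte 0x20 or 0x09, whereas in UTF-16LE each is followed by a 0x00 byte that a byte-oriented filter does not expect.

```python
import codecs

def strip_trailing_whitespace_utf16le(data: bytes) -> bytes:
    """Hypothetical helper: UTF-16LE in, UTF-16LE out, trailing spaces/tabs removed."""
    # Set a UTF-16LE BOM aside so it can be restored after processing.
    has_bom = data.startswith(codecs.BOM_UTF16_LE)
    if has_bom:
        data = data[len(codecs.BOM_UTF16_LE):]

    # Step 1: convert UTF-16LE to UTF-8.
    utf8 = data.decode("utf-16-le").encode("utf-8")

    # Step 2: apply a byte-oriented filter, keeping any CR before the LF.
    cleaned = []
    for line in utf8.split(b"\n"):
        has_cr = line.endswith(b"\r")
        body = line[:-1] if has_cr else line
        cleaned.append(body.rstrip(b" \t") + (b"\r" if has_cr else b""))
    utf8 = b"\n".join(cleaned)

    # Step 3: convert back to UTF-16LE and restore the BOM.
    out = utf8.decode("utf-8").encode("utf-16-le")
    return (codecs.BOM_UTF16_LE if has_bom else b"") + out
```

Western, Cyrillic and Greek text all survive the round trip, since both encodings cover the full Unicode range.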

The initial conversion to UTF-8 is the key here. TextPipe is used for a lot of mainframe data files, so converting EBCDIC to Unicode for internal processing is not an option until the mainframe record structure has been unravelled.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

niccolo
Posts: 8
Joined: Mon Jul 17, 2006 3:20 pm

No progress for multilanguage files in version 8

Postby niccolo » Wed Dec 12, 2007 8:51 am

I have downloaded the trial version of TextPipe 8.

Task
I need to create a sorted wordlist from a UTF-8 text file (I've taken your previous recommendation into account) containing German and Russian words (umlauts and Cyrillic).

Filters used:
Extract matches \w+
Sort ANSI

And the result?

In the trial output everything seems OK, but the resulting file has an unknown encoding.
Opening it as ANSI makes the Russian text completely unreadable. Opening it as UTF-8 shows that all the Cyrillic words are damaged and unusable.
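For comparison, here is a minimal Python sketch of the same task done on decoded text rather than on raw bytes. It is an illustration only and makes no claim about how TextPipe's filters work internally; both function names are hypothetical. The second function shows how a byte-oriented \w, which matches only ASCII letters, digits and underscore, cuts words at every non-ASCII character.

```python
import re
from typing import List

def sorted_wordlist(utf8_bytes: bytes) -> List[str]:
    """Hypothetical helper: unique words from UTF-8 input, sorted by code point."""
    # "utf-8-sig" accepts input with or without a leading UTF-8 BOM.
    text = utf8_bytes.decode("utf-8-sig")
    # On str input, \w matches letters of any script (plus digits and underscore).
    return sorted(set(re.findall(r"\w+", text)))

def bytewise_words(utf8_bytes: bytes) -> List[bytes]:
    """Contrast case: on bytes input, \\w matches ASCII word characters only."""
    return re.findall(rb"\w+", utf8_bytes)

sample = "Über Straße слово Äpfel слово".encode("utf-8")
print(sorted_wordlist(sample))
print(bytewise_words("Grüße слово".encode("utf-8")))
```

The sort here is by Unicode code point, so it is not a dictionary order for German or Russian; a locale-aware collation would need additional work.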

What the hell? Who is wrong here, me or the program?

DataMystic Support
Site Admin
Posts: 2164
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Wed Dec 12, 2007 3:10 pm

You may need to add a new UTF-8 BOM to the resulting file - use
Filters\Add\File Header
with text of

\xEF\xBB\xBF
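
The same header step can be sketched in Python for files processed outside TextPipe. This is a minimal illustration with a hypothetical function name; it prepends the three BOM bytes only when they are not already present.

```python
import codecs

def ensure_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: return data with exactly one leading UTF-8 BOM."""
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

A BOM only labels the file as UTF-8 for editors that look for it. It does not repair text whose bytes were already damaged.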

niccolo
Posts: 8
Joined: Mon Jul 17, 2006 3:20 pm

Postby niccolo » Wed Dec 12, 2007 4:32 pm

The problem is not the unknown encoding, which a BOM would solve. The problem is the corrupted Cyrillic text in the file. What can be done about that?

DataMystic Support
Site Admin
Posts: 2164
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Wed Dec 12, 2007 7:43 pm

No, the problem may be that sorting moves the line with the BOM further into the file, hence a new BOM is required.
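
The effect described here can be reproduced in a few lines of Python. This is a sketch under the assumption that lines are sorted bytewise; the function name is hypothetical. The BOM bytes are glued to the first line, and because 0xEF sorts after ASCII and Cyrillic lead bytes, that line moves to the end. Stripping the BOM before sorting and restoring it afterwards avoids this.

```python
import codecs

def sort_lines_keep_bom(data: bytes) -> bytes:
    """Hypothetical helper: bytewise line sort that keeps the BOM at the start."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    lines = sorted(data.splitlines())
    body = b"\n".join(lines) + b"\n" if lines else b""
    return (codecs.BOM_UTF8 if had_bom else b"") + body

data = codecs.BOM_UTF8 + "zebra\nслово\napple\n".encode("utf-8")
# Naive sort: the BOM travels with "zebra" and ends up on the last line.
naive = sorted(data.splitlines())
print(naive[-1].startswith(codecs.BOM_UTF8))
```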

Anyway, please email us your filter and a sample file.

DFH
Posts: 654
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Standard or Pro?

Postby DFH » Wed Dec 12, 2007 9:20 pm

Is niccolo using TextPipe Standard or TextPipe Professional?

For the task in hand, does it matter which?

niccolo
Posts: 8
Joined: Mon Jul 17, 2006 3:20 pm

Postby niccolo » Thu Dec 13, 2007 3:41 am

DFH - TextPipe Pro 8 trial.

Here are the sample, the filters used (the first with sorting, the second a simple wordlist creation) and the results. In both result files the Cyrillic words are corrupted, but everything is OK in the trial run window. It's not a BOM problem.

http://rapidshare.com/files/76100564/pack.zip.html

I've solved this problem with other software, but what the hell: whenever I decide to try TextPipe, there is always a problem with this. When will native Unicode support, with regexes etc., be implemented?

niccolo
Posts: 8
Joined: Mon Jul 17, 2006 3:20 pm

Postby niccolo » Fri Dec 14, 2007 2:36 am

I have just found that things are not so good in the trial run area either: all the German words lose their umlauts.

So TextPipe may be a good tool for English, but for multilanguage files it should be used with care.

DataMystic Support
Site Admin
Posts: 2164
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Postby DataMystic Support » Fri Dec 14, 2007 5:37 am

No - if you read the help, the trial run area handles either ANSI or Unicode UTF-16 text (check the box).

If you use any other format you will lose data.

DFH
Posts: 654
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

I have been using TextPipe to process lots of UTF-8 files

Postby DFH » Fri Dec 14, 2007 8:11 am

I have been using TextPipe Standard to process lots of UTF-8 files, all with success, including many with non-Latin characters, such as Cyrillic, Chinese, Thai, Amharic, Japanese, Hebrew.

Only the trial area has those restrictions, just as Simon already explained.

niccolo
Posts: 8
Joined: Mon Jul 17, 2006 3:20 pm

Postby niccolo » Fri Dec 14, 2007 8:50 pm

DFH - if everything works for you, maybe you can explain where I went wrong in my example?

And regarding TextPipe: in the regex line I can insert symbols that are not in the system locale encoding, but in the filter list such symbols look corrupted. When will this problem be solved?

DFH
Posts: 654
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Don't have a rapidshare account

Postby DFH » Sat Dec 15, 2007 1:46 am

The link you posted took me to a page wanting me to pay for an account. Please make it easier for other members to help you.

niccolo
Posts: 8
Joined: Mon Jul 17, 2006 3:20 pm

Postby niccolo » Sat Dec 15, 2007 2:21 am

DFH - if you don't use a proxy, there should be no problem getting the file.

Copy the link into your browser and press Enter. On the page that opens, press FREE.
Another window then appears where you are asked to enter the code from a small picture (No premium Please enter). Type it into the box below and press the Download via ....... button.

DFH
Posts: 654
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Downloaded it now, thanks!

Postby DFH » Sat Dec 15, 2007 4:59 am

I didn't see the buttons before - thanks for the help.

