Unexpectedly removed "end of line characters" on UNICODE files

cutebuddy
Posts: 2
Joined: Mon Aug 24, 2009 4:29 pm

Postby cutebuddy » Mon Aug 24, 2009 5:33 pm

My knowledge of Unicode, Big5 and end-of-line characters (and my English :D) is at newbie level.

I have a problem when trying to convert some Big5-encoded files to GBK.

In Notepad on a Chinese Windows XP system, the file content looks like this:
材彻笿
材彻鹤眔褐
材彻皑初
材き彻刊┬动
材せ彻ド
材彻硔
材彻疷隔硔

Please note that this block is multi-line.

I set up a filter "Convert from BIG5 to GBK" and ran it (a Trial run shows the same thing). The Chinese characters were converted as expected, and the content now looks like this:
卷九 第一章 陰癸魅影第二章 荒村奇遇第三章 因禍得福第四章 飛馬牧場第五章 膳房爭雄第六章 美人如玉第七章 後山奇逢第八章 狹路相逢

and it is all on ONE line. The result I expected is:
卷九 第一章 陰癸魅影
第二章 荒村奇遇
第三章 因禍得福
第四章 飛馬牧場
第五章 膳房爭雄
第六章 美人如玉
第七章 後山奇逢
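For comparison, a straight Big5 to GBK re-encoding should leave line breaks alone, because CR (0x0D) and LF (0x0A) are single bytes with the same values in both encodings, so the expected result above is what a plain conversion produces. A minimal Python sketch, assuming the whole file fits in memory; `big5_to_gbk` is a hypothetical helper, not part of TextPipe:

```python
def big5_to_gbk(data: bytes) -> bytes:
    """Re-encode Big5 bytes as GBK, leaving line endings untouched."""
    # Decoding bytes does no newline translation: 0x0D 0x0A becomes "\r\n".
    text = data.decode("big5")
    # Encoding writes "\r\n" back out as the same two bytes, 0x0D 0x0A.
    return text.encode("gbk")


if __name__ == "__main__":
    sample = "第一章\r\n第二章\r\n".encode("big5")
    converted = big5_to_gbk(sample)
    # Same number of CR LF pairs before and after conversion.
    print(sample.count(b"\r\n"), converted.count(b"\r\n"))  # prints: 2 2
```

With the default `errors="strict"`, any Big5 character that has no GBK mapping raises `UnicodeEncodeError` rather than being silently dropped.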

I tried some related filters, such as "End of line characters" and ANSI to/from Unicode, but as a newbie I have not been able to get the expected result so far.

I ran "Analyze file" on the original Big5 file, and it reported: Encoding: ASCII or ANSI (or UTF-8 without BOM), No BOM, No end of line characters found - likely a mainframe or fixed-length record format, Unknown format. Maybe this information helps? Why does it report 'No end of line characters', and why does the conversion result above show the #13#10 (CR LF) characters removed? (I checked in WinHex, and the original file does contain the 0x0D 0x0A characters.)
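In Big5, lead bytes are always 0x81 or higher and trail bytes fall in 0x40 to 0x7E or 0xA1 to 0xFE, so the bytes 0x0D and 0x0A can only ever mean CR and LF. A raw byte count, like the one WinHex shows, is therefore a reliable way to check the line endings. A small Python sketch of that check; `count_line_endings` is a hypothetical helper:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs, bare LF and bare CR in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Subtract the CR LF pairs so each CR or LF byte is counted only once.
    bare_lf = data.count(b"\n") - crlf
    bare_cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": bare_lf, "CR": bare_cr}


if __name__ == "__main__":
    sample = "第一章\r\n第二章\r\n".encode("big5")
    print(count_line_endings(sample))  # prints: {'CRLF': 2, 'LF': 0, 'CR': 0}
```

Run on the original file's bytes (for example `open(path, "rb").read()`), a non-zero `CRLF` count would contradict the "No end of line characters found" report.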

Another note: on the "File input" page, I have to set Binary files to "Process", otherwise all the files are skipped.

Thanks.

cutebuddy
Posts: 2
Joined: Mon Aug 24, 2009 4:29 pm

Re: Unexpectedly removed "end of line characters" on UNICODE files

Postby cutebuddy » Mon Aug 24, 2009 5:34 pm

The version is: TextPipe Pro 8.1.10 Evaluation Edition.

DataMystic Support
Site Admin
Posts: 2136
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Unexpectedly removed "end of line characters" on UNICODE files

Postby DataMystic Support » Mon Aug 24, 2009 9:33 pm

Hi there. Please upgrade to v8.3.7.

TextPipe never removes or inserts extra characters - you must tell it what to do. So I am not sure why this is happening.

If you convert from BIG5 to UTF-8, what do you see in Notepad? The same problem?
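To try the Big5 to UTF-8 test suggested above outside TextPipe, here is a Python sketch. It assumes you want Notepad to recognize the result as UTF-8, so it writes a UTF-8 byte order mark (the `utf-8-sig` codec adds EF BB BF); `big5_to_utf8_for_notepad` is a hypothetical name:

```python
def big5_to_utf8_for_notepad(data: bytes) -> bytes:
    """Convert Big5 bytes to UTF-8 with a BOM, keeping CR LF pairs as-is."""
    # "utf-8-sig" prepends the byte order mark EF BB BF when encoding.
    return data.decode("big5").encode("utf-8-sig")


if __name__ == "__main__":
    out = big5_to_utf8_for_notepad("第一章\r\n第二章\r\n".encode("big5"))
    # Still two CR LF pairs; only the character bytes have changed.
    print(out[:3], out.count(b"\r\n"))
```

If Notepad shows this output on separate lines but the TextPipe output does not, the line breaks are being lost in the TextPipe filter rather than in the encoding itself.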
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Unexpectedly removed "end of line characters" on UNICODE files

Postby DFH » Wed Sep 16, 2009 8:55 pm

If all you wish to do is to change a file encoding, then perhaps TextPipe is not the most appropriate tool.

I'm not being a "wet blanket", as I do regard TextPipe as one of the best pieces of software I have ever purchased.

Nevertheless, what you want to achieve might be as simple as opening the file in a suitable Windows text editor, changing the encoding, and then resaving it.

[Ed: Which ignores the whole point of using TextPipe to automate text processing]
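Since the point is automating the job across many files, here is a scripted alternative for comparison: a minimal Python sketch of batch conversion. The folder layout, the `*.txt` pattern and `convert_folder` are assumptions for illustration only:

```python
from pathlib import Path


def convert_folder(src_dir: str, dst_dir: str, pattern: str = "*.txt") -> int:
    """Convert every matching Big5 file in src_dir to GBK under dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = 0
    for path in sorted(Path(src_dir).glob(pattern)):
        if not path.is_file():
            continue  # skip directories that happen to match the pattern
        # read_bytes/write_bytes avoid the newline translation of text-mode I/O.
        text = path.read_bytes().decode("big5")
        (out / path.name).write_bytes(text.encode("gbk"))
        converted += 1
    return converted
```

Passing `errors="replace"` to `encode` would substitute `?` for characters with no GBK mapping instead of stopping with `UnicodeEncodeError`; the strict default makes such losses visible.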

Best regards,

David

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Unexpectedly removed "end of line characters" on UNICODE files

Postby DFH » Wed Sep 16, 2009 8:56 pm

As you are confessedly a Unicode newbie, this website should be of immense benefit.

Alan Wood’s Unicode Resources: http://www.alanwood.net/unicode/

David

