Japanese: word count

Get help with installation and running here.


kurochyan
Posts: 1
Joined: Sun Oct 12, 2008 1:06 am

Japanese: word count

Postby kurochyan » Sun Oct 12, 2008 1:28 am

Hi, I really need your help. I have a problem counting words in Japanese text. For a small file it is easy to count them in MS Word, but my data is far too big: around 2 GB of plain text. I am very new to TextPipe and feel like a "newbie", so I really need help. The problem is that Japanese text has no spaces between words the way English does, so a sentence looks like this: "日本語の文には区切りがありません" ("Japanese sentences have no separators"). Can anybody help me? :cry:

DataMystic Support
Site Admin
Posts: 2136
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Japanese: word count

Postby DataMystic Support » Mon Oct 13, 2008 8:34 am

Sorry, TextPipe's word count is designed for text where words are separated by spaces, stored in ANSI or UTF-8.
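To illustrate the limitation, here is a minimal Python sketch of a space-delimited count. It is not TextPipe's code, only the general idea: split on whitespace and count the pieces. The function name `whitespace_word_count` is hypothetical.

```python
def whitespace_word_count(text: str) -> int:
    """Count words the way a space-delimited counter does: split on whitespace."""
    return len(text.split())


if __name__ == "__main__":
    english = "Japanese sentences have no separators"
    japanese = "日本語の文には区切りがありません"
    print(whitespace_word_count(english))   # 5
    print(whitespace_word_count(japanese))  # 1: the whole sentence is one "word"
```

Because the Japanese sentence contains no spaces, it comes back as a single "word". (Python's `str.split()` does treat the full-width ideographic space U+3000 as whitespace, but ordinary Japanese prose doesn't put it between words.)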
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

tahoar
Posts: 10
Joined: Tue Sep 23, 2008 10:35 am

Re: Japanese: word count

Postby tahoar » Sat Nov 22, 2008 12:45 am

Counting words requires that word boundaries be identified, or "segmented." European languages use spaces. Arabic script also changes the form of the characters at word boundaries. East Asian languages such as Chinese, Japanese and Korean don't have a consistent method, and Thai doesn't mark word boundaries at all. Microsoft developed some rudimentary segmentation technology in MS Word for these languages, but it is very inaccurate. Even the best computational linguists are still struggling to do with a CPU what educated humans do naturally. Just search Google for "japanese word segmentation algorithm".

There is no simple solution.
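As a rough illustration of what "segmenting" means, here is a minimal Python sketch of greedy longest-match (maximum matching) segmentation, one of the simplest dictionary-based approaches. The tiny `TOY_DICTIONARY` and the function names are hypothetical; a real segmenter needs a large dictionary plus a statistical model to resolve ambiguities, which is exactly where greedy matching goes wrong. Reading the file line by line keeps memory use flat even for a 2 GB corpus.

```python
def max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that starts there;
    if nothing matches, emit a single character as an "unknown word".
    Whitespace is skipped rather than counted.
    """
    max_len = max([1] + [len(w) for w in dictionary])
    tokens: list[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


def count_words_in_file(path: str, dictionary: set[str]) -> int:
    """Stream a UTF-8 file line by line so a huge corpus never has to fit in memory."""
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            total += len(max_match(line, dictionary))
    return total


# Hypothetical toy dictionary, just large enough for the example sentence.
TOY_DICTIONARY = {
    "日本", "日本語", "の", "文", "に", "は", "には",
    "区切り", "が", "あり", "ありません",
}

if __name__ == "__main__":
    tokens = max_match("日本語の文には区切りがありません", TOY_DICTIONARY)
    print(" / ".join(tokens))  # 日本語 / の / 文 / には / 区切り / が / ありません
    print(len(tokens))         # 7
```

With this toy dictionary the sentence splits into 7 words. Whether には counts as one word or two (に + は) is a matter of convention, which is part of why different tools report different counts. Punctuation such as 。 would also be counted as a token unless it is filtered out.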

DataMystic Support
Site Admin
Posts: 2136
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Japanese: word count

Postby DataMystic Support » Mon Nov 24, 2008 3:10 pm

Whew! Thanks for getting us off the hook :-)
Regards,

