TextPipe: Online Help
    Convert to and from Unicode
 

(Pro only.) This filter converts text between encodings, including the various forms of Unicode and a wide range of code pages.

Note: The Trial Output Area can display only ANSI and Unicode text; it cannot interpret or display other code pages. To check an output file, you MUST use a program that supports the code page you want to view. For example, if you convert Unicode to CP936, the Trial Output Area will show rubbish, and you must use a CP936-enabled application to view the output file.
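The effect described in this note can be reproduced outside TextPipe. The Python sketch below encodes two Chinese characters as CP936 (GBK) and then misreads the bytes as an ANSI code page. It assumes that ANSI means Windows-1252, which holds only on Western European systems; it illustrates the symptom and is not TextPipe code.

```python
# Encode Chinese text as CP936 (GBK), as a Unicode-to-CP936 conversion would.
text = "中文"
cp936_bytes = text.encode("gbk")

# Misread the same bytes as ANSI (ASSUMPTION: ANSI is Windows-1252 here).
misread = cp936_bytes.decode("cp1252")

# Read the bytes with the correct code page to recover the original text.
recovered = cp936_bytes.decode("gbk")

print(cp936_bytes.hex(" "))  # d6 d0 ce c4
print(misread)               # ÖÐÎÄ
print(recovered)             # 中文
```

The bytes on disk are correct in both cases; only the viewer's choice of code page differs.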

Convert FROM

Enter the source or input encoding here.

Both the Convert FROM and Convert TO lists contain:

  1. Inbuilt Conversions - provided by TextPipe's internal libraries
  2. Windows Installed Code Pages - provided by Windows. The code pages available depend on which code pages you have installed (see Control Panel\Regional Settings).

In some cases the conversions offered overlap. In general, the Inbuilt conversions will be faster than the Windows conversions, but in some cases they may not be as up-to-date, particularly for South-East Asian languages.

Convert TO

Enter the destination or output encoding here.

If a character cannot be converted to the destination format, the default character for the encoding (or a space) is output instead. This maintains the column spacing for reports.
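As a rough illustration of why a one-for-one substitute preserves column spacing, the Python sketch below replaces each unconvertible character with a single space. The handler name `space_fill` and the sample row are hypothetical, and this shows the general technique rather than TextPipe's internal behavior.

```python
import codecs

def space_for_unmappable(error):
    """Replace each unconvertible character with exactly one space."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # One space per failed character keeps later columns in position.
    return (" " * (error.end - error.start), error.end)

# "space_fill" is a hypothetical handler name chosen for this example.
codecs.register_error("space_fill", space_for_unmappable)

row = "Caf\u00e9   12.50"  # "é" has no ASCII equivalent
ascii_row = row.encode("ascii", errors="space_fill")

print(ascii_row)                   # b'Caf    12.50'
print(len(row) == len(ascii_row))  # True
```

Because the output has the same number of characters as the input, the amount column still starts at the same offset.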

If the target Unicode format specifies a byte order (e.g. UTF-16LE, UTF-16BE), then no Byte Order Mark will be output. If you require a Byte Order Mark, follow this filter with an Add Header filter.
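Python's codecs follow the same convention, so a short sketch can show what the bytes look like with and without a Byte Order Mark. Prepending the mark by hand stands in for the Add Header filter here; the variable names are illustrative only.

```python
import codecs

text = "Hi"

# An explicit byte order (UTF-16LE) produces no Byte Order Mark.
no_bom = text.encode("utf-16-le")

# Prepending the BOM by hand plays the role of an Add Header step.
with_bom = codecs.BOM_UTF16_LE + no_bom

print(no_bom.hex(" "))    # 48 00 69 00
print(with_bom.hex(" "))  # ff fe 48 00 69 00
```

For UTF-16BE the corresponding mark is the byte pair FE FF (`codecs.BOM_UTF16_BE`).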

Swap

This swaps the Convert FROM and Convert TO settings.

Error Character

This character is output when there is no suitable character in the destination encoding. It defaults to a space.

Unicode Encodings

To find out the encoding of a file, drag it to TextPipe's File Grid, right-click the file and choose Analyze File.

For most purposes, Unicode has three main encodings:

  • UTF-8 - most common for web pages
  • UTF-16LE (little endian) - most common on Windows
  • UTF-16BE (big endian) - less common on Windows.

Note: Unicode itself can be stored in a variety of formats, including UTF-* and UCS-*, with the most common being UTF-16 and UTF-8.
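To make the difference between the three main encodings concrete, this Python sketch encodes the same two characters in each and prints the resulting bytes. The sample text is arbitrary.

```python
text = "A\u20ac"  # "A" followed by the euro sign (U+20AC)

encodings = ["utf-8", "utf-16-le", "utf-16-be"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")

# utf-8      41 e2 82 ac
# utf-16-le  41 00 ac 20
# utf-16-be  00 41 20 ac
```

UTF-8 uses one byte for "A" and three for the euro sign, while both UTF-16 forms use two bytes per character here and differ only in byte order.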

Trial Run Area

For Unicode Input from the Trial Run area, set Convert FROM to 'UTF-16LE', and on the Trial Run Area tab, check Treat Trial Run Input as Unicode (UTF16-LE) instead of ANSI.

For Unicode Output to the Trial Run area, set Convert TO to 'UTF-16LE', and on the Trial Run Area tab, check Treat Trial Run Output as Unicode (UTF16-LE) instead of ANSI.

If your input or output is UTF-8, ensure the checkbox is unchecked.

Conversions offered

See also: Code page conversions offered

ARMSCII-8
ASCII
ANSI
BIG5
BIG5HKSCS
C99
CES-BIG5
CES-GBK
CNS-11643-1
CNS-11643-15
CNS-11643-2
CNS-11643-3
CNS-11643-4
CNS-11643-5
CNS-11643-6
CNS-11643-7
CNS-11643-INV
CP1046
CP1124
CP1125
CP1129
CP1133
CP1161
CP1162
CP1163
CP1250
CP1251
CP1252
CP1253
CP1254
CP1255
CP1256
CP1257
CP1258
CP437
CP737
CP775
CP850
CP852
CP853
CP855
CP856
CP857
CP858
CP860
CP861
CP862
CP863
CP864
CP865
CP866
CP869
CP874
CP922
CP932
CP949
CP950
DEC-Hanyu
DEC-Kanji
EUC-CN
EUC-jisx0213
EUC-JP
EUC-KR
EUC-TW
GB18030
GB2312
GBK
GEORGIAN-ACADEMY
GEORGIAN-PS
HP-ROMAN8
HZ
ISO-2022-CN
ISO-2022-CN-EXT
ISO-2022-JP
ISO-2022-JP1
ISO-2022-JP2
ISO-2022-JP3
ISO-2022-KR
ISO646-CN
ISO-646-JP
ISO-8859-1
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-8859-16
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
ISO-IR-165
JAVA
JIS_X0201
JIS_X0208
JIS_X0212
JOHAB
Johab-Hangul
KOI8-R
KOI8-RU
KOI8-T
KOI8-U
KSC-5601
MacArabic
MacCentralEurope
MacCroatian
MacCyrillic
MacGreek
MacHebrew
MacIceland
MacRoman
MacRomania
MacThai
MacTurkish
MacUkraine
MULELAO-1
NEXTSTEP
RISCOS1
SHIFT-JISX0213
SJIS
TCVN
TDS565
TIS-620
UCS-2
UCS-2BE
UCS-2-Internal
UCS-2LE
UCS-2-Swapped
UCS-4
UCS-4BE
UCS-4-Internal
UCS-4LE
UCS-4-Swapped
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UTF-7
UTF-8
VISCII

Note:

  • UCS-4 is UTF-32 with support for code points beyond U+10FFFF (which are supposed to be unassignable forever).
  • UCS-2 is UTF-16 with surrogate support removed (so code points beyond U+FFFF cannot be represented).
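The surrogate point can be checked with a small Python sketch. It encodes one code point above U+FFFF and shows that UTF-16 needs two 16-bit units for it. The helper `fits_in_ucs2` is a hypothetical name, and its test is a simplification that looks only at code point values.

```python
ch = "\U0001F600"  # a code point above U+FFFF

utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

# UTF-16 represents it as a surrogate pair: two 16-bit units, four bytes.
print(utf16.hex(" "))  # d8 3d de 00
# UTF-32 stores the code point value directly in four bytes.
print(utf32.hex(" "))  # 00 01 f6 00

def fits_in_ucs2(s):
    """Return True if every code point is at or below U+FFFF."""
    return all(ord(c) <= 0xFFFF for c in s)

print(fits_in_ucs2("abc"))  # True
print(fits_in_ucs2(ch))     # False
```

Text for which such a check fails cannot be represented in UCS-2, so UTF-16 is the safer target.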

Equivalent encodings

The following list gives equivalent encodings (different names for the same encoding), one per line.

US-ASCII,ASCII,ISO646-US,ISO_646.IRV:1991,ISO-IR-6,ANSI_X3.4-1968,ANSI_X3.4-1986,CP367,IBM367,US,csASCII
UCS-2,ISO-10646-UCS-2,csUnicode
UCS-2BE,UNICODEBIG,UNICODE-1-1,csUnicode11
UCS-2LE,UNICODELITTLE
UCS-4,ISO-10646-UCS-4,csUCS4
UTF-7,UNICODE-1-1-UTF-7,csUnicode11UTF7
ISO-8859-1,ISO_8859-1,ISO_8859-1:1987,ISO-IR-100,CP819,IBM819,LATIN1,L1,csISOLatin1,ISO8859-1
ISO-8859-2,ISO_8859-2,ISO_8859-2:1987,ISO-IR-101,LATIN2,L2,csISOLatin2,ISO8859-2
ISO-8859-3,ISO_8859-3,ISO_8859-3:1988,ISO-IR-109,LATIN3,L3,csISOLatin3,ISO8859-3
ISO-8859-4,ISO_8859-4,ISO_8859-4:1988,ISO-IR-110,LATIN4,L4,csISOLatin4,ISO8859-4
ISO-8859-5,ISO_8859-5,ISO_8859-5:1988,ISO-IR-144,CYRILLIC,csISOLatinCyrillic,ISO8859-5
ISO-8859-6,ISO_8859-6,ISO_8859-6:1987,ISO-IR-127,ECMA-114,ASMO-708,ARABIC,csISOLatinArabic,ISO8859-6
ISO-8859-7,ISO_8859-7,ISO_8859-7:1987,ISO-IR-126,ECMA-118,ELOT_928,GREEK8,GREEK,csISOLatinGreek,ISO8859-7
ISO-8859-8,ISO_8859-8,ISO_8859-8:1988,ISO-IR-138,HEBREW,csISOLatinHebrew,ISO8859-8
ISO-8859-9,ISO_8859-9,ISO_8859-9:1989,ISO-IR-148,LATIN5,L5,csISOLatin5,ISO8859-9
ISO-8859-10,ISO_8859-10,ISO_8859-10:1992,ISO-IR-157,LATIN6,L6,csISOLatin6,ISO8859-10
ISO-8859-13,ISO_8859-13,ISO-IR-179,LATIN7,L7,ISO8859-13
ISO-8859-14,ISO_8859-14,ISO_8859-14:1998,ISO-IR-199,LATIN8,L8,ISO-CELTIC,ISO8859-14
ISO-8859-15,ISO_8859-15,ISO_8859-15:1998,ISO-IR-203,ISO8859-15
ISO-8859-16,ISO_8859-16,ISO_8859-16:2000,ISO-IR-226,ISO8859-16
KOI8-R,csKOI8R
CP1250,WINDOWS-1250,MS-EE
CP1251,WINDOWS-1251,MS-CYRL
CP1252,WINDOWS-1252,MS-ANSI
CP1253,WINDOWS-1253,MS-GREEK
CP1254,WINDOWS-1254,MS-TURK
CP1255,WINDOWS-1255,MS-HEBR
CP1256,WINDOWS-1256,MS-ARAB
CP1257,WINDOWS-1257,WINBALTRIM
CP1258,WINDOWS-1258
CP850,IBM850,850,csPC850Multilingual
CP862,IBM862,862,csPC862LatinHebrew
CP866,IBM866,866,csIBM866
MACINTOSH,MAC,csMacintosh
HP-ROMAN8,ROMAN8,R8,csHPRoman8
CP1133,IBM-CP1133
TIS-620,TIS620,TIS620-0,TIS620.2529-1,TIS620.2533-0,TIS620.2533-1,ISO-IR-166
CP874,WINDOWS-874
VISCII,VISCII1.1-1,csVISCII
TCVN,TCVN-5712,TCVN5712-1,TCVN5712-1:1993
JIS_C6220-1969-RO,ISO646-JP,ISO-IR-14,JP,csISO14JISC6220ro
JIS_X0201,JISX0201-1976,X0201,csHalfWidthKatakana
JIS_X0208,JIS_X0208-1983,JIS_X0208-1990,JIS0208,X0208,ISO-IR-87,JIS_C6226-1983,csISO87JISX0208
JIS_X0212,JIS_X0212.1990-0,JIS_X0212-1990,X0212,ISO-IR-159,csISO159JISX02121990
GB_1988-80,ISO646-CN,ISO-IR-57,CN,csISO57GB1988
GB_2312-80,ISO-IR-58,csISO58GB231280,CHINESE
ISO-IR-165,CN-GB-ISOIR165
KSC_5601,KS_C_5601-1987,KS_C_5601-1989,ISO-IR-149,csKSC56011987,KOREAN
EUC-JP,EUCJP,Extended_UNIX_Code_Packed_Format_for_Japanese,csEUCPkdFmtJapanese
SHIFT_JIS,SHIFT-JIS,SJIS,MS_KANJI,csShiftJIS
ISO-2022-JP,csISO2022JP
ISO-2022-JP-2,csISO2022JP2
EUC-CN,EUCCN,GB2312,CN-GB,csGB2312
GBK,CP936
ISO-2022-CN,csISO2022CN
HZ,HZ-GB-2312
EUC-TW,EUCTW,csEUCTW
BIG5,BIG-5,BIG-FIVE,BIGFIVE,CN-BIG5,csBig5
BIG5-HKSCS,BIG5HKSCS
EUC-KR,EUCKR,csEUCKR
CP949,UHC
JOHAB,CP1361
ISO-2022-KR,csISO2022KR
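Other tools keep similar alias tables. As a sketch, the Python snippet below uses the standard `codecs.lookup` function to test whether two names resolve to the same codec. Python's alias table is an assumption here: it is not TextPipe's or libiconv's table, and some names in the list above are unknown to Python or map differently. The helper `same_encoding` is a hypothetical name.

```python
import codecs

def same_encoding(name_a, name_b):
    """Return True if both names resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name

# Names taken from the list above that Python's alias table also knows.
print(same_encoding("ISO-8859-1", "LATIN1"))    # True
print(same_encoding("CP819", "L1"))             # True
print(same_encoding("CP1252", "WINDOWS-1252"))  # True
print(same_encoding("GBK", "CP936"))            # True
print(same_encoding("UTF-8", "UTF-16LE"))       # False
```

`codecs.lookup` raises `LookupError` for a name it does not recognize, so a caller would need to handle that case for unfamiliar aliases.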

See also

Swap UTF-16 word order
Swap UTF-32 word order
Make Big Endian
Make Little Endian
Unicode conversion
Code pages

Some of these conversions are provided by the libiconv library. The remainder are provided by Windows.

 

 

 Copyright © 1999-2006 DataMystic. All rights reserved.