(Pro only). This filter
converts text between encodings, including the various forms of
Unicode and a wide range of code pages.
Note: The Trial Output Area can display only ANSI and Unicode text. It
cannot interpret or display other code pages. To check an output file, you
must use a program that supports the code page you want to view. For example,
if you convert Unicode to CP936, the Trial Output will show rubbish, and you
must use a CP936-enabled application to view the output file.
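The effect described in this note can be reproduced outside TextPipe. The following Python sketch is an illustration only: it uses Python's built-in codecs rather than TextPipe's conversions, and the sample text is an assumption chosen for the demonstration. It encodes two Chinese characters as CP936 and then misreads the bytes as a Western ANSI code page (CP1252), which is roughly what a viewer that is not CP936-enabled does.

```python
# Illustration only: Python's codecs stand in for a Unicode-to-CP936 conversion.
text = "中文"

# Convert the Unicode text to CP936 (GBK) bytes.
cp936_bytes = text.encode("cp936")

# A viewer that assumes a Western ANSI code page misreads those bytes.
misread = cp936_bytes.decode("cp1252")

# A CP936-aware reader recovers the original text.
recovered = cp936_bytes.decode("cp936")

print(misread)    # garbled Latin letters instead of Chinese characters
print(recovered)  # the original text
```

The bytes themselves are correct in both cases; only the interpretation differs, which is why the output file has to be checked with a CP936-enabled application.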
Convert FROM
Enter the source or input encoding here.
Both the Convert FROM and Convert TO lists contain:
- Inbuilt Conversions - provided by TextPipe's internal libraries
- Windows Installed Code Pages - provided by Windows. The code pages
available depend on which code pages you have installed (see Control
Panel\Regional Settings).
In some cases the conversions offered overlap. In general, the Inbuilt
conversions will be faster than the Windows conversions, but in some cases they
may not be as up-to-date, particularly for South-East Asian languages.
Convert TO
Enter the destination or output encoding here.
If a character cannot be converted to the destination format, the default
character for the encoding (or a space) is output instead. This maintains the
column spacing for reports.
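As a rough illustration of this substitution behaviour, the sketch below uses Python's codec error-handler mechanism. The handler name `space_for_unmappable` and the helper `encode_keep_columns` are hypothetical names invented for this example, and the code is not TextPipe's implementation. It emits one space for every character the target encoding cannot represent, so a single-byte target keeps the same number of columns.

```python
import codecs


def _space_handler(exc):
    """Replace each unencodable character with a single space."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # One space per failed character keeps the column count unchanged.
    return (" " * (exc.end - exc.start), exc.end)


# Hypothetical handler name, registered for this example only.
codecs.register_error("space_for_unmappable", _space_handler)


def encode_keep_columns(text, encoding):
    """Encode text, substituting a space where no mapping exists."""
    return text.encode(encoding, errors="space_for_unmappable")


row = "Café €5"
encoded = encode_keep_columns(row, "ascii")
print(encoded)
print(len(row), len(encoded))  # both lengths are equal
```

Column alignment is only guaranteed this way when the target is a single-byte encoding; multi-byte targets can still change the byte width of a row.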
If the target Unicode format specifies a byte order (e.g. UTF-16LE,
UTF-16BE), then no Byte Order Mark will be output. If you require a Byte Order
Mark, follow this filter with an
Add Header filter.
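Python's codecs follow the same convention, so they can be used to show what the output looks like with and without a Byte Order Mark. This is a sketch using Python's standard library, not TextPipe itself; prepending the BOM by hand plays the role of the Add Header step.

```python
import codecs

text = "Hi"

# An explicit byte order yields the raw code units only, with no BOM.
without_bom = text.encode("utf-16-le")

# Prepending the BOM manually mimics adding a header after conversion.
with_bom = codecs.BOM_UTF16_LE + without_bom

print(without_bom.hex(" "))  # 48 00 69 00
print(with_bom.hex(" "))     # ff fe 48 00 69 00
```

For UTF-16BE the corresponding constant is `codecs.BOM_UTF16_BE` (bytes FE FF).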
Swap
This swaps the Convert FROM and Convert TO settings.
Error Character
This character is output when there is no suitable character in the
destination encoding. It defaults to a space.
Unicode Encodings
To find out the encoding of a file, drag it to TextPipe's
File Grid, right-click the
file and choose Analyze File.
For most purposes, Unicode has three main encodings:
- UTF-8 - most common for web pages
- UTF-16LE (little endian) - most common on Windows
- UTF-16BE (big endian) - less common on Windows.
Note: Unicode itself can be stored in a variety of formats, including UTF-* and UCS-*,
with the most common being UTF-16 and UTF-8.
Trial Run Area
For Unicode Input from the Trial Run area, set Convert FROM to
'UTF-16LE', and on the Trial Run Area tab, check Treat Trial Run Input as
Unicode (UTF16-LE) instead of ANSI.
For Unicode Output to the Trial Run area, set Convert TO to
'UTF-16LE', and on the Trial Run Area tab, check Treat Trial Run Output as
Unicode (UTF16-LE) instead of ANSI.
If your input or output is UTF-8, ensure the corresponding checkbox is unchecked.
Conversions offered
See also: Code page conversions offered
ARMSCII-8
ASCII
ANSI
BIG5
BIG5HKSCS
C99
CES-BIG5
CES-GBK
CNS-11643-1
CNS-11643-15
CNS-11643-2
CNS-11643-3
CNS-11643-4
CNS-11643-5
CNS-11643-6
CNS-11643-7
CNS-11643-INV
CP1046
CP1124
CP1125
CP1129
CP1133
CP1161
CP1162
CP1163
CP1250
CP1251
CP1252
CP1253
CP1254
CP1255
CP1256
CP1257
CP1258
CP437
CP737
CP775
CP850
CP852
CP853
CP855
CP856
CP857
CP858
CP860
CP861
CP862
CP863
CP864
CP865
CP866
CP869
CP874
CP922
CP932
CP949
CP950
DEC-Hanyu
DEC-Kanji
EUC-CN
EUC-jisx0213
EUC-JP
EUC-KR
EUC-TW
GB18030
GB2312
GBK
GEORGIAN-ACADEMY
GEORGIAN-PS
HP-ROMAN8
HZ
ISO-2022-CN
ISO-2022-CN-EXT
ISO-2022-JP
ISO-2022-JP1
ISO-2022-JP2
ISO-2022-JP3
ISO-2022-KR
ISO646-CN
ISO-646-JP
ISO-8859-1
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-8859-16
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
ISO-IR-165
JAVA
JIS_X0201
JIS_X0208
JIS_X0212
JOHAB
Johab-Hangul
KOI8-R
KOI8-RU
KOI8-T
KOI8-U
KSC-5601
MacArabic
MacCentralEurope
MacCroatian
MacCyrillic
MacGreek
MacHebrew
MacIceland
MacRoman
MacRomania
MacThai
MacTurkish
MacUkraine
MULELAO-1
NEXTSTEP
RISCOS1
SHIFT-JISX0213
SJIS
TCVN
TDS565
TIS-620
UCS-2
UCS-2BE
UCS-2-Internal
UCS-2LE
UCS-2-Swapped
UCS-4
UCS-4BE
UCS-4-Internal
UCS-4LE
UCS-4-Swapped
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UTF-7
UTF-8
VISCII
Note:
- UCS-4 is UTF-32 with support for code points beyond U+10FFFF (which are
supposed to be unassignable forever).
- UCS-2 is UTF-16 with surrogate support removed (so code points beyond
U+FFFF cannot be represented).
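The difference between UCS-2 and UTF-16 is easiest to see on a code point above U+FFFF. The sketch below uses Python's standard codecs; the helper names `fits_in_ucs2` and `utf16_code_units` are hypothetical and exist only for this illustration. UTF-16 represents such a code point as a surrogate pair of two 16-bit units, which UCS-2 has no way to express.

```python
def fits_in_ucs2(text):
    """Return True if every code point is at most U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_code_units(text):
    """Return the 16-bit code units of text encoded as UTF-16BE."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_char = "A"              # U+0041, inside the 16-bit range
astral_char = "\U0001F600"  # U+1F600, beyond U+FFFF

print(fits_in_ucs2(bmp_char), [hex(u) for u in utf16_code_units(bmp_char)])
print(fits_in_ucs2(astral_char), [hex(u) for u in utf16_code_units(astral_char)])

# UTF-32 stores the same code point as a single 32-bit unit.
print(astral_char.encode("utf-32-be").hex(" "))
```

A check along the lines of `fits_in_ucs2` is one way to tell in advance whether text can survive a conversion to a UCS-2 target without loss.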
Equivalent encodings
The following list gives equivalent encodings (different names for the same
encoding), one per line.
US-ASCII,ASCII,ISO646-US,ISO_646.IRV:1991,ISO-IR-6,ANSI_X3.4-1968,ANSI_X3.4-1986,CP367,IBM367,US,csASCII
UCS-2,ISO-10646-UCS-2,csUnicode
UCS-2BE,UNICODEBIG,UNICODE-1-1,csUnicode11
UCS-2LE,UNICODELITTLE
UCS-4,ISO-10646-UCS-4,csUCS4
UTF-7,UNICODE-1-1-UTF-7,csUnicode11UTF7
ISO-8859-1,ISO_8859-1,ISO_8859-1:1987,ISO-IR-100,CP819,IBM819,LATIN1,L1,csISOLatin1,ISO8859-1
ISO-8859-2,ISO_8859-2,ISO_8859-2:1987,ISO-IR-101,LATIN2,L2,csISOLatin2,ISO8859-2
ISO-8859-3,ISO_8859-3,ISO_8859-3:1988,ISO-IR-109,LATIN3,L3,csISOLatin3,ISO8859-3
ISO-8859-4,ISO_8859-4,ISO_8859-4:1988,ISO-IR-110,LATIN4,L4,csISOLatin4,ISO8859-4
ISO-8859-5,ISO_8859-5,ISO_8859-5:1988,ISO-IR-144,CYRILLIC,csISOLatinCyrillic,ISO8859-5
ISO-8859-6,ISO_8859-6,ISO_8859-6:1987,ISO-IR-127,ECMA-114,ASMO-708,ARABIC,csISOLatinArabic,ISO8859-6
ISO-8859-7,ISO_8859-7,ISO_8859-7:1987,ISO-IR-126,ECMA-118,ELOT_928,GREEK8,GREEK,csISOLatinGreek,ISO8859-7
ISO-8859-8,ISO_8859-8,ISO_8859-8:1988,ISO-IR-138,HEBREW,csISOLatinHebrew,ISO8859-8
ISO-8859-9,ISO_8859-9,ISO_8859-9:1989,ISO-IR-148,LATIN5,L5,csISOLatin5,ISO8859-9
ISO-8859-10,ISO_8859-10,ISO_8859-10:1992,ISO-IR-157,LATIN6,L6,csISOLatin6,ISO8859-10
ISO-8859-13,ISO_8859-13,ISO-IR-179,LATIN7,L7,ISO8859-13
ISO-8859-14,ISO_8859-14,ISO_8859-14:1998,ISO-IR-199,LATIN8,L8,ISO-CELTIC,ISO8859-14
ISO-8859-15,ISO_8859-15,ISO_8859-15:1998,ISO-IR-203,ISO8859-15
ISO-8859-16,ISO_8859-16,ISO_8859-16:2000,ISO-IR-226,ISO8859-16
KOI8-R,csKOI8R
CP1250,WINDOWS-1250,MS-EE
CP1251,WINDOWS-1251,MS-CYRL
CP1252,WINDOWS-1252,MS-ANSI
CP1253,WINDOWS-1253,MS-GREEK
CP1254,WINDOWS-1254,MS-TURK
CP1255,WINDOWS-1255,MS-HEBR
CP1256,WINDOWS-1256,MS-ARAB
CP1257,WINDOWS-1257,WINBALTRIM
CP1258,WINDOWS-1258
CP850,IBM850,850,csPC850Multilingual
CP862,IBM862,862,csPC862LatinHebrew
CP866,IBM866,866,csIBM866
MACINTOSH,MAC,csMacintosh
HP-ROMAN8,ROMAN8,R8,csHPRoman8
CP1133,IBM-CP1133
TIS-620,TIS620,TIS620-0,TIS620.2529-1,TIS620.2533-0,TIS620.2533-1,ISO-IR-166
CP874,WINDOWS-874
VISCII,VISCII1.1-1,csVISCII
TCVN,TCVN-5712,TCVN5712-1,TCVN5712-1:1993
JIS_C6220-1969-RO,ISO646-JP,ISO-IR-14,JP,csISO14JISC6220ro
JIS_X0201,JISX0201-1976,X0201,csHalfWidthKatakana
JIS_X0208,JIS_X0208-1983,JIS_X0208-1990,JIS0208,X0208,ISO-IR-87,JIS_C6226-1983,csISO87JISX0208
JIS_X0212,JIS_X0212.1990-0,JIS_X0212-1990,X0212,ISO-IR-159,csISO159JISX02121990
GB_1988-80,ISO646-CN,ISO-IR-57,CN,csISO57GB1988
GB_2312-80,ISO-IR-58,csISO58GB231280,CHINESE
ISO-IR-165,CN-GB-ISOIR165
KSC_5601,KS_C_5601-1987,KS_C_5601-1989,ISO-IR-149,csKSC56011987,KOREAN
EUC-JP,EUCJP,Extended_UNIX_Code_Packed_Format_for_Japanese,csEUCPkdFmtJapanese
SHIFT_JIS,SHIFT-JIS,SJIS,MS_KANJI,csShiftJIS
ISO-2022-JP,csISO2022JP
ISO-2022-JP-2,csISO2022JP2
EUC-CN,EUCCN,GB2312,CN-GB,csGB2312
GBK,CP936
ISO-2022-CN,csISO2022CN
HZ,HZ-GB-2312
EUC-TW,EUCTW,csEUCTW
BIG5,BIG-5,BIG-FIVE,BIGFIVE,CN-BIG5,csBig5
BIG5-HKSCS,BIG5HKSCS
EUC-KR,EUCKR,csEUCKR
CP949,UHC
JOHAB,CP1361
ISO-2022-KR,csISO2022KR
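Many of these aliases are also recognised by other tools. As an illustration, the Python sketch below resolves two names through `codecs.lookup` and compares the canonical codec names. The helper `same_encoding` is a hypothetical name for this example, and Python's alias table is not identical to the list above, so a name that TextPipe accepts may be unknown to Python and raise `LookupError`.

```python
import codecs


def same_encoding(name_a, name_b):
    """Return True if both names resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


print(same_encoding("LATIN1", "ISO-8859-1"))    # True
print(same_encoding("CP819", "IBM819"))         # True
print(same_encoding("WINDOWS-1252", "CP1252"))  # True
print(same_encoding("UTF-8", "CP1252"))         # False
```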
See also
Swap UTF-16 word order
Swap UTF-32 word order
Make Big Endian
Make Little Endian
Unicode conversion
Code pages
Some of these conversions are provided by the libiconv library.
The remainder are provided by Windows.