Setting the character encoding in the Cygwin shell to read multiple character sets

Posted: 2016-09-26 19:03:57

Tags: linux bash encoding character-encoding cygwin

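I have a file that contains text in various languages, including both ASCII and native (non-ASCII) characters. I want my shell to be able to handle any language: English, Arabic, Chinese, Japanese, and so on.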

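I have read the Cygwin page on "Internationalization" and its list of supported character sets (below). I have also read the FAQ entry about weird characters: https://cygwin.com/faq-nochunks.html#faq.using.weirdchars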

Charset               Codepage
-------------------   -------------------------------------------
ASCII                 20127 (US_ASCII)

CP437                   437 (OEM United States)
CP720                   720 (DOS Arabic)
CP737                   737 (OEM Greek)
CP775                   775 (OEM Baltic)
CP850                   850 (OEM Latin 1, Western European)
CP852                   852 (OEM Latin 2, Central European)
CP855                   855 (OEM Cyrillic)
CP857                   857 (OEM Turkish)
CP858                   858 (OEM Latin 1 + Euro Symbol)
CP862                   862 (OEM Hebrew)
CP866                   866 (OEM Russian)
CP874                   874 (ANSI/OEM Thai)
CP932                   932 (Shift_JIS, not exactly identical to SJIS)
CP1125                 1125 (OEM Ukraine)
CP1250                 1250 (ANSI Central European)
CP1251                 1251 (ANSI Cyrillic)
CP1252                 1252 (ANSI Latin 1, Western European)
CP1253                 1253 (ANSI Greek)
CP1254                 1254 (ANSI Turkish)
CP1255                 1255 (ANSI Hebrew)
CP1256                 1256 (ANSI Arabic)
CP1257                 1257 (ANSI Baltic)
CP1258                 1258 (ANSI/OEM Vietnamese)

ISO-8859-1            28591 (ISO-8859-1)
ISO-8859-2            28592 (ISO-8859-2)
ISO-8859-3            28593 (ISO-8859-3)
ISO-8859-4            28594 (ISO-8859-4)
ISO-8859-5            28595 (ISO-8859-5)
ISO-8859-6            28596 (ISO-8859-6)
ISO-8859-7            28597 (ISO-8859-7)
ISO-8859-8            28598 (ISO-8859-8)
ISO-8859-9            28599 (ISO-8859-9)
ISO-8859-10             -   (not available)
ISO-8859-11             -   (not available)
ISO-8859-13           28603 (ISO-8859-13)
ISO-8859-14             -   (not available)
ISO-8859-15           28605 (ISO-8859-15)
ISO-8859-16             -   (not available)

Big5                    950 (ANSI/OEM Traditional Chinese)
EUCCN or euc-CN         936 (ANSI/OEM Simplified Chinese)
EUCJP or euc-JP       20932 (EUC Japanese)
EUCKR or euc-KR         949 (EUC Korean)
GB2312                  936 (ANSI/OEM Simplified Chinese)
GBK                     936 (ANSI/OEM Simplified Chinese)
GEORGIAN-PS             -   (not available)
KOI8-R                20866 (KOI8-R Russian Cyrillic)
KOI8-U                21866 (KOI8-U Ukrainian Cyrillic)
PT154                   -   (not available)
SJIS                    -   (not available, almost, but not exactly CP932)
TIS620 or TIS-620       874 (ANSI/OEM Thai)

UTF-8 or utf8         65001 (UTF-8)

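My main question: is it possible to make the Cygwin shell read multiple languages at once? I haven't really found anything on this. Any pointers are highly appreciated.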

1 answer:

Answer 0 (score: 0)

What exactly do you mean?

I have recently used Cygwin on modern Windows (Windows 10), and I can get Cygwin to display a wide variety of characters. For example:

$ env LANG=ru_RU.UTF-8 cp --help
$ env LANG=zh_CN.UTF-8 cp --help
$ env LANG=ja_JP.UTF-8 cp --help

display the `cp --help` text in Russian, Chinese and Japanese respectively.
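All three locales above use the same UTF-8 character set; what mainly changes between them is the language of the program's messages (plus things like date formats and sorting). Assuming the mixed-language file is itself stored as UTF-8, a single UTF-8 locale should therefore be enough to handle English, Arabic, Chinese and Japanese text at the same time. A minimal sketch that uses iconv to check whether a file really is valid UTF-8 before relying on that; the function name `is_valid_utf8`, the file name and the choice of `en_US.UTF-8` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if FILE is valid UTF-8.
# iconv decodes the bytes as UTF-8 and exits non-zero on the first
# invalid byte sequence; the re-encoded output itself is discarded.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example usage (mixed.txt is an assumed file name):
#   if is_valid_utf8 mixed.txt; then
#       # Assumption: en_US.UTF-8 is just one choice; any *.UTF-8 locale
#       # decodes the same characters, only the message language differs.
#       LANG=en_US.UTF-8 less mixed.txt
#   else
#       echo "mixed.txt is not valid UTF-8; convert it with iconv first" >&2
#   fi
```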

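If that doesn't work, you can also do this from Windows PowerShell, although an extra iconv step is needed to post-process the output (converting it from UTF-8 to UTF-16):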

PS C:\cygwin\bin> .\env.exe LANG=zh_CN.UTF-8 .\cp.exe --help | .\iconv.exe -f UTF-8 -t UTF-16
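The iconv step in that pipeline only changes the encoding of the output stream. The same tool can be used in the other direction inside Cygwin: if part of the input is stored in one of the legacy code pages from the table in the question (for example CP1256 for Arabic), converting it to UTF-8 first lets everything be read under a single UTF-8 locale. A minimal sketch, assuming the source encoding is known and is a name iconv accepts; the function name `to_utf8` and the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: to_utf8 SRC CHARSET DEST
# Re-encodes SRC from CHARSET (an encoding name iconv understands,
# e.g. CP1256 or ISO-8859-1) into UTF-8 and writes the result to DEST.
# Output goes to a temporary file first, so DEST is only created or
# replaced if iconv succeeds; on failure the temp file is removed.
to_utf8() {
    local src=$1 charset=$2 dest=$3
    local tmp="$dest.tmp.$$"
    if iconv -f "$charset" -t UTF-8 "$src" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example usage (file names are assumptions):
#   to_utf8 arabic-cp1256.txt CP1256 arabic-utf8.txt
```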