LESSCHARSET=utf-8 less doesn't seem to work

Date: 2010-01-22 04:05:49

Tags: utf-8 unix

I'm trying to view a UTF-8 text file/stream in less, but even when I invoke it like this:

cat file | LESSCHARSET=utf-8 less

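the non-ASCII UTF-8 characters are not displayed correctly. Instead, their hex values are shown highlighted in angle brackets, e.g. <F4>.

Reading the same text in vim with UTF-8 encoding works without any problem, so I suspect there is something wrong with the way I invoke less.

My locale output is as follows: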

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

My version of less is the one installed by Xcode on OS X Leopard:

$ less --version | sed 's/^/    /'
less 394
Copyright (C) 1984-2005 Mark Nudelman

less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution, 
see the file named README in the less distribution.
Homepage: http://www.greenwoodsoftware.com/less

locale -a | grep US | sed 's/^/ /' outputs the following:

en_AU.US-ASCII
en_CA.US-ASCII
en_GB.US-ASCII
en_NZ.US-ASCII
en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.US-ASCII
en_US.UTF-8

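6 answers:

Answer 0 (score: 8)

  1. What does the locale command output? Is it a UTF-8 locale?

  2. Are you sure your terminal is set up to display UTF-8? Does echo -e '\xe2\x82\xac' produce a euro (€) symbol?

  3. Is the locale you set actually installed on the system? Does it appear in the list output by locale -a?

  4. What version of less are you using? (Run less --version to find out.) Really, really old versions don't even support LESSCHARSET. That is unlikely to be the case here, since I have a Debian "sarge" system with less version 382, and it doesn't even need LESSCHARSET if the locale is set correctly.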
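
The four checks above can be run in one go. The following is a minimal sketch, not a definitive diagnostic: print_euro is a hypothetical helper name, and the grep pattern assumes that installed UTF-8 locales are spelled with "utf8" or "utf-8" in some letter case.

```shell
#!/bin/sh
# Sketch of the four checks above; nothing here changes any settings.

# 1. Show the active locale settings.
locale

# 2. Emit the UTF-8 bytes e2 82 ac. A terminal that decodes UTF-8 renders
#    them as a euro sign. Octal escapes are used because POSIX printf does
#    not define \x escapes.
print_euro() {
    printf '\342\202\254\n'
}
print_euro

# 3. List installed locales whose name mentions UTF-8 (spelling varies by system).
locale -a 2>/dev/null | grep -i 'utf-*8' || echo "no UTF-8 locale listed"

# 4. Report the installed less version, if less is on PATH.
if command -v less >/dev/null 2>&1; then
    less --version | head -n 1
else
    echo "less not found on PATH"
fi
```

If step 2 shows a euro sign and step 3 lists the locale reported in step 1, the terminal and locale are probably fine and the file itself is the next thing to inspect.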

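Answer 1 (score: 5)

My guess is that your file is not UTF-8 but ISO 8859. (Should the <F4> character be an 'ô'?)

Start an xterm with LANG=en_US.ISO-8859-1 xterm. Then verify the locale (the output of locale should look something like en_US.ISO-8859-1). Then use less to view the file. Is it displayed correctly?

Note that just using LESSCHARSET=iso8859 without starting a new terminal is not enough. LESSCHARSET tells less that the terminal can interpret iso8859, but your terminal probably displays UTF-8, since the euro sign is displayed correctly. And because a lone \xf4 byte is not a valid UTF-8 sequence, the terminal will probably display something like "�".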
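
One way to test this guess without opening a new terminal is to ask iconv whether the file decodes as UTF-8. This is a rough sketch: is_utf8 is a hypothetical helper, the sample files are made up for illustration, and it assumes an iconv that exits non-zero on an invalid input sequence.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# Assumption: iconv exits non-zero when it meets an invalid input sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

tmpdir=$(mktemp -d)

# "hôtel" with ô as the single byte f4 (ISO-8859-1) ...
printf 'h\364tel\n' > "$tmpdir/latin1.txt"
# ... and with ô as the two bytes c3 b4 (UTF-8).
printf 'h\303\264tel\n' > "$tmpdir/utf8.txt"

for f in "$tmpdir/latin1.txt" "$tmpdir/utf8.txt"; do
    if is_utf8 "$f"; then
        echo "${f##*/}: valid UTF-8"
    else
        echo "${f##*/}: not valid UTF-8"
    fi
done

# Dump the raw bytes to see which form the file actually contains.
od -An -tx1 "$tmpdir/latin1.txt"

rm -rf "$tmpdir"
```

A file that fails this check and contains a bare f4 byte fits the ISO 8859 explanation above.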

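Answer 2 (score: 2)

Try the command file file.txt. If the output is, for example, "ISO-8859 English text", convert the file's encoding from ISO-8859 to UTF-8 with the command iconv -f ISO-8859-1 -t UTF-8 -o testfile.txt file.txt. If less testfile.txt displays the text correctly, finish with mv testfile.txt file.txt.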
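
The conversion step can be sketched as a small function. to_utf8 is a hypothetical helper and the sample file is made up; the sketch redirects standard output instead of using -o, because POSIX does not specify an -o option for iconv.

```shell
#!/bin/sh
# to_utf8 SRC DST: convert SRC from ISO-8859-1 to UTF-8, writing the result to DST.
# Standard output is redirected instead of using -o, which POSIX iconv does not specify.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}

tmpdir=$(mktemp -d)

# Sample input: "hôtel" encoded in ISO-8859-1 (ô is the single byte f4).
printf 'h\364tel\n' > "$tmpdir/file.txt"

to_utf8 "$tmpdir/file.txt" "$tmpdir/testfile.txt"

# After conversion, ô should appear as the two bytes c3 b4.
od -An -tx1 "$tmpdir/testfile.txt"

rm -rf "$tmpdir"
```

Writing to a separate file first, as the answer does, keeps the original intact until the result has been checked in less.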

Answer 3 (score: 1)

On Mac OS, the charset name must be uppercase:

bash-4.4$ less --version
less 458 (POSIX regular expressions)
Copyright (C) 1984-2012 Mark Nudelman

bash-4.4$ LESSCHARSET=cp1251 less
invalid charset name

bash-4.4$ LESSCHARSET=CP1251 less
Missing filename ("less --help" for help)

Here I found the list of character sets:

{ "ascii",          NULL,       "8bcccbcc18b95.b" },
{ "utf-8",          &utf_mode,  "8bcccbcc18b95.b126.bb" },
{ "iso8859",        NULL,       "8bcccbcc18b95.33b." },
{ "latin3",         NULL,       "8bcccbcc18b95.33b5.b8.b15.b4.b12.b18.b12.b." },
{ "arabic",         NULL,       "8bcccbcc18b95.33b.3b.7b2.13b.3b.b26.5b19.b" },
{ "greek",          NULL,       "8bcccbcc18b95.33b4.2b4.b3.b35.b44.b" },
{ "greek2005",      NULL,       "8bcccbcc18b95.33b14.b35.b44.b" },
{ "hebrew",         NULL,       "8bcccbcc18b95.33b.b29.32b28.2b2.b" },
{ "koi8-r",         NULL,       "8bcccbcc18b95.b." },
{ "KOI8-T",         NULL,       "8bcccbcc18b95.b8.b6.b8.b.b.5b7.3b4.b4.b3.b.b.3b." },
{ "georgianps",     NULL,       "8bcccbcc18b95.3b11.4b12.2b." },
{ "tcvn",           NULL,       "b..b...bcccbccbbb7.8b95.b48.5b." },
{ "TIS-620",        NULL,       "8bcccbcc18b95.b.4b.11b7.8b." },
{ "next",           NULL,       "8bcccbcc18b95.bb125.bb" },
{ "dos",            NULL,       "8bcccbcc12bc5b95.b." },
{ "windows-1251",   NULL,       "8bcccbcc12bc5b95.b24.b." },
{ "windows-1252",   NULL,       "8bcccbcc12bc5b95.b.b11.b.2b12.b." },
{ "windows-1255",   NULL,       "8bcccbcc12bc5b95.b.b8.b.5b9.b.4b." },
{ "ebcdic",         NULL,       "5bc6bcc7bcc41b.9b7.9b5.b..8b6.10b6.b9.7b9.8b8.17b3.3b9.7b9.8b8.6b10.b.b.b." },
{ "IBM-1047",       NULL,       "4cbcbc3b9cbccbccbb4c6bcc5b3cbbc4bc4bccbc191.b" },
{ NULL, NULL, NULL }

And their aliases:

{ "UTF-8",          "utf-8" },
{ "ANSI_X3.4-1968", "ascii" },
{ "US-ASCII",       "ascii" },
{ "latin1",         "iso8859" },
{ "ISO-8859-1",     "iso8859" },
{ "latin9",         "iso8859" },
{ "ISO-8859-15",    "iso8859" },
{ "latin2",         "iso8859" },
{ "ISO-8859-2",     "iso8859" },
{ "ISO-8859-3",     "latin3" },
{ "latin4",         "iso8859" },
{ "ISO-8859-4",     "iso8859" },
{ "cyrillic",       "iso8859" },
{ "ISO-8859-5",     "iso8859" },
{ "ISO-8859-6",     "arabic" },
{ "ISO-8859-7",     "greek" },
{ "IBM9005",        "greek2005" },
{ "ISO-8859-8",     "hebrew" },
{ "latin5",         "iso8859" },
{ "ISO-8859-9",     "iso8859" },
{ "latin6",         "iso8859" },
{ "ISO-8859-10",    "iso8859" },
{ "latin7",         "iso8859" },
{ "ISO-8859-13",    "iso8859" },
{ "latin8",         "iso8859" },
{ "ISO-8859-14",    "iso8859" },
{ "latin10",        "iso8859" },
{ "ISO-8859-16",    "iso8859" },
{ "IBM437",         "dos" },
{ "EBCDIC-US",      "ebcdic" },
{ "IBM1047",        "IBM-1047" },
{ "KOI8-R",         "koi8-r" },
{ "KOI8-U",         "koi8-r" },
{ "GEORGIAN-PS",    "georgianps" },
{ "TCVN5712-1",     "tcvn" },
{ "NEXTSTEP",       "next" },
{ "windows",        "windows-1252" }, /* backward compatibility */
{ "CP1251",         "windows-1251" },
{ "CP1252",         "windows-1252" },
{ "CP1255",         "windows-1255" },
{ NULL, NULL }
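
To illustrate how these two tables could explain the session above, here is a rough sketch of a name lookup. It assumes that names are compared as exact, case-sensitive strings, which the session suggests but the tables alone do not prove. resolve_charset is a hypothetical helper, not part of less, and only a handful of entries are reproduced.

```shell
#!/bin/sh
# resolve_charset NAME: print the canonical charset for NAME, or fail.
# Assumption: matching is exact and case-sensitive. Only a subset of the
# entries from the tables above is included.
resolve_charset() {
    case "$1" in
        # Canonical names from the first table map to themselves.
        ascii|utf-8|iso8859|koi8-r|windows-1251|windows-1252)
            printf '%s\n' "$1" ;;
        # Aliases from the second table.
        UTF-8)             echo utf-8 ;;
        US-ASCII)          echo ascii ;;
        latin1|ISO-8859-1) echo iso8859 ;;
        KOI8-R|KOI8-U)     echo koi8-r ;;
        CP1251)            echo windows-1251 ;;
        CP1252|windows)    echo windows-1252 ;;
        *)
            echo "invalid charset name" >&2
            return 1 ;;
    esac
}

for name in cp1251 CP1251 utf-8 UTF-8; do
    if canonical=$(resolve_charset "$name" 2>/dev/null); then
        echo "$name -> $canonical"
    else
        echo "$name -> invalid charset name"
    fi
done
```

Under that assumption, the requirement is less about uppercase as such than about matching a listed spelling: both utf-8 and UTF-8 appear in the tables, while cp1251 does not.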

Answer 4 (score: 0)

This worked for me:

m_func

Answer 5 (score: 0)

less -r displayed the file correctly for me.

 -r or --raw-control-chars
              Causes "raw" control characters to be displayed.  The default is
              to  display  control  characters  using  the caret notation; for
              example, a control-A (octal 001) is displayed as "^A".  Warning:
              when the -r option is used, less cannot keep track of the actual
              appearance of the screen (since this depends on how  the  screen
              responds to each type of control character).  Thus, various dis-
              play problems may result, such as long lines being split in  the
              wrong place.