I am trying to view a UTF-8 text file/stream in less, and even when I invoke it like this:
cat file | LESSCHARSET=utf-8 less
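the non-ASCII UTF-8 characters are not displayed correctly. Instead, their hexadecimal values are shown highlighted in angle brackets, e.g. <F4>.
Reading the same text in vim with UTF-8 encoding works without any problem, so I suspect something is wrong with the way I invoke less.
My locale output is as follows: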
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
My version of less is the one installed by Xcode on OS X Leopard:
$ less --version | sed 's/^/ /'
less 394
Copyright (C) 1984-2005 Mark Nudelman
less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Homepage: http://www.greenwoodsoftware.com/less
locale -a | grep US | sed 's/^/ /' outputs the following:
en_AU.US-ASCII
en_CA.US-ASCII
en_GB.US-ASCII
en_NZ.US-ASCII
en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.US-ASCII
en_US.UTF-8
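Answer 0 (score: 8)
What does the locale command output? Is it a UTF-8 locale?
Are you sure your terminal is set up to display UTF-8? Does echo -e '\xe2\x82\xac' produce a euro sign (€)?
Is the locale you set actually installed on the system? Does it appear in the list printed by locale -a?
Which version of less are you using? (Run less --version to find out.)
Really, really old versions do not support LESSCHARSET at all. That is unlikely to be the case here, though: I have a Debian "sarge" system with less version 382, and it does not even need LESSCHARSET as long as the locale is set correctly.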
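The checks above could be gathered into one small script. This is only a sketch: is_utf8_locale, effective_ctype and report are hypothetical helper names, and the script assumes a POSIX shell. It uses octal escapes in printf instead of echo -e because not every shell's echo understands \x escapes.

```shell
#!/bin/sh
# Diagnostic sketch for the checklist above.
# is_utf8_locale, effective_ctype and report are hypothetical helper names.

# Succeeds if a locale name such as "en_US.UTF-8" names a UTF-8 codeset.
is_utf8_locale() {
    case "$1" in
        *.UTF-8*|*.utf-8*|*.UTF8*|*.utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the locale that governs character handling:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

report() {
    ctype=$(effective_ctype)
    if is_utf8_locale "$ctype"; then
        echo "locale: $ctype (UTF-8)"
    else
        echo "locale: $ctype (not UTF-8)"
    fi

    # Octal escapes for the bytes E2 82 AC; a UTF-8 terminal shows a euro sign.
    printf 'terminal test: \342\202\254\n'

    # A few installed locales that mention UTF-8, if locale is available.
    if command -v locale >/dev/null 2>&1; then
        locale -a 2>/dev/null | grep -i 'utf' | head -n 5 || true
    fi

    # First line of the version banner, if less is installed.
    if command -v less >/dev/null 2>&1; then
        less --version | head -n 1 || true
    else
        echo "less: not found"
    fi
    return 0
}

report
```

If the first line reports a non-UTF-8 locale, or the terminal test line does not show a euro sign, the problem lies in the environment and not in less.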
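Answer 1 (score: 5)
My guess is that your file is not UTF-8 but ISO 8859. (Is the <F4> character supposed to be 'ô'?)
Start an xterm with LANG=en_US.ISO-8859-1 xterm. Then verify the locale (the output of locale should look like en_US.ISO-8859-1). Then use less to view the file. Is it displayed correctly?
Note that merely setting LESSCHARSET=iso8859 without starting a new terminal is not enough. LESSCHARSET tells less that the terminal can interpret iso8859, but your terminal is probably displaying UTF-8, since the euro sign was displayed correctly. And because a lone \xf4 byte is not a valid UTF-8 sequence, the terminal will probably display something like "�".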
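One way to test the guess that the file is not UTF-8 is to ask iconv to decode it as UTF-8 and look at the exit status. The sketch below assumes an iconv that exits non-zero on invalid input (as the common implementations do); is_valid_utf8 and the demo file names are hypothetical.

```shell
#!/bin/sh
# is_valid_utf8 is a hypothetical helper name.
# Succeeds if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

demo_dir=$(mktemp -d)

# The word "côté" encoded as UTF-8 (ô = C3 B4, é = C3 A9).
printf 'c\303\264t\303\251\n' > "$demo_dir/utf8.txt"
# The same word encoded as ISO-8859-1 (ô = F4, é = E9).
printf 'c\364t\351\n' > "$demo_dir/latin1.txt"

for f in "$demo_dir/utf8.txt" "$demo_dir/latin1.txt"; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done

rm -rf "$demo_dir"
```

A file that fails this check while containing bytes such as F4 is consistent with an ISO 8859 encoding, although the check alone cannot say which single-byte encoding was used.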
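Answer 2 (score: 2)
Try the command file file.txt. If the output is, for example, "ISO-8859 English text", convert the file's encoding from ISO-8859 to UTF-8 with the command iconv -f ISO-8859-1 -t UTF-8 -o testfile.txt file.txt. If less testfile.txt displays the text correctly, finish with mv testfile.txt file.txt.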
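The conversion step can also be written with plain redirection, which avoids relying on the -o option (an assumption here is that not every iconv implementation accepts -o). In this sketch convert_to_utf8 is a hypothetical helper, and the source encoding ISO-8859-1 is the same assumption the answer makes.

```shell
#!/bin/sh
# convert_to_utf8 is a hypothetical helper name.
# Usage: convert_to_utf8 SOURCE_ENCODING INPUT_FILE OUTPUT_FILE
convert_to_utf8() {
    iconv -f "$1" -t UTF-8 < "$2" > "$3"
}

demo_dir=$(mktemp -d)

# One ISO-8859-1 encoded "ô" (byte F4) followed by a newline.
printf '\364\n' > "$demo_dir/file.txt"

if convert_to_utf8 ISO-8859-1 "$demo_dir/file.txt" "$demo_dir/testfile.txt"; then
    # Hex dump of the converted file, to inspect the resulting bytes.
    od -An -tx1 "$demo_dir/testfile.txt"
else
    echo "conversion failed" >&2
fi

rm -rf "$demo_dir"
```

Writing to a separate output file keeps the original untouched until the result has been checked in less, as the answer suggests.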
Answer 3 (score: 1)
On Mac OS, charset names are case-sensitive. For example, the CP1251 alias must be written in uppercase:
bash-4.4$ less --version
less 458 (POSIX regular expressions)
Copyright (C) 1984-2012 Mark Nudelman
bash-4.4$ LESSCHARSET=cp1251 less
invalid charset name
bash-4.4$ LESSCHARSET=CP1251 less
Missing filename ("less --help" for help)
Here I found the list of charsets:
{ "ascii", NULL, "8bcccbcc18b95.b" },
{ "utf-8", &utf_mode, "8bcccbcc18b95.b126.bb" },
{ "iso8859", NULL, "8bcccbcc18b95.33b." },
{ "latin3", NULL, "8bcccbcc18b95.33b5.b8.b15.b4.b12.b18.b12.b." },
{ "arabic", NULL, "8bcccbcc18b95.33b.3b.7b2.13b.3b.b26.5b19.b" },
{ "greek", NULL, "8bcccbcc18b95.33b4.2b4.b3.b35.b44.b" },
{ "greek2005", NULL, "8bcccbcc18b95.33b14.b35.b44.b" },
{ "hebrew", NULL, "8bcccbcc18b95.33b.b29.32b28.2b2.b" },
{ "koi8-r", NULL, "8bcccbcc18b95.b." },
{ "KOI8-T", NULL, "8bcccbcc18b95.b8.b6.b8.b.b.5b7.3b4.b4.b3.b.b.3b." },
{ "georgianps", NULL, "8bcccbcc18b95.3b11.4b12.2b." },
{ "tcvn", NULL, "b..b...bcccbccbbb7.8b95.b48.5b." },
{ "TIS-620", NULL, "8bcccbcc18b95.b.4b.11b7.8b." },
{ "next", NULL, "8bcccbcc18b95.bb125.bb" },
{ "dos", NULL, "8bcccbcc12bc5b95.b." },
{ "windows-1251", NULL, "8bcccbcc12bc5b95.b24.b." },
{ "windows-1252", NULL, "8bcccbcc12bc5b95.b.b11.b.2b12.b." },
{ "windows-1255", NULL, "8bcccbcc12bc5b95.b.b8.b.5b9.b.4b." },
{ "ebcdic", NULL, "5bc6bcc7bcc41b.9b7.9b5.b..8b6.10b6.b9.7b9.8b8.17b3.3b9.7b9.8b8.6b10.b.b.b." },
{ "IBM-1047", NULL, "4cbcbc3b9cbccbccbb4c6bcc5b3cbbc4bc4bccbc191.b" },
{ NULL, NULL, NULL }
and their aliases:
{ "UTF-8", "utf-8" },
{ "ANSI_X3.4-1968", "ascii" },
{ "US-ASCII", "ascii" },
{ "latin1", "iso8859" },
{ "ISO-8859-1", "iso8859" },
{ "latin9", "iso8859" },
{ "ISO-8859-15", "iso8859" },
{ "latin2", "iso8859" },
{ "ISO-8859-2", "iso8859" },
{ "ISO-8859-3", "latin3" },
{ "latin4", "iso8859" },
{ "ISO-8859-4", "iso8859" },
{ "cyrillic", "iso8859" },
{ "ISO-8859-5", "iso8859" },
{ "ISO-8859-6", "arabic" },
{ "ISO-8859-7", "greek" },
{ "IBM9005", "greek2005" },
{ "ISO-8859-8", "hebrew" },
{ "latin5", "iso8859" },
{ "ISO-8859-9", "iso8859" },
{ "latin6", "iso8859" },
{ "ISO-8859-10", "iso8859" },
{ "latin7", "iso8859" },
{ "ISO-8859-13", "iso8859" },
{ "latin8", "iso8859" },
{ "ISO-8859-14", "iso8859" },
{ "latin10", "iso8859" },
{ "ISO-8859-16", "iso8859" },
{ "IBM437", "dos" },
{ "EBCDIC-US", "ebcdic" },
{ "IBM1047", "IBM-1047" },
{ "KOI8-R", "koi8-r" },
{ "KOI8-U", "koi8-r" },
{ "GEORGIAN-PS", "georgianps" },
{ "TCVN5712-1", "tcvn" },
{ "NEXTSTEP", "next" },
{ "windows", "windows-1252" }, /* backward compatibility */
{ "CP1251", "windows-1251" },
{ "CP1252", "windows-1252" },
{ "CP1255", "windows-1255" },
{ NULL, NULL }
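To illustrate why cp1251 is rejected while CP1251 is accepted, here is a small sketch of an exact, case-sensitive lookup over a few rows copied from the two tables above. It is not code from less: resolve_charset is a hypothetical function, and the assumption that names are compared exactly is inferred from the transcript.

```shell
#!/bin/sh
# resolve_charset is a hypothetical illustration, not code from less.
# It copies a handful of rows from the tables above and matches names
# exactly, so "CP1251" resolves but "cp1251" does not.
resolve_charset() {
    case "$1" in
        # Aliases (left column of the alias table) map to canonical names.
        UTF-8)              echo utf-8 ;;
        latin1|ISO-8859-1)  echo iso8859 ;;
        KOI8-R|KOI8-U)      echo koi8-r ;;
        CP1251)             echo windows-1251 ;;
        CP1252|windows)     echo windows-1252 ;;
        # Canonical names from the charset table resolve to themselves.
        ascii|utf-8|iso8859|koi8-r|windows-1251|windows-1252)
                            echo "$1" ;;
        *)
            echo "invalid charset name" >&2
            return 1 ;;
    esac
}

for name in CP1251 cp1251 utf-8 UTF-8; do
    if resolved=$(resolve_charset "$name" 2>/dev/null); then
        echo "$name -> $resolved"
    else
        echo "$name -> invalid charset name"
    fi
done
```

Under this reading, the safe choice is to copy the name exactly as it appears in one of the two tables.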
Answer 4 (score: 0)
This worked for me:
m_func
Answer 5 (score: 0)
less -r displayed the file correctly for me.
-r or --raw-control-chars
    Causes "raw" control characters to be displayed. The default is
    to display control characters using the caret notation; for
    example, a control-A (octal 001) is displayed as "^A". Warning:
    when the -r option is used, less cannot keep track of the actual
    appearance of the screen (since this depends on how the screen
    responds to each type of control character). Thus, various
    display problems may result, such as long lines being split in
    the wrong place.