I ran into some odd behaviour from GNU coreutils sort on a machine running Red Hat Enterprise Linux 7.1:
$ echo -e 'c11 2\nc1 1\nc11 4\nc1 3' | /usr/bin/sort -k1,1
c1 1
c11 2
c11 4
c1 3
I thought this was a locale problem, and indeed it is:
$ echo -e 'c11 2\nc1 1\nc11 4\nc1 3' | LC_ALL="C" /usr/bin/sort -k1,1
c1 1
c1 3
c11 2
c11 4
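However, I don't understand why the system locale (en_GB.UTF-8, see below) sorts in a way that makes no sense to me. It looks as if it ignores the whitespace and sorts on the whole line rather than on the first field.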
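To make the comparison easier to repeat, here is a minimal sketch that runs the same key sort under two locales. The helper name `sort_k1` is made up for illustration, and the second call assumes the en_GB.UTF-8 locale is installed. Only the C-locale ordering is something I would rely on; the other result depends on the locale data and on the sort build.

```shell
#!/bin/sh
# Hypothetical helper: sort stdin on the first field only,
# with the locale given as the first argument.
sort_k1() {
    LC_ALL="$1" sort -k1,1
}

input='c11 2
c1 1
c11 4
c1 3'

# Byte-wise collation: both c1 lines come before both c11 lines.
printf '%s\n' "$input" | sort_k1 C

# Same data under the system locale from this question.
# No particular output is claimed here.
printf '%s\n' "$input" | sort_k1 en_GB.UTF-8
```

Comparing the two outputs side by side shows whether the locale alone changes the order.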
I then downloaded and compiled the same version of sort (coreutils 8.22) and, surprisingly, got the behaviour I wanted without changing the locale:
$ echo -e 'c11 2\nc1 1\nc11 4\nc1 3' | /path/to/coreutils-8.22/src/sort -k1,1
c1 1
c1 3
c11 2
c11 4
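One check that might narrow down the difference between the two binaries is the `--debug` option of GNU sort, which annotates the part of each line used as the sort key. The sketch below is only a suggestion: the helper name `show_keys` is hypothetical, the second path is the placeholder build location from above, and I am not claiming what either binary will print.

```shell
#!/bin/sh
# Hypothetical helper: run the given sort binary with --debug so that
# the key span chosen for -k1,1 is marked under each line.
show_keys() {
    "$1" --debug -k1,1
}

input='c11 2
c1 1
c11 4
c1 3'

# Run each binary that exists on this machine, in the current locale.
for bin in /usr/bin/sort /path/to/coreutils-8.22/src/sort; do
    if [ -x "$bin" ]; then
        echo "== $bin"
        printf '%s\n' "$input" | show_keys "$bin"
    fi
done
```

If the marked spans differ between the two runs, the binaries disagree about where the first field ends, not just about collation order.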
My question is twofold:
Additional information:
$ sort --version
sort (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=