Unexpected behaviour of coreutils 'sort'

Time: 2016-03-14 13:23:25

Tags: sorting locale gnu-coreutils

I have run into strange behaviour from GNU coreutils sort on a machine running Red Hat Enterprise Linux 7.1:

    $ echo -e 'c11 2\nc1 1\nc11 4\nc1 3' | /usr/bin/sort -k1,1
    c1 1
    c11 2
    c11 4
    c1 3

I thought this was a locale issue, and indeed it is:

    $ echo -e 'c11 2\nc1 1\nc11 4\nc1 3' | LC_ALL="C" /usr/bin/sort -k1,1
    c1 1
    c1 3
    c11 2
    c11 4

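However, I don't understand why the system locale (en_GB.UTF-8, see below) sorts in a way that makes no sense to me. It looks as if it ignores the white space and sorts on the whole line.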
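
One way to check the "ignores white space and sorts on the whole line" guess is to emulate that ordering explicitly and compare it with the output above. The sketch below is only an illustration, not an explanation of what the RHEL binary actually does; the helper name `sort_ignoring_blanks` is made up here and is not part of coreutils.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): order lines by their full
# content with spaces and tabs removed, comparing bytes (LC_ALL=C).
sort_ignoring_blanks() {
    tab=$(printf '\t')
    # Decorate: prefix each line with a blank-stripped copy used as the key.
    awk '{ key = $0; gsub(/[ \t]/, "", key); print key "\t" $0 }' |
        LC_ALL=C sort -t "$tab" -k1,1 |
        cut -f2-   # Undecorate: drop the key, keep the original line.
}

# Same sample input as in the question.
emulated=$(printf 'c11 2\nc1 1\nc11 4\nc1 3\n' | sort_ignoring_blanks)
printf '%s\n' "$emulated"
```

If this prints the same order as the `/usr/bin/sort -k1,1` run under en_GB.UTF-8, that would be consistent with the `-k1,1` restriction having no effect and blanks being skipped during comparison. It does not by itself show why that happens.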

I then downloaded and compiled the same version of sort (coreutils 8.22) and, surprisingly, got the behaviour I wanted without changing the locale:

    $ echo -e 'c11 2\nc1 1\nc11 4\nc1 3' | /path/to/coreutils-8.22/src/sort -k1,1
    c1 1
    c1 3
    c11 2
    c11 4

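My question is twofold:

  1. Why does sort appear to ignore the field separator when the locale is en_GB.UTF-8?
  2. Why does the version I compiled behave differently?

Additional information: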

    $ sort --version
    sort (GNU coreutils) 8.22
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    
    Written by Mike Haertel and Paul Eggert.
    
    $ locale
    LANG=en_GB.UTF-8
    LC_CTYPE="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_COLLATE="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_PAPER="en_GB.UTF-8"
    LC_NAME="en_GB.UTF-8"
    LC_ADDRESS="en_GB.UTF-8"
    LC_TELEPHONE="en_GB.UTF-8"
    LC_MEASUREMENT="en_GB.UTF-8"
    LC_IDENTIFICATION="en_GB.UTF-8"
    LC_ALL=
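
To narrow down where the two binaries diverge, two GNU sort options may help: `--debug`, which annotates the part of each line used as the sort key, and `-s`, which disables the last-resort whole-line comparison for lines whose keys compare equal. The sketch below wraps the sample input in a small helper, `try_sort`, a name invented here. The commented-out invocations reuse the locale and paths from the question and are left as suggestions to run on the affected machine.

```shell
#!/bin/sh
# Hypothetical helper: feed the sample input from the question to a given
# sort binary under a given locale.
# Usage: try_sort LOCALE SORT_BINARY [SORT_OPTIONS...]
try_sort() {
    loc=$1
    bin=$2
    shift 2
    printf 'c11 2\nc1 1\nc11 4\nc1 3\n' | LC_ALL=$loc "$bin" "$@"
}

# Baseline: byte-wise comparison on field 1 only; -s keeps lines with
# equal keys in their input order.
baseline=$(try_sort C sort -s -k1,1)
printf '%s\n' "$baseline"

# Suggested comparisons on the affected machine:
# try_sort en_GB.UTF-8 /usr/bin/sort --debug -k1,1
# try_sort en_GB.UTF-8 /path/to/coreutils-8.22/src/sort --debug -k1,1
```

Comparing the `--debug` annotations from the two binaries should show whether both treat `c1` or `c11` as the key, or whether the system binary compares more of the line than `-k1,1` asks for.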
    

0 answers:

No answers