Unable to explain the behavior of sort(1)

Time: 2013-12-05 16:23:55

Tags: linux sorting ls

I was puzzled when I saw ls list the following files in a strange order:

Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv

From a human point of view, 'I' should come first, then 'II', and so on.

So I created a file with the following contents:

$ cat 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The

If I sort it, I get this:

$ sort 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The

But if I remove the ' - ' and everything after it, the sort comes out correct:

$ cat 1
Star Wars Episode II 
Star Wars Episode III 
Star Wars Episode I 
Star Wars Episode IV 
Star Wars Episode VI 
Star Wars Episode V 

$ sort 1
Star Wars Episode I 
Star Wars Episode II 
Star Wars Episode III 
Star Wars Episode IV 
Star Wars Episode V 
Star Wars Episode VI 

So as soon as I add any character after the space, the sort order becomes unpredictable:

$ cat 1
Star Wars Episode II y
Star Wars Episode III x
Star Wars Episode I z
Star Wars Episode IV w
Star Wars Episode VI v
Star Wars Episode V u

$ sort 1
Star Wars Episode III x
Star Wars Episode II y
Star Wars Episode IV w
Star Wars Episode I z
Star Wars Episode VI v
Star Wars Episode V u

Any hints about this sorting behavior?

Update: sort: using 'en_CA.UTF-8' sorting rules

(as per the comment below)

Update #2: It is because of the locale.

$ ls | LANG=C sort
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
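
One caveat worth noting here: `LANG` is the lowest-priority locale variable, so `LANG=C` does not change collation if `LC_ALL` or `LC_COLLATE` is already set in the environment, whereas `LC_ALL=C` overrides both. Below is a minimal sketch of the same check applied to `ls` directly. The scratch directory and the shortened file names are illustrative assumptions, not the original listing.

```shell
# Create a scratch directory with shortened, illustrative file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/Star Wars Episode III - Revenge" \
      "$demo_dir/Star Wars Episode II - Attack" \
      "$demo_dir/Star Wars Episode I - The"

# LC_ALL overrides LC_COLLATE and LANG, so ls compares names byte by byte.
c_listing=$(LC_ALL=C ls -1 "$demo_dir")
printf '%s\n' "$c_listing"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

With byte-wise comparison the listing should come out as I, II, III.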

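Why does a UTF-8 locale make a difference? I checked ru_RU.UTF8 (sorts incorrectly) and ru_RU.KOI8-R (sorts correctly).

Update #3: It is about the locale: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

2 Answers:

Answer 0: (score: 2)

Answer 1: (score: 1)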

With locale-based sorting, all non-alphanumeric characters are ignored, so the comparison effectively sees:

II - Attack   -> "IIA"
III - Revenge -> "III"
I - The       -> "ITh"
IV - A        -> "IVA"
VI - Return   -> "VIR"
V - The       -> "VTh"
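
The effect described above can be approximated without depending on which locales are installed. The sketch below is only a rough model, not what the C library's collation actually does: it builds a key by deleting every character outside `A-Za-z0-9`, sorts on that key byte by byte, then drops the key again. The function name `sort_ignoring_punct` is made up for this illustration.

```shell
# Rough model of locale-aware collation: compare lines as if every
# non-alphanumeric character (spaces, '-', ...) were absent.
sort_ignoring_punct() {
    # Prefix each line with its stripped key and a tab, sort by bytes,
    # then cut the key off again.
    awk '{ key = $0; gsub(/[^A-Za-z0-9]/, "", key); print key "\t" $0 }' |
        LC_ALL=C sort |
        cut -f2-
}

model_order=$(printf '%s\n' \
    'Star Wars Episode I - The' \
    'Star Wars Episode II - Attack' \
    'Star Wars Episode III - Revenge' \
    'Star Wars Episode IV - A' \
    'Star Wars Episode V - The' \
    'Star Wars Episode VI - Return' | sort_ignoring_punct)
printf '%s\n' "$model_order"
```

Under these assumptions the printed order is II, III, I, IV, VI, V, the same order `ls` produced in the question.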

With LC_ALL=C, the space character sorts before alphanumeric characters:

I - The       -> "I -"
II - Attack   -> "II "
III - Revenge -> "III"
IV - A        -> "IV "
V - The       -> "V -"
VI - Return   -> "VI "

So that makes sense, though it would take 30+ movies for this to really fail.
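
The byte-order case can be checked directly, since the C locale is always available. In ASCII, space is 0x20 and '-' is 0x2D, both below the digits and letters, so whenever one numeral is a prefix of another the shorter one wins the comparison. A small sketch using the shortened lines from the question (the variable name `c_order` is just for this example):

```shell
# Sort the shortened titles byte by byte; LC_ALL=C disables locale collation.
c_order=$(printf '%s\n' \
    'Star Wars Episode II - Attack' \
    'Star Wars Episode III - Revenge' \
    'Star Wars Episode I - The' \
    'Star Wars Episode IV - A' \
    'Star Wars Episode VI - Return' \
    'Star Wars Episode V - The' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

This should print the six lines in the order I, II, III, IV, V, VI.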