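Today I ran into a problem sorting a file with the Linux sort command. When I set LANG=En_US, the result is what I expect. But with LANG=en_US, the result is strange. Here are the commands I ran and their output: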
[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ cat dd.dat
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard... 16
23 360_guard 16
23 360_guard... 17
23 360_guard... 18
[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ LANG=En_US sort dd.dat
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
[work@xx:/data1/muce_temp/datamarts/reduce_result_file/302/1d/201212260000]$ LANG=en_US sort dd.dat
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard... 16
23 360_guard 16 (why does this line appear here?)
23 360_guard... 17
23 360_guard... 18
The exact format of the lines in this file is as follows:
2^E3^F360_guard^E...^I16^Ee^E17/18^I63776769$
2^E3^F360_guard^E^I16^Ee^E17/18^I63776769$
2^E3^F360_guard^E...^I17^Ei^E0^I63776771$
2^E3^F360_guard^E...^I18^Ei^E1^I63776773$
^E is '\x05', ^F is '\x06', ^I is a tab, and $ is '\n'.
Thanks in advance.
Answer 0 (score: 0)
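en_US invokes a smarter collation algorithm that ignores runs of dots, which are typically ignored when sorting. Locale names are evidently case-sensitive, so En_US is not recognized and falls back to the default locale (probably C).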
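To see what a fallback to C means in practice, here is a minimal sketch that rebuilds two of the lines from the question in the byte layout described there (^E as \005, ^F as \006, ^I as a tab) and sorts them with LC_ALL=C, which forces plain byte-order comparison. The variable names are illustrative only. The result under en_US is deliberately not shown, since it depends on the installed locale data and the C library version.

```shell
# Two sample lines in the byte layout given in the question:
# \005 is ^E, \006 is ^F, \t is ^I (tab).
dotted='2\0053\006360_guard\005...\t16\005e\00517/18\t63776769\n'
plain='2\0053\006360_guard\005\t16\005e\00517/18\t63776769\n'

# LC_ALL overrides LANG and every LC_* variable, so sort compares raw bytes.
# Right after "guard^E" the plain line has a tab (0x09) and the dotted line
# has "." (0x2E), so the plain line sorts first in byte order.
c_sorted=$(printf "$dotted$plain" | LC_ALL=C sort)

# Print the sorted lines with control bytes made visible as escapes.
printf '%s\n' "$c_sorted" | sed -n l
```

If the ordering has to be stable across machines, setting LC_ALL=C for the sort invocation avoids depending on whichever locale happens to be installed.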
Answer 1 (score: 0)
"en_US" is the "correct" value for "language = English, locale = United States". Other English locales include "en_GB" (United Kingdom), "en_CA" (Canada), and "en_AU" (Australia).
I got these results:
echo $LANG;sort tmp.txt
en_US.UTF-8
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
export LANG=en_US;echo $LANG;sort tmp.txt
en_US
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
export LANG=En_US;echo $LANG;sort tmp.txt
En_US
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
export LANG=abc-silly;echo $LANG;sort tmp.txt
abc-silly
23 340_guard 16
23 340_guard 17
23 340_guard 18
23 360_guard 16
23 360_guard... 16
23 360_guard... 17
23 360_guard... 18
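The identical output for En_US and abc-silly above is consistent with neither name being a recognized locale. One way to check is to look the name up in the output of `locale -a`. The sketch below assumes the `locale` utility is present; the helper name `locale_available` is hypothetical. Note that it matches names exactly, so a codeset suffix may be spelled differently from what you type (for example en_US.utf8 rather than en_US.UTF-8).

```shell
# Hypothetical helper: succeeds if the exact locale name is listed by `locale -a`.
# grep -F matches literally, -x requires a whole-line match, -q suppresses output.
locale_available() {
  locale -a 2>/dev/null | grep -Fxq -- "$1"
}

# Report on the names used in this thread.
for name in en_US En_US abc-silly; do
  if locale_available "$name"; then
    echo "$name: installed"
  else
    echo "$name: not listed by locale -a"
  fi
done
```

A name that is not listed is a likely candidate for silently falling back to the default collation, which would explain why a misspelled LANG value changes the sort order.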