When I call sort(x) on a character vector x, the letter "y" jumps to the middle, right after the letter "i":
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[21] "t" "u" "v" "w" "x" "z"
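The reason is probably that I am located in Lithuania and this is the "Lithuanian" alphabetical order, but I need the normal ordering. How do I change the sorting method back to normal from within R code?
I am using R 2.15.2 on Win7.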
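Answer 0 (score: 39)
You need to change the locale that R is running in. Either do that for your entire Windows installation (which seems suboptimal) or do it within the R session: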
Sys.setlocale("LC_COLLATE", "C")
You can use any other valid locale string in place of "C", but "C" gets you back to the sort order for letters that you want.
Read ?locales for more information.
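It is also worth noting the sister function Sys.getlocale(), which queries the current setting of a locale category. With it you can save the current collation, change it, and restore it afterwards: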
(locCol <- Sys.getlocale("LC_COLLATE"))
Sys.setlocale("LC_COLLATE", "lt_LT")
sort(letters)
Sys.setlocale("LC_COLLATE", locCol)
sort(letters)
Sys.getlocale("LC_COLLATE")
## giving:
> (locCol <- Sys.getlocale("LC_COLLATE"))
[1] "en_GB.UTF-8"
> Sys.setlocale("LC_COLLATE", "lt_LT")
[1] "lt_LT"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n"
[16] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "z"
> Sys.setlocale("LC_COLLATE", locCol)
[1] "en_GB.UTF-8"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
[16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> Sys.getlocale("LC_COLLATE")
[1] "en_GB.UTF-8"
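The save-and-restore steps above can also be wrapped in a small helper, so that the original collation comes back even if the sort fails. This is only a sketch in base R; the name sort_c is made up for illustration and is not part of any package.

```r
# Hypothetical helper (not part of base R): sort under the "C" collation,
# then restore whatever LC_COLLATE was in effect before the call.
sort_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  # on.exit() runs even if sort() throws, so the session locale is not left changed
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

sort_c(letters)
```

Keep in mind that "C" collation compares byte values, so uppercase letters sort before lowercase ones: sort_c(c("b", "A", "a")) gives "A" "a" "b".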
If you have devtools installed, @Hadley's answer shows that with_collate() makes this a bit more concise.
Answer 1 (score: 34)
If you want to do this temporarily, devtools provides the with_collate function:
library(devtools)
with_collate("C", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
# [20] "t" "u" "v" "w" "x" "y" "z"
with_collate("lt_LT", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r"
# [20] "s" "t" "u" "v" "w" "x" "z"
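As far as I know, newer devtools releases no longer export with_collate(), and the same helper is available from the withr package. The following is a sketch under that assumption; it falls back to a message if withr is not installed.

```r
# Assumes the withr package; with_collate(new, code) sets LC_COLLATE to `new`
# only while `code` is evaluated, then restores the previous value.
if (requireNamespace("withr", quietly = TRUE)) {
  sorted <- withr::with_collate("C", sort(letters))
  print(sorted)
} else {
  message("withr is not installed; Sys.setlocale() is the base R route")
}
```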