Question

R sorts character vectors in a sequence that I would describe as alphabetical rather than ASCII.

For example:

sort(c("dog", "Cat", "Dog", "cat"))
[1] "cat" "Cat" "dog" "Dog"

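Three questions:

What is the technically correct term for this sort order?
I cannot find any reference to it in the manuals on CRAN. Where can I find a description of how R sorts?
How does this differ from the behaviour of other languages such as C, Java, Perl or PHP?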

Answer 1

The Details section of the help for sort() states:

 The sort order for character vectors will depend on the collating
 sequence of the locale in use: see ‘Comparison’.  The sort order
 for factors is the order of their levels (which is particularly
 appropriate for ordered factors).

and help(Comparison) then shows:

 Comparison of strings in character vectors is lexicographic within
 the strings using the collating sequence of the locale in use: see
 ‘locales’.  The collating sequence of locales such as ‘en_US’ is
 normally different from ‘C’ (which should use ASCII) and can be
 surprising.  Beware of making _any_ assumptions about the 
 collation order: e.g. in Estonian ‘Z’ comes between ‘S’ and ‘T’,
 and collation is not necessarily character-by-character - in
 Danish ‘aa’ sorts as a single letter, after ‘z’.  In Welsh ‘ng’
 may or may not be a single sorting unit: if it is it follows ‘g’.
 Some platforms may not respect the locale and always sort in
 numerical order of the bytes in an 8-bit locale, or in Unicode
 point order for a UTF-8 locale (and may not sort in the same order
 for the same language in different character sets).  Collation of
 non-letters (spaces, punctuation signs, hyphens, fractions and so
 on) is even more problematic.

So it depends on your locale settings.
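To see the difference in a single session, here is a minimal sketch using the vector from the question. It relies on sort(x, method = "radix"), which according to ?sort always collates as in the C locale; the variable names are only for illustration, and the output of the plain sort(x) call depends on your platform and locale.

```r
x <- c("dog", "Cat", "Dog", "cat")

# Which collating sequence is the current session using? (platform dependent)
Sys.getlocale("LC_COLLATE")

# The default method for character vectors follows the locale's collating sequence,
# so the result here varies between systems.
sort(x)

# method = "radix" collates as in the C locale, i.e. by byte value,
# so uppercase ASCII letters come before lowercase ones.
c_sorted <- sort(x, method = "radix")
c_sorted
# [1] "Cat" "Dog" "cat" "dog"
```

If both calls give the same result, your session is probably already running in the C locale.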

Answer 2

Sorting is locale dependent. My solution is as follows...

I created the file

~/.Renviron

Then R sorts in the C locale

cat ~/.Renviron 
LC_ALL=C
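If you would rather not change the locale for every R session, a possible alternative is to switch only the collation category inside the running session with Sys.setlocale() and restore it afterwards. This is a sketch, not part of the setup above; old_collate and c_order are illustrative names.

```r
x <- c("dog", "Cat", "Dog", "cat")

# Remember the current collation setting so it can be restored later.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch only the collation category to the C locale for this session.
invisible(Sys.setlocale("LC_COLLATE", "C"))

c_order <- sort(x)
c_order
# [1] "Cat" "Dog" "cat" "dog"

# Put the previous collation setting back.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

This leaves the other locale categories (messages, number formatting and so on) untouched, which setting LC_ALL does not.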
