What are R's sorting rules for character vectors?

Time: 2011-08-29 11:22:05

Tags: r sorting

R sorts character vectors in an order that I would describe as alphabetic rather than ASCII.

For example:

sort(c("dog", "Cat", "Dog", "cat"))
[1] "cat" "Cat" "dog" "Dog"

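Three questions:

  1. What is the technically correct term for this sort order?
  2. I can't find any reference to it in the manuals on CRAN. Where can I find a description of the collation rules in R?
  3. How does this behavior differ from that of other languages such as C, Java, Perl or PHP?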

2 answers:

Answer 0 (score: 21)

The Details section of help(sort) states:

 The sort order for character vectors will depend on the collating
 sequence of the locale in use: see ‘Comparison’.  The sort order
 for factors is the order of their levels (which is particularly
 appropriate for ordered factors).

Then help(Comparison) says:

 Comparison of strings in character vectors is lexicographic within
 the strings using the collating sequence of the locale in use: see
 ‘locales’.  The collating sequence of locales such as ‘en_US’ is
 normally different from ‘C’ (which should use ASCII) and can be
 surprising.  Beware of making _any_ assumptions about the 
 collation order: e.g. in Estonian ‘Z’ comes between ‘S’ and ‘T’,
 and collation is not necessarily character-by-character - in
 Danish ‘aa’ sorts as a single letter, after ‘z’.  In Welsh ‘ng’
 may or may not be a single sorting unit: if it is it follows ‘g’.
 Some platforms may not respect the locale and always sort in
 numerical order of the bytes in an 8-bit locale, or in Unicode
 point order for a UTF-8 locale (and may not sort in the same order
 for the same language in different character sets).  Collation of
 non-letters (spaces, punctuation signs, hyphens, fractions and so
 on) is even more problematic.

So it depends on your locale.
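A minimal sketch in base R of how to see this locale dependence directly: it records the session's current collation locale, temporarily switches LC_COLLATE to "C" with Sys.setlocale(), and then restores the original setting. The "C" locale compares strings byte by byte, so all uppercase ASCII letters sort before all lowercase ones. It also assumes R 3.3.0 or later for sort(method = "radix"), which ignores the locale and always sorts character vectors in C-locale order.

```r
x <- c("dog", "Cat", "Dog", "cat")

# Remember the collation locale this session started with (e.g. "en_US.UTF-8")
old_collate <- Sys.getlocale("LC_COLLATE")

# Locale-dependent order; under en_US this is "cat" "Cat" "dog" "Dog"
locale_sorted <- sort(x)

# Temporarily switch collation to the C locale, which compares bytes (ASCII)
invisible(Sys.setlocale("LC_COLLATE", "C"))
c_sorted <- sort(x)   # uppercase letters now come before all lowercase ones

# Put the original collation locale back
invisible(Sys.setlocale("LC_COLLATE", old_collate))

# Radix sorting does not consult the locale: always C-locale order (R >= 3.3.0)
radix_sorted <- sort(x, method = "radix")

print(locale_sorted)
print(c_sorted)      # "Cat" "Dog" "cat" "dog"
print(radix_sorted)  # "Cat" "Dog" "cat" "dog"
```

Using method = "radix" is handy when a reproducible, locale-independent order is needed without changing any session-wide setting.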

Answer 1 (score: 0)

Sorting depends on the locale. My solution is as follows:

I created the file ~/.Renviron, and now R sorts in the C locale:

cat ~/.Renviron
LC_ALL=C
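To confirm that the setting was picked up, a quick diagnostic can be run in a fresh R session started after editing the file. This is a sketch that assumes ~/.Renviron contains an uncommented LC_ALL=C line; lines beginning with # in .Renviron are treated as comments and have no effect. The variable names lc_all and collate are just illustrative.

```r
# Run this in a fresh R session started after ~/.Renviron was edited
lc_all  <- Sys.getenv("LC_ALL")          # "" if the variable is not set
collate <- Sys.getlocale("LC_COLLATE")   # collation locale actually in use

cat("LC_ALL =", shQuote(lc_all), "\n")
cat("LC_COLLATE =", shQuote(collate), "\n")

if (identical(collate, "C")) {
  message("C (byte-order) collation is active")
} else {
  message("Collation still follows the locale: ", collate)
}

# With C collation this should print "Cat" "Dog" "cat" "dog"
print(sort(c("dog", "Cat", "Dog", "cat")))
```

Note that LC_ALL=C affects every locale category (messages, number formatting, character handling), not just collation, so setting only LC_COLLATE=C may be the less intrusive choice.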