Sorting R data by a column containing long strings

Posted: 2014-10-03 09:56:06

Tags: r sorting

How do I sort R data by a column that contains long strings? The following example illustrates my problem:

> a = matrix(NA, nrow=4, ncol=3)
> a[,1] = c(1,2,3,4)
> a[,2] = c("gene001_10M","gene002_10M","gene001_50M","gene002_50M")
> colnames(a) = c("value","sortkey","other")
> a = as.data.frame(a)
> a
  value     sortkey other
1     1 gene001_10M  <NA>
2     2 gene002_10M  <NA>
3     3 gene001_50M  <NA>
4     4 gene002_50M  <NA>

When I now sort `a` by `sortkey`, the key seems to be read from right to left, which leaves `a` unchanged:

> b = a[sort(a$sortkey),]
> b
  value     sortkey other
1     1 gene001_10M  <NA>
2     2 gene002_10M  <NA>
3     3 gene001_50M  <NA>
4     4 gene002_50M  <NA>

However, what I want is:

> b
  value     sortkey other
1     1 gene001_10M  <NA>
3     3 gene001_50M  <NA>
2     2 gene002_10M  <NA>
4     4 gene002_50M  <NA>
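A likely explanation, offered as an assumption about the R version used here: before R 4.0.0, `as.data.frame()` turned character columns into factors by default (`stringsAsFactors = TRUE`), so `sortkey` is a factor. `sort()` returns the sorted values, not their positions, and indexing by a factor uses its integer codes. Because the factor levels are already in alphabetical order, the sorted factor's codes are just 1, 2, 3, 4, so `a[sort(a$sortkey), ]` gives back the rows in their original order. A minimal sketch that builds the factor explicitly so it behaves the same on any R version:

```r
# Assumption: sortkey is a factor, as it was by default in R < 4.0.0.
# The factor is created explicitly so the result does not depend on R version.
a <- data.frame(value = 1:4,
                sortkey = factor(c("gene001_10M", "gene002_10M",
                                   "gene001_50M", "gene002_50M")))

levels(a$sortkey)     # alphabetical levels, so codes follow the sorted order
s <- sort(a$sortkey)  # sorted *values*, still a factor
as.integer(s)         # their integer codes: 1 2 3 4
a[s, "value"]         # a factor index uses the codes -> rows 1:4, unchanged
```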

2 Answers:

Answer 0 (score: 1)

You can also use `order` after first stripping the letters with a `gsub` regular expression:

a[order(gsub("[a-zA-Z]+", "", a$sortkey)),]
#    value     sortkey other
# 1     1 gene001_10M  <NA>
# 3     3 gene001_50M  <NA>
# 2     2 gene002_10M  <NA>
# 4     4 gene002_50M  <NA>
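To see why this works, it helps to look at the intermediate keys. Removing every run of letters leaves only the zero-padded numbers and the underscore, and ordering those strings gives the desired row order. A small sketch on the key values from the question:

```r
keys <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M")

stripped <- gsub("[a-zA-Z]+", "", keys)  # drops "gene" and "M"
stripped                                 # "001_10" "002_10" "001_50" "002_50"
order(stripped)                          # 1 3 2 4
```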

Answer 1 (score: 0)

If you have a mix of numbers, letters, etc., it is better to use `mixedorder` from `gtools`, but here plain `order` is enough:

  a[order(as.character(a$sortkey)),]
  #  value     sortkey other
  #1     1 gene001_10M  <NA>
  #3     3 gene001_50M  <NA>
  #2     2 gene002_10M  <NA>
  #4     4 gene002_50M  <NA>
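The `mixedorder` advice starts to matter once the numbers inside the keys are not zero-padded to a common width: string ordering compares character by character, so `10M` sorts before `5M`. The sketch below is a base-R stand-in, not `gtools::mixedorder` itself, and it assumes every key follows the hypothetical pattern `gene<digits>_<digits>M`; it parses the two numbers and orders by them:

```r
# Hypothetical keys: the last one is "5M", not zero-padded like the others
keys <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_5M")

order(keys)  # string order puts "gene002_10M" before "gene002_5M": 1 3 2 4

# Assumes every key matches gene<digits>_<digits>M
gene <- as.integer(sub("^gene([0-9]+)_[0-9]+M$", "\\1", keys))
size <- as.integer(sub("^gene[0-9]+_([0-9]+)M$", "\\1", keys))
order(gene, size)  # numeric order within each gene: 1 3 4 2
```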

Also, `sort` gives you the values rather than the indices:

   sort(as.character(a$sortkey))
   #[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"
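The relationship between the two functions can be checked directly on a plain character vector: `sort()` returns the values, `order()` returns their positions, and `sort(x)` is the same as `x[order(x)]`. A data frame needs the positions as its row index:

```r
keys <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M")

sort(keys)   # the sorted values themselves
order(keys)  # their positions in keys: 1 3 2 4

# sort(x) is equivalent to x[order(x)]
identical(sort(keys), keys[order(keys)])  # TRUE
```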

Alternatively, you have to set `index.return=TRUE` in `sort` (it defaults to `FALSE`), and then use:

   sort(as.character(a$sortkey), index.return=TRUE)
   #$x
   #[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"
   #
   #$ix
   #[1] 1 3 2 4

Finally, use `$ix` to reorder the rows:

   a[sort(as.character(a$sortkey), index.return=TRUE)$ix,]
   #  value     sortkey other
   #1     1 gene001_10M  <NA>
   #3     3 gene001_50M  <NA>
   #2     2 gene002_10M  <NA>
   #4     4 gene002_50M  <NA>
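The `$ix` component is the same permutation that `order()` returns, so for keys without ties the two row selections agree. A quick check on the keys from the question:

```r
keys <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M")

idx <- sort(keys, index.return = TRUE)$ix
idx                     # 1 3 2 4
all(idx == order(keys)) # TRUE: same permutation as order()
```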