How can I sort R data by a column that contains long strings? The following example illustrates my problem:
> a = matrix(NA, nrow=4, ncol=3)
> a[,1] = c(1,2,3,4)
> a[,2] = c("gene001_10M","gene002_10M","gene001_50M","gene002_50M")
> colnames(a) = c("value","sortkey","other")
> a = as.data.frame(a)
> a
value sortkey other
1 1 gene001_10M <NA>
2 2 gene002_10M <NA>
3 3 gene001_50M <NA>
4 4 gene002_50M <NA>
When I now sort 'a' by sortkey, the key seems to be read from right to left, which leaves 'a' unchanged:
> b = a[sort(a$sortkey),]
> b
value sortkey other
1 1 gene001_10M <NA>
2 2 gene002_10M <NA>
3 3 gene001_50M <NA>
4 4 gene002_50M <NA>
However, this is the result I am aiming for:
> b
value sortkey other
1 1 gene001_10M <NA>
3 3 gene001_50M <NA>
2 2 gene002_10M <NA>
4 4 gene002_50M <NA>
Answer 0: (score: 1)
You could also use order after first stripping the letters with a gsub regular expression:
a[order(gsub("[a-zA-Z]+", "", a$sortkey)),]
# value sortkey other
# 1 1 gene001_10M <NA>
# 3 3 gene001_50M <NA>
# 2 2 gene002_10M <NA>
# 4 4 gene002_50M <NA>
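To make it clearer what order is actually comparing in the line above, here is a minimal sketch that prints the intermediate keys. The vector name keys and the variable stripped are only illustrative and do not appear in the question.

```r
# Same sort keys as in the question, as a plain character vector
keys <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M")

# Remove every run of letters; digits and the underscore remain
stripped <- gsub("[a-zA-Z]+", "", keys)
print(stripped)   # "001_10" "002_10" "001_50" "002_50"

# order() returns row positions, which is what the row index needs
idx <- order(stripped)
print(idx)        # 1 3 2 4
```

Note that the stripped keys are still compared as strings, so this relies on the numbers being zero-padded or of equal width.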
Answer 1: (score: 0)
If the keys mix numbers, letters and so on, it is better to use mixedorder from gtools, but here plain order is enough:
a[order(as.character(a$sortkey)),]
# value sortkey other
#1 1 gene001_10M <NA>
#3 3 gene001_50M <NA>
#2 2 gene002_10M <NA>
#4 4 gene002_50M <NA>
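The remark about mixed numbers and letters matters once the numeric parts are not zero-padded. As a rough base-R sketch of the same idea (not what mixedorder does internally), the numeric pieces can be extracted and passed to order as separate keys. The keys below are made up for illustration, and the patterns assume the form gene<number>_<number>M.

```r
# Hypothetical keys without zero padding (not from the question)
keys <- c("gene2_10M", "gene10_5M", "gene2_5M", "gene10_10M")

# Pull out the gene number and the depth as numeric sort keys
gene  <- as.numeric(sub("^gene([0-9]+)_.*$", "\\1", keys))  # 2 10 2 10
depth <- as.numeric(sub("^.*_([0-9]+)M$", "\\1", keys))     # 10 5 5 10

# Sort by gene first, then by depth within each gene
idx <- order(gene, depth)
print(keys[idx])  # "gene2_5M" "gene2_10M" "gene10_5M" "gene10_10M"
```

For the data frame in the question the same index would be used as a[idx, ], with keys replaced by as.character(a$sortkey).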
Also, sort returns the sorted values rather than the index:
sort(as.character(a$sortkey))
#[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"
Alternatively, set index.return=TRUE in sort (it is FALSE by default) so that the index is returned as well:
sort(as.character(a$sortkey), index.return=TRUE)
#$x
#[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M"
#$ix
#[1] 1 3 2 4
The returned $ix index can also be used to subset the data frame:
a[sort(as.character(a$sortkey), index.return=TRUE)$ix,]
# value sortkey other
#1 1 gene001_10M <NA>
#3 3 gene001_50M <NA>
#2 2 gene002_10M <NA>
#4 4 gene002_50M <NA>
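As for why a[sort(a$sortkey),] left the rows unchanged in the question, a likely explanation (an assumption on my part, since the R version is not stated) is that sortkey was a factor, as as.data.frame produced by default before R 4.0.0. Indexing with a factor uses its integer codes, and the codes of a sorted factor are simply 1, 2, 3, 4. The sketch below forces that behaviour with stringsAsFactors = TRUE.

```r
# Rebuild the example; stringsAsFactors = TRUE mimics the pre-4.0.0 default
# (assumption about the environment used in the question)
a <- data.frame(
  value   = 1:4,
  sortkey = c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M"),
  stringsAsFactors = TRUE
)

s <- sort(a$sortkey)      # sorted values, still a factor
print(as.integer(s))      # factor codes: 1 2 3 4

idx <- order(a$sortkey)   # row positions in sorted order
print(idx)                # 1 3 2 4
print(a[idx, ])
```

So sort gives values (here, codes in their natural order), while order gives the row positions needed for subsetting.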