Multi-level sorting with dplyr

Time: 2015-04-28 09:02:57

Tags: r dplyr

I have the following data frame:

tdf <- structure(list(GO = c("Cytokine-cytokine receptor interaction", 
"Cytokine-cytokine receptor interaction|Endocytosis", "I-kappaB kinase/NF-kappaB signaling", 
"NF-kappa B signaling pathway", "NF-kappaB import into nucleus", 
"T cell chemotaxis"), PosCount = c(17, 18, 4, 5, 1, 2), shortgo = structure(c(1L, 
1L, 2L, 2L, 2L, 3L), .Label = c("z", "X", "y"), class = "factor")), .Names = c("GO", 
"PosCount", "shortgo"), row.names = c(NA, 6L), class = "data.frame")

It looks like this:

                                                  GO PosCount shortgo
1             Cytokine-cytokine receptor interaction       17       z
2 Cytokine-cytokine receptor interaction|Endocytosis       18       z
3                I-kappaB kinase/NF-kappaB signaling        4       X
4                       NF-kappa B signaling pathway        5       X
5                      NF-kappaB import into nucleus        1       X
6                                  T cell chemotaxis        2       y

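What I want to do is first sort shortgo alphabetically (case-insensitively), and then sort by PosCount in descending order within each shortgo group, producing this: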

                                                  GO PosCount shortgo
                       NF-kappa B signaling pathway        5       X
                I-kappaB kinase/NF-kappaB signaling        4       X
                      NF-kappaB import into nucleus        1       X
                                  T cell chemotaxis        2       y
 Cytokine-cytokine receptor interaction|Endocytosis       18       z
             Cytokine-cytokine receptor interaction       17       z

But why does this not work:

library(dplyr)
tdf[order(tdf$shortgo),]
tdf <- tdf %>% group_by(shortgo) %>% arrange(desc(PosCount))

What is the right way to do it?

2 Answers:

Answer 0: (score: 7)

You just need to combine them into a single call, although you first need to convert shortgo to character class (see the explanation below):

tdf %>% 
    arrange(as.character(shortgo), desc(PosCount))
#                                                   GO PosCount shortgo
# 1                       NF-kappa B signaling pathway        5       X
# 2                I-kappaB kinase/NF-kappaB signaling        4       X
# 3                      NF-kappaB import into nucleus        1       X
# 4                                  T cell chemotaxis        2       y
# 5 Cytokine-cytokine receptor interaction|Endocytosis       18       z
# 6             Cytokine-cytokine receptor interaction       17       z
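
The call above compares the labels as they are, so how "X" sorts relative to "y" and "z" depends on how upper and lower case are collated. A minimal sketch, assuming the case-insensitive ordering asked for in the question should be made explicit: wrap the key in tolower(). The data frame is rebuilt with data.frame() only for readability, and the name sorted is an illustrative choice.

```r
library(dplyr)

# Same data as in the question, rebuilt with data.frame() for readability.
tdf <- data.frame(
  GO = c("Cytokine-cytokine receptor interaction",
         "Cytokine-cytokine receptor interaction|Endocytosis",
         "I-kappaB kinase/NF-kappaB signaling",
         "NF-kappa B signaling pathway",
         "NF-kappaB import into nucleus",
         "T cell chemotaxis"),
  PosCount = c(17, 18, 4, 5, 1, 2),
  shortgo = factor(c("z", "z", "X", "X", "X", "y"),
                   levels = c("z", "X", "y"))
)

# Primary key: lower-cased label (case-insensitive, ignores factor codes).
# Secondary key: PosCount, largest first.
sorted <- tdf %>%
  arrange(tolower(as.character(shortgo)), desc(PosCount))

sorted
```

Lower-casing only affects the sort key; the shortgo column itself keeps its original labels and factor levels.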

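The reason you need to convert to character is that shortgo is a factor, which is essentially an integer vector with a levels attribute. Sorting therefore uses those integer codes, and in your case the codes do not correspond to the alphabetical order of the levels: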

tdf$shortgo
## [1] z z X X X y
## Levels: z X y
as.numeric(tdf$shortgo)
## [1] 1 1 2 2 2 3

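So you can see that z is coded as 1, X as 2 and y as 3, whereas alphabetical order would require z = 3, X = 1 and y = 2. That is why sort returns the "wrong" result. Compare: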
sort(tdf$shortgo)
# [1] z z X X X y
# Levels: z X y
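
The point about integer codes can be checked in isolation. The following is a small base R sketch, not part of the original answer; the names f, by_label and f2 are illustrative. It shows that ordering a factor follows its level codes, and gives two ways around that: ordering by the lower-cased labels, or rebuilding the factor with alphabetically ordered levels.

```r
# A factor whose levels are deliberately not in alphabetical order,
# mirroring shortgo in the question.
f <- factor(c("z", "z", "X", "X", "X", "y"), levels = c("z", "X", "y"))

as.integer(f)   # underlying codes: 1 1 2 2 2 3
order(f)        # follows the codes, so nothing moves: 1 2 3 4 5 6

# Option 1: order by the labels (lower-cased), ignoring the codes.
by_label <- order(tolower(as.character(f)))
f[by_label]     # X X X y z z

# Option 2: rebuild the factor so its levels are in case-insensitive
# alphabetical order; ordering by codes then gives the same answer.
f2 <- factor(f, levels = levels(f)[order(tolower(levels(f)))])
levels(f2)      # "X" "y" "z"
order(f2)       # 3 4 5 6 1 2
```

Option 2 changes the factor itself, which may be preferable when the same ordering is also wanted elsewhere, for example on plot axes.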

Answer 1: (score: 3)

You can use order from base R:

with(tdf, tdf[order(tolower(shortgo), -PosCount),])

#                                                  GO PosCount shortgo
#4                       NF-kappa B signaling pathway        5       X
#3                I-kappaB kinase/NF-kappaB signaling        4       X
#5                      NF-kappaB import into nucleus        1       X
#6                                  T cell chemotaxis        2       y
#2 Cytokine-cytokine receptor interaction|Endocytosis       18       z
#1             Cytokine-cytokine receptor interaction       17       z
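
Negating PosCount works here because the column is numeric. As a hedged alternative sketch (not from the original answer), recent versions of R let order() take one decreasing flag per key when method = "radix" is used, which avoids the negation. The names idx and result are illustrative, and GO is shortened to placeholder labels to keep the example compact.

```r
# Compact stand-in for the question's data; GO values are placeholders.
tdf <- data.frame(
  GO = paste0("term", 1:6),
  PosCount = c(17, 18, 4, 5, 1, 2),
  shortgo = factor(c("z", "z", "X", "X", "X", "y"),
                   levels = c("z", "X", "y"))
)

# First key ascending (lower-cased label), second key descending.
# A per-key `decreasing` vector requires method = "radix".
idx <- order(tolower(as.character(tdf$shortgo)), tdf$PosCount,
             decreasing = c(FALSE, TRUE), method = "radix")

result <- tdf[idx, ]
result
```

The row names of result keep the original positions (4, 3, 5, 6, 2, 1), matching the output shown in the answer above.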