How can I use dplyr to get a result similar to what dlply computes?

Time: 2015-03-12 20:18:04

Tags: r dplyr

The data looks like this:

> tmp
         gene         go
1      44M2.3 GO:0000166
2      44M2.3 GO:0003723
3      44M2.3 GO:0004527
4      44M2.3 GO:0005730
5      44M2.3 GO:0070062
6      44M2.3 GO:0090305
7      44M2.3 GO:0090305
8      44M2.3 GO:0090305
9  A0A087WUJ7 GO:0004553
10 A0A087WUJ7 GO:0005975

> dput(tmp)
structure(list(gene = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L), .Label = c("44M2.3", "A0A087WUJ7"), class = "factor"), 
    go = structure(c(1L, 2L, 3L, 5L, 7L, 8L, 8L, 8L, 4L, 6L), .Label = c("GO:0000166", 
    "GO:0003723", "GO:0004527", "GO:0004553", "GO:0005730", "GO:0005975", 
    "GO:0070062", "GO:0090305"), class = "factor")), .Names = c("gene", 
"go"), row.names = c(NA, -10L), class = "data.frame")

Using the plyr package, I can get a list of genes with their corresponding GO terms as follows:

> dlply(tmp, .(gene),function(x) {x[["go"]]})
$`44M2.3`
[1] GO:0000166 GO:0003723 GO:0004527 GO:0005730 GO:0070062 GO:0090305 GO:0090305 GO:0090305
Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305

$A0A087WUJ7
[1] GO:0004553 GO:0005975
Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305

But how can I achieve similar behavior with dplyr?

1 answer:

Answer 0: (score: 1)

As mentioned in the comments, the base R approach is:

split(tmp$go, f = tmp$gene)

which gives:

#$`44M2.3`
#[1] GO:0000166 GO:0003723 GO:0004527 GO:0005730 GO:0070062 GO:0090305 GO:0090305 GO:0090305
#Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305

#$A0A087WUJ7
#[1] GO:0004553 GO:0005975
#Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305
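
Since the question asks specifically about dplyr, here is one possible sketch; it is not part of the original answer. It assumes that a list column built with `summarise()` is an acceptable intermediate step, and the names `by_gene` and `res` are made up for illustration. The data frame is rebuilt by hand so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question (factor columns, as in the dput output)
tmp <- data.frame(
  gene = c(rep("44M2.3", 8), rep("A0A087WUJ7", 2)),
  go = c("GO:0000166", "GO:0003723", "GO:0004527", "GO:0005730", "GO:0070062",
         "GO:0090305", "GO:0090305", "GO:0090305", "GO:0004553", "GO:0005975"),
  stringsAsFactors = TRUE
)

# Collapse each gene's GO terms into a single list-column entry
by_gene <- tmp %>%
  group_by(gene) %>%
  summarise(go = list(go))

# Convert the list column into a named list, one element per gene
# (hypothetical helper step; dplyr itself keeps the result as a tibble)
res <- setNames(by_gene$go, as.character(by_gene$gene))
res
```

The `setNames()` step is only needed if a plain named list like the `dlply` output is wanted; if a tibble with one row per gene is enough, `by_gene` can be used directly.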