The data looks like this:
> tmp
gene go
1 44M2.3 GO:0000166
2 44M2.3 GO:0003723
3 44M2.3 GO:0004527
4 44M2.3 GO:0005730
5 44M2.3 GO:0070062
6 44M2.3 GO:0090305
7 44M2.3 GO:0090305
8 44M2.3 GO:0090305
9 A0A087WUJ7 GO:0004553
10 A0A087WUJ7 GO:0005975
> dput(tmp)
structure(list(gene = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L), .Label = c("44M2.3", "A0A087WUJ7"), class = "factor"),
go = structure(c(1L, 2L, 3L, 5L, 7L, 8L, 8L, 8L, 4L, 6L), .Label = c("GO:0000166",
"GO:0003723", "GO:0004527", "GO:0004553", "GO:0005730", "GO:0005975",
"GO:0070062", "GO:0090305"), class = "factor")), .Names = c("gene",
"go"), row.names = c(NA, -10L), class = "data.frame")
Using the plyr package, I can get a list of genes with their corresponding GO terms as follows:
> dlply(tmp, .(gene),function(x) {x[["go"]]})
$`44M2.3`
[1] GO:0000166 GO:0003723 GO:0004527 GO:0005730 GO:0070062 GO:0090305 GO:0090305 GO:0090305
Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305
$A0A087WUJ7
[1] GO:0004553 GO:0005975
Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305
But how can I achieve similar behaviour with dplyr?
Answer 0 (score: 1)
As mentioned in the comments, the base R approach is:
split(tmp$go, f = tmp$gene)
which gives:
#$`44M2.3`
#[1] GO:0000166 GO:0003723 GO:0004527 GO:0005730 GO:0070062 GO:0090305 GO:0090305 GO:0090305
#Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305
#$A0A087WUJ7
#[1] GO:0004553 GO:0005975
#Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305
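For a dplyr-flavoured version, one possible sketch (not taken from the original answer) is to collapse each gene's GO terms into a list-column with group_by() and summarise(), then name the list entries by gene. The object names go_per_gene and go_by_gene are made up for illustration, and the data frame is rebuilt by hand here on the assumption that both columns are factors, as in the dput() output.

```r
library(dplyr)

# Rebuild the example data from the question (both columns as factors).
tmp <- data.frame(
  gene = c(rep("44M2.3", 8), rep("A0A087WUJ7", 2)),
  go = c("GO:0000166", "GO:0003723", "GO:0004527", "GO:0005730",
         "GO:0070062", "GO:0090305", "GO:0090305", "GO:0090305",
         "GO:0004553", "GO:0005975"),
  stringsAsFactors = TRUE
)

# Collapse each gene's GO terms into a single list-column entry.
go_per_gene <- tmp %>%
  group_by(gene) %>%
  summarise(go = list(go))

# Turn the list-column into a named list keyed by gene
# (hypothetical object names, chosen for this sketch).
go_by_gene <- setNames(go_per_gene$go, as.character(go_per_gene$gene))

go_by_gene
```

The result should have the same shape as the split() output: one element per gene, each holding that gene's GO terms as a factor. If a plain named list is all that is needed, split() is probably still the shorter route.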