summarise_if and concatenate the IDs of the summed rows

Date: 2017-06-26 14:21:16

Tags: r dplyr

Here is the data, as dput() output:

d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1", 
    "k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L, 
    1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L, 
    1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L, 
    1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L, 
    1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L, 
    1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L, 
    1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"), 
        SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889, 
        11.5555555555556, 3.35978835978836)), .Names = c("Gene", 
    "phylum", "class", "order", "family", "genus", "species", "SampleA", 
    "SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L
    ), class = "data.frame")

This is the printed data frame:

        Gene     phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0       3.99
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      11.56
   k141_70_3 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0       3.36

I summarise it like this:

library(dplyr)
d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  summarise_if(is.numeric, sum)

      phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979
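Why Gene is missing from this result: summarise_if() returns only the grouping columns plus the columns matched by the predicate (here is.numeric), so the Gene factor is simply dropped. A minimal check, assuming dplyr is installed and rebuilding d compactly from the dput() above (row names omitted):

```r
library(dplyr)

# Compact rebuild of `d` from the question's dput() output (row names omitted)
d <- data.frame(
  Gene       = factor(c("k141_20041_1", "k141_27047_2", "k141_70_3")),
  phylum     = factor(rep("Firmicutes", 3)),
  class      = factor(rep("Bacillales", 3)),
  order      = factor(rep("Bacilli", 3)),
  family     = factor(rep("Bacillaceae", 3)),
  genus      = factor(rep("Bacillus", 3)),
  species    = factor(c("Bacillus subtilis", "Bacillus subtilis", "unknown")),
  SampleA    = c(0, 0, 0),
  SampleB    = c(0, 0, 0),
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

res <- d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  summarise_if(is.numeric, sum)

# Only the grouping columns and the summed numeric columns survive;
# Gene is neither a grouping column nor numeric, so it is not in the result
print(names(res))
```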

I want to add a column that concatenates the genes that were summed in each group. For example, it would look like this:

      phylum      class   order      family    genus           species SampleA SampleB SampleCtrl                      Gene
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444 k141_20041_1,k141_27047_2
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979                 k141_70_3

Thanks for your help.

1 Answer

Answer 0 (score: 1)

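Basically, you want to use toString to paste the genes of each group together, then group again by the same columns plus the new Gene column, so that summarise keeps it in the final table.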

library(dplyr)
d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  mutate(Gene = toString(Gene)) %>%
  group_by(phylum, class, order, family, genus, species, Gene) %>%
  summarise_if(is.numeric, sum)

      phylum      class   order      family    genus           species                       Gene SampleA SampleB SampleCtrl
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>                      <chr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis k141_20041_1, k141_27047_2       0       0  15.544444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown                  k141_70_3       0       0   3.359788
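The same result can likely be reached in a single summarise() call, without the intermediate mutate() and second group_by(). This is an alternative sketch, not the answer's method: it assumes dplyr 1.0.0 or later (for across(), where() and the .groups argument) and rebuilds d compactly from the question's dput() output.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()/where()/.groups

# Compact rebuild of `d` from the question's dput() output (row names omitted)
d <- data.frame(
  Gene       = factor(c("k141_20041_1", "k141_27047_2", "k141_70_3")),
  phylum     = factor(rep("Firmicutes", 3)),
  class      = factor(rep("Bacillales", 3)),
  order      = factor(rep("Bacilli", 3)),
  family     = factor(rep("Bacillaceae", 3)),
  genus      = factor(rep("Bacillus", 3)),
  species    = factor(c("Bacillus subtilis", "Bacillus subtilis", "unknown")),
  SampleA    = c(0, 0, 0),
  SampleB    = c(0, 0, 0),
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

res <- d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  summarise(
    across(where(is.numeric), sum),  # sum SampleA, SampleB, SampleCtrl per group
    Gene = toString(Gene),           # collapse the group's gene IDs into one string
    .groups = "drop"
  )

print(res)
```

toString() separates the IDs with ", ". To get exactly the comma-only form shown in the question (k141_20041_1,k141_27047_2), use paste(Gene, collapse = ",") instead.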