
Question

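I have a very simple problem for which I can't find a simple solution on my own. I have a data.frame of expression data. Each row corresponds to one measured gene. The columns are expression measurements at different time points, with 4 replicates per time point. It looks like this: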

         0h_1    0h_2    0h_3    0h_4    1h_1   1h_2    1h_3   1h_4    2h_1    2h_2    2h_3     2h_4    3h_1     3h_2     3h_3    3h_4 
gene1    434     123     42      94      9811   262     117    42      327     367     276      224
gene2    47      103     30      847     13     291     167    358     303     293     2263     741
gene3    322     27      97      217     223    243     328    308     328     299     518      434

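I want to sum all replicates within each row, so that the result has one row per gene and only one column per time point instead of four. Is there a function that lets me do this efficiently?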

To clarify: what I'm looking for is a data.frame like this:

         0h     1h     2h     3h     ...
gene1   693     9811  
gene2   1027    13
gene3

Thanks in advance. Best, Jonas

Answer 1

As suggested by @AntoniosK, we can use summarise instead of distinct and select(-iter,-value).

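library(dplyr)
library(tidyr)  # gather(), separate() and spread() come from tidyr

# Reshape to long format, split each column name into time point and replicate,
# sum the replicates per gene and time point, then reshape back to wide format
df %>% gather(key, value,-name) %>% 
       separate(key,into = c('timepoint','iter'),sep = '_') %>% 
       group_by(name,timepoint) %>% summarise(sum=sum(value, na.rm = TRUE)) %>% 
       spread(timepoint,sum) 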

# A tibble: 3 x 4
# Groups:   name [3]
   name    X0h   X1h   X2h
  <fct> <int> <int> <int>
1 gene1   693 10232  1194
2 gene2  1027   829  3600
3 gene3   663  1102  1579
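
As a side note, gather() and spread() are superseded in current tidyr releases. Below is a minimal sketch of the same idea using pivot_longer() and pivot_wider(), assuming tidyr >= 1.0.0 and dplyr >= 1.0.0. The object name res and the use of check.names = FALSE (which keeps the column names as 0h_1 rather than X0h_1) are choices made for this illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the Data section of this answer;
# check.names = FALSE keeps "0h_1" instead of "X0h_1" (illustrative choice)
df <- read.table(text = "
name   0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4
gene1  434  123  42   94   9811 262  117  42   327  367  276  224
gene2  47   103  30   847  13   291  167  358  303  293  2263 741
gene3  322  27   97   217  223  243  328  308  328  299  518  434
", header = TRUE, check.names = FALSE)

res <- df %>%
  # split each column name at "_" into time point and replicate number
  pivot_longer(-name, names_to = c("timepoint", "iter"), names_sep = "_") %>%
  group_by(name, timepoint) %>%
  # sum the replicates for each gene and time point
  summarise(sum = sum(value, na.rm = TRUE), .groups = "drop") %>%
  # one column per time point again
  pivot_wider(names_from = timepoint, values_from = sum)

res
```

The result should have one row per gene and the columns name, 0h, 1h and 2h, with the same sums as shown above.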

Data

df<-read.table(text="
    name      0h_1    0h_2    0h_3    0h_4    1h_1   1h_2    1h_3   1h_4    2h_1    2h_2    2h_3     2h_4    
    gene1    434     123     42      94      9811   262     117    42      327     367     276      224
    gene2    47      103     30      847     13     291     167    358     303     293     2263     741
    gene3    322     27      97      217     223    243     328    308     328     299     518      434
           ",header=TRUE)

Answer 2

Here is an option in base R:

res <- as.data.frame(lapply(split.default(df1, sub("_.*$","",names(df1))), rowSums))
names(res) <- gsub("^X","",names(res))
res
#         0h    1h   2h
# gene1  693 10232 1194
# gene2 1027   829 3600
# gene3  663  1102 1579
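
Another base R possibility, offered only as a sketch and not part of the original answer, is rowsum(), which sums the rows of a matrix by group. Because the replicates are columns here, the data is transposed before and after. The names timepoint and res2 are made up for this illustration, and check.names = FALSE is used so that the column names stay as 0h_1 and so on.

```r
# Same example data as in the Data section of this answer;
# the header has one field fewer than the rows, so the genes become row names
df1 <- read.table(text = "
0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4
gene1  434  123  42   94   9811 262  117  42   327  367  276  224
gene2  47   103  30   847  13   291  167  358  303  293  2263 741
gene3  322  27   97   217  223  243  328  308  328  299  518  434
", header = TRUE, check.names = FALSE)

# Time point label for each column: everything before the underscore
timepoint <- sub("_.*$", "", names(df1))

# rowsum() groups rows, so transpose, sum by time point, and transpose back
res2 <- t(rowsum(t(df1), group = timepoint))
res2
```

Note that res2 is a matrix with genes as row names and time points as column names, not a data.frame.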

Data

df1 <- read.table(text="
0h_1    0h_2    0h_3    0h_4    1h_1   1h_2    1h_3   1h_4    2h_1    2h_2    2h_3     2h_4 
gene1    434     123     42      94      9811   262     117    42      327     367     276      224
gene2    47      103     30      847     13     291     167    358     303     293     2263     741
gene3    322     27      97      217     223    243     328    308     328     299     518      434
",header=TRUE)

names(df1) <- gsub("^X","",names(df1))
df1
#       0h_1 0h_2 0h_3 0h_4 1h_1 1h_2 1h_3 1h_4 2h_1 2h_2 2h_3 2h_4
# gene1  434  123   42   94 9811  262  117   42  327  367  276  224
# gene2   47  103   30  847   13  291  167  358  303  293 2263  741
# gene3  322   27   97  217  223  243  328  308  328  299  518  434

Summing subsequent entries (replicates) in a data.frame
