Calculating row ratios using dplyr

Time: 2017-09-12 21:52:47

Tags: r dplyr

I have a df:

id  sample1_1   sample1_2   sample2_1   sample2_2   sample2_3   sample3_1   sample3_2
honda   4.464274    7.087345    2.659297    83.513596   49.299961   22.991566   19.679316
audi    1.454645    2.784645    2.692656    14.010951   7.674361    3.84253 3.795233

What I want to do is calculate

ratio = 4.464274/(4.464274+1.454645)*100

for each sample, between honda and audi, for every row,

and bind the result to a new df.

Expected output

id  sample1_1   sample1_2   sample2_1   sample2_2   sample2_3   sample3_1   sample3_2 ratio_sample1_1...sample3_1
    honda   4.464274    7.087345    2.659297    83.513596   49.299961   22.991566   19.679316
    audi    1.454645    2.784645    2.692656    14.010951   7.674361    3.84253 3.795233 

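Is there an easy way to do this?

Edit

I also need the standard deviation of the replicate ratios, something like this, but for each sample group: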

sample1_1_ratio     sample1_2_ratio     STD
75  71  sd(sample1_1_ratio,sample1_2_ratio) 
24  28  sd(sample1_1_ratio,sample1_2_ratio)

2 Answers:

Answer 0: (score: 3)

You can use mutate_if together with is.numeric to create a new ratio column for every existing numeric column:

df %>% mutate_if(is.numeric, funs(ratio = 100 * ./sum(.)))

#     id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2 sample1_1_ratio sample1_2_ratio sample2_1_ratio sample2_2_ratio sample2_3_ratio sample3_1_ratio sample3_2_ratio
#1 honda  4.464274  7.087345  2.659297  83.51360 49.299961  22.99157 19.679316        75.42381        71.79247        49.68835        85.63341        86.53014        85.68042        83.83256
#2  audi  1.454645  2.784645  2.692656  14.01095  7.674361   3.84253  3.795233        24.57619        28.20753        50.31165        14.36659        13.46986        14.31958        16.16744

Or, if the column names share a starting pattern such as sample, you can use mutate_at instead:

df %>% mutate_at(vars(starts_with('sample')), funs(ratio = 100 * ./sum(.)))
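For readers on a recent dplyr: funs() has been deprecated since dplyr 0.8.0, and across() (available from dplyr 1.0.0) is the usual replacement for mutate_if and mutate_at. The following is a minimal sketch, not part of the original answer, assuming dplyr >= 1.0.0; the data frame is rebuilt from the values in the question and the name result is only illustrative.

```r
library(dplyr)

# Data frame rebuilt from the values shown in the question
df <- data.frame(
  id        = c("honda", "audi"),
  sample1_1 = c(4.464274, 1.454645),
  sample1_2 = c(7.087345, 2.784645),
  sample2_1 = c(2.659297, 2.692656),
  sample2_2 = c(83.513596, 14.010951),
  sample2_3 = c(49.299961, 7.674361),
  sample3_1 = c(22.991566, 3.84253),
  sample3_2 = c(19.679316, 3.795233)
)

# For every sample column, add a <column>_ratio column holding each row's
# percentage share of that column's total
result <- df %>%
  mutate(across(starts_with("sample"),
                ~ 100 * .x / sum(.x),
                .names = "{.col}_ratio"))
```

The .names argument reproduces the _ratio suffix that funs(ratio = ...) generated in the calls above.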

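Answer 1: (score: 1)

Here is a slightly different solution that gives the same ratios but organizes the data frame in a long format, which is easier to work with: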

library(dplyr)
library(tidyr)
df %>%
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100)
# A tibble: 14 x 4
# Groups:   sample [7]
       id    sample     value    ratio
   <fctr>     <chr>     <dbl>    <dbl>
 1  honda sample1_1  4.464274 75.42381
 2   audi sample1_1  1.454645 24.57619
 3  honda sample1_2  7.087345 71.79247
 4   audi sample1_2  2.784645 28.20753
 5  honda sample2_1  2.659297 49.68835
 6   audi sample2_1  2.692656 50.31165
 7  honda sample2_2 83.513596 85.63341
 8   audi sample2_2 14.010951 14.36659
 9  honda sample2_3 49.299961 86.53014
10   audi sample2_3  7.674361 13.46986
11  honda sample3_1 22.991566 85.68042
12   audi sample3_1  3.842530 14.31958
13  honda sample3_2 19.679316 83.83256
14   audi sample3_2  3.795233 16.16744
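In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). As a hedged sketch that is not part of the original answer, the same long-format ratios could be computed as follows, assuming tidyr >= 1.0.0; the data frame is rebuilt from the question and long_ratios is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Data frame rebuilt from the values shown in the question
df <- data.frame(
  id        = c("honda", "audi"),
  sample1_1 = c(4.464274, 1.454645),
  sample1_2 = c(7.087345, 2.784645),
  sample2_1 = c(2.659297, 2.692656),
  sample2_2 = c(83.513596, 14.010951),
  sample2_3 = c(49.299961, 7.674361),
  sample3_1 = c(22.991566, 3.84253),
  sample3_2 = c(19.679316, 3.795233)
)

# Reshape to one row per (id, sample) pair, then compute each row's
# percentage share within its sample column
long_ratios <- df %>%
  pivot_longer(-id, names_to = "sample", values_to = "value") %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100) %>%
  ungroup()
```

The rows come out in a different order than with gather() (all honda rows first, then all audi rows), but the ratios are the same.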

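If you want the standard deviation of the ratios, you can compute it in the same pipe as follows (mutate repeats the value on every row of its group):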

df %>%
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100, sd_sample = sd(ratio))

If you don't want the value repeated on every row of a group, you can run summarise(sdev = sd(ratio)) in a separate pipe instead.
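A minimal sketch of that summarise variant follows. It also includes one possible reading of the question's edit, namely the standard deviation of the ratios across replicates within each sample group, computed separately for each id. That reading is an assumption, as is the helper column sample_group, which is derived here by stripping the trailing replicate number from the column name; the data frame is rebuilt from the question.

```r
library(dplyr)
library(tidyr)

# Data frame rebuilt from the values shown in the question
df <- data.frame(
  id        = c("honda", "audi"),
  sample1_1 = c(4.464274, 1.454645),
  sample1_2 = c(7.087345, 2.784645),
  sample2_1 = c(2.659297, 2.692656),
  sample2_2 = c(83.513596, 14.010951),
  sample2_3 = c(49.299961, 7.674361),
  sample3_1 = c(22.991566, 3.84253),
  sample3_2 = c(19.679316, 3.795233)
)

# Long format with the per-column percentage ratios, as in the answer
long <- df %>%
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100) %>%
  ungroup()

# One row per sample column: sd of the honda and audi ratios in that column
sd_by_sample <- long %>%
  group_by(sample) %>%
  summarise(sdev = sd(ratio))

# Assumed reading of the question's edit: sd across replicates
# (sample1_1, sample1_2, ...) within each sample group, per id.
# sample_group is a hypothetical helper column, e.g. "sample1".
sd_by_group <- long %>%
  mutate(sample_group = sub("_[0-9]+$", "", sample)) %>%
  group_by(id, sample_group) %>%
  summarise(sd_ratio = sd(ratio)) %>%
  ungroup()
```

Here sd_by_sample has one row per sample column, while sd_by_group has one row per id and sample group, which seems closer to the STD column sketched in the question.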