I have a df:
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
What I want to do is calculate
ratio = 4.464274/(4.464274+1.454645)*100 for each sample between honda and audi,
for every row, and bind the result to a new df:
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2 ratio_sample1_1...sample3_1
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
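Is there a simple way to do this?
I would also like the standard deviation of the replicates, something like this, but for each sample group: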
sample1_1_ratio sample1_2_ratio STD
75 71 sd(sample1_1_ratio,sample1_2_ratio)
24 28 sd(sample1_1_ratio,sample1_2_ratio)
Answer 0: (score: 3)
You can use mutate_if together with is.numeric to create a new column for every existing numeric column:
library(dplyr)
df %>% mutate_if(is.numeric, funs(ratio = 100 * ./sum(.)))
# id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2 sample1_1_ratio sample1_2_ratio sample2_1_ratio sample2_2_ratio sample2_3_ratio sample3_1_ratio sample3_2_ratio
#1 honda 4.464274 7.087345 2.659297 83.51360 49.299961 22.99157 19.679316 75.42381 71.79247 49.68835 85.63341 86.53014 85.68042 83.83256
#2 audi 1.454645 2.784645 2.692656 14.01095 7.674361 3.84253 3.795233 24.57619 28.20753 50.31165 14.36659 13.46986 14.31958 16.16744
Or, if the column names share a starting pattern such as sample, you can also use mutate_at:
df %>% mutate_at(vars(starts_with('sample')), funs(ratio = 100 * ./sum(.)))
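Both calls above rely on funs(), which has been deprecated since dplyr 0.8.0, and mutate_if/mutate_at were superseded by across() in dplyr 1.0.0. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later; the read.table() call only rebuilds the question's data, and df_ratio is an illustrative name:

```r
library(dplyr)

# Rebuild the example data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
")

# For every column whose name starts with "sample", add a <name>_ratio column
# holding each row's percentage share of that column's total
df_ratio <- df %>%
  mutate(across(starts_with("sample"),
                ~ 100 * .x / sum(.x),
                .names = "{.col}_ratio"))

df_ratio
```

The .names argument reproduces the sample1_1_ratio style of column names that funs(ratio = ...) generated.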
Answer 1: (score: 1)
Here is a slightly different solution that gets the same result but organises the data frame in a more manageable long format:
library(dplyr)
library(tidyr)
df %>%
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100)
# A tibble: 14 x 4
# Groups: sample [7]
id sample value ratio
<fctr> <chr> <dbl> <dbl>
1 honda sample1_1 4.464274 75.42381
2 audi sample1_1 1.454645 24.57619
3 honda sample1_2 7.087345 71.79247
4 audi sample1_2 2.784645 28.20753
5 honda sample2_1 2.659297 49.68835
6 audi sample2_1 2.692656 50.31165
7 honda sample2_2 83.513596 85.63341
8 audi sample2_2 14.010951 14.36659
9 honda sample2_3 49.299961 86.53014
10 audi sample2_3 7.674361 13.46986
11 honda sample3_1 22.991566 85.68042
12 audi sample3_1 3.842530 14.31958
13 honda sample3_2 19.679316 83.83256
14 audi sample3_2 3.795233 16.16744
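In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A sketch of the same long-format pipeline with the newer function, assuming that version is installed; df_long is an illustrative name, and the rows come out ordered by id rather than by sample:

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
")

df_long <- df %>%
  # One row per (id, sample) pair instead of one column per sample
  pivot_longer(-id, names_to = "sample", values_to = "value") %>%
  group_by(sample) %>%
  # Percentage share of each id within its sample
  mutate(ratio = value / sum(value) * 100) %>%
  ungroup()

df_long
```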
If you want the standard deviation of the ratios, you can compute it in the same pipeline as follows (mutating the value onto every row):
df %>% gather(sample, value, -id) %>% group_by(sample) %>% mutate(ratio = value / sum(value) * 100, sd_sample = sd(ratio))
If you don't want the value repeated on every row of the group, you can instead run summarise(sdev = sd(ratio)) in a separate pipeline.
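The pipeline above groups by sample, so sd(ratio) is taken across the ids within a single sample column. If the goal is the spread across replicates within each sample group (sample1, sample2, sample3) for each id, as the question's STD column suggests, one possible sketch is below. The sample_group column and the df_sd name are introduced here for illustration, and the regular expression assumes replicate names always end in an underscore followed by digits:

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id sample1_1 sample1_2 sample2_1 sample2_2 sample2_3 sample3_1 sample3_2
honda 4.464274 7.087345 2.659297 83.513596 49.299961 22.991566 19.679316
audi 1.454645 2.784645 2.692656 14.010951 7.674361 3.84253 3.795233
")

df_sd <- df %>%
  gather(sample, value, -id) %>%
  group_by(sample) %>%
  mutate(ratio = value / sum(value) * 100) %>%
  ungroup() %>%
  # "sample1_2" -> "sample1": strip the trailing replicate number (assumed naming)
  mutate(sample_group = sub("_[0-9]+$", "", sample)) %>%
  group_by(id, sample_group) %>%
  # One row per id and sample group: sd of the ratios across its replicates
  summarise(sdev = sd(ratio), .groups = "drop")

df_sd
```

This returns one row per id and sample group rather than repeating the value on every row. The .groups argument requires dplyr 1.0.0 or later.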