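I need to extend this question: convert data frame of counts to proportions in R
I need to compute proportions by a condition while keeping the rest of the information in the data set.
Reproducible example: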
ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
dat <- data.frame(cbind(ID, trial,variable1,variable2,variable3,condition))
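For each variable, I would like to compute the proportion within each ID (i.e., 3 times).
Ideally, the new variables would be stored in the same data frame, e.g. as dat$variable1_p
I know how to do this with a series of for loops, but I would like to learn how to do it with the apply functions. It should also be possible to extend the approach to more conditions if necessary.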
Answer 0: (score: 1)
We can use adply from the plyr package:
library(plyr)
adply(dat, 1, function(x)
c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1)))
# ID trial variable1 variable2 variable3 condition variable1_p
# 1 1 a 3 5 4 i 0.20000000
# 2 1 a 8 9 9 j 0.53333333
# 3 1 a 4 4 8 k 0.26666667
# 4 2 a 7 10 5 i 0.50000000
# 5 2 a 6 8 10 j 0.42857143
# 6 2 a 1 1 7 k 0.07142857
# 7 3 a 10 6 3 i 0.47619048
# 8 3 a 9 7 6 j 0.42857143
# 9 3 a 2 3 2 k 0.09523810
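The row-wise adply call above recomputes the per-ID sum for every row. As a hedged alternative, the same per-ID proportion can be sketched in base R with ave(), which applies a function within groups and returns a vector aligned with the original rows. This is a minimal sketch, not part of the original answer; it assumes variable1 is numeric, so the data frame is built directly with data.frame().

```r
# Rebuild the example data with numeric columns (no cbind()).
set.seed(123)
dat <- data.frame(
  ID        = rep(c(1, 2, 3), each = 3),
  trial     = rep("a", 9),
  variable1 = sample(1:10, 9),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

# prop.table() on a plain vector returns x / sum(x);
# ave() applies it separately within each ID.
dat$variable1_p <- ave(dat$variable1, dat$ID, FUN = prop.table)

# Sanity check: the proportions add up to 1 within every ID.
tapply(dat$variable1_p, dat$ID, sum)
```

Because ave() is vectorised over groups, it avoids the repeated subsetting of dat inside the anonymous function.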
Another option is to use dplyr, which can handle the case where there are multiple rows per condition within each ID:
library(dplyr)
dat %>%
group_by(ID, condition) %>%
mutate(sum_v1_cond = sum(variable1)) %>%
ungroup() %>%
group_by(ID) %>%
mutate(variable1_p = sum_v1_cond / sum(variable1)) %>%
select(-sum_v1_cond)
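The pipeline above handles variable1 only. For the simpler case in the example data (one row per ID and condition), a hedged sketch of how all three columns might be handled at once uses across() inside mutate(). This assumes dplyr 1.0.0 or later and numeric variable columns; the name dat_p is illustrative only.

```r
library(dplyr)

# Example data with numeric columns (built without cbind()).
set.seed(123)
dat <- data.frame(
  ID        = rep(c(1, 2, 3), each = 3),
  trial     = rep("a", 9),
  variable1 = sample(1:10, 9),
  variable2 = sample(1:10, 9),
  variable3 = sample(1:10, 9),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

# Divide each value by its per-ID column total; the .names glue
# pattern appends "_p" to each original column name.
dat_p <- dat %>%
  group_by(ID) %>%
  mutate(across(starts_with("variable"), ~ .x / sum(.x), .names = "{.col}_p")) %>%
  ungroup()

dat_p
```

Unlike the two-step pipeline, this sketch does not first sum within condition, so it gives row-level proportions; with several rows per ID and condition the extra group_by(ID, condition) step would still be needed.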
Full solution for variable1, variable2 and variable3:
adply(dat, 1, function(x)
c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1),
'variable2_p' = x$variable2 / sum(dat[x$ID == dat$ID,]$variable2),
'variable3_p' = x$variable3 / sum(dat[x$ID == dat$ID,]$variable3)))
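Since the question asks about the apply family, the three repeated lines above could also be sketched with lapply() over the variable columns. This is a minimal base R sketch under the assumption that the columns are numeric; vars, props and dat_p are hypothetical names introduced here for illustration.

```r
# Example data with numeric columns (built without cbind()).
set.seed(123)
dat <- data.frame(
  ID        = rep(c(1, 2, 3), each = 3),
  trial     = rep("a", 9),
  variable1 = sample(1:10, 9),
  variable2 = sample(1:10, 9),
  variable3 = sample(1:10, 9),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

# Columns to convert; extend this vector to cover more variables.
vars <- c("variable1", "variable2", "variable3")

# For each column, divide every value by its per-ID total.
props <- lapply(dat[vars], function(v) {
  ave(v, dat$ID, FUN = function(x) x / sum(x))
})
names(props) <- paste0(vars, "_p")

# Append the new *_p columns to the original data frame.
dat_p <- cbind(dat, as.data.frame(props))
dat_p
```

To group by more than one condition, further grouping vectors can be passed to ave() before FUN, for example ave(v, dat$ID, dat$trial, FUN = ...).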
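Data (built with data.frame() directly rather than data.frame(cbind(...)), so the numeric columns stay numeric instead of being coerced to character):
set.seed(123)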
ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
dat <- data.frame(ID, trial,variable1,variable2,variable3,condition,
stringsAsFactors = FALSE)