Question
Create a new column containing percentages
Data
df <- data.frame(
species = c("A","A","A","A","B","B","B","B","A","A","A","A","B","B","B","B"),
number = c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2),
treatment = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),
variable = c("x","y","x","y","x","y","x","y","x","y","x","y","x","y","x","y"),
value = sample(1:16)
)
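Problem
I want to calculate percentages within each combination of number, treatment and species. The x and y values in each such group (for example, the first two rows) should sum to 100%.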
I tried using dplyr:
library(dplyr)
result <- df %>%
  group_by(variable) %>%
  mutate(percent = value * 100 / sum(value))
test <- subset(result, variable == "x")
sum(test$percent) # sums to 100%
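"test" is wrong: each x is expressed as a percentage of all x values across both species and both treatments, so the 100% is spread over every x row instead of over each x/y pair.
Desired output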
species number treatment variable value percent
A 1 0 x 40 40
A 1 0 y 60 60
A 2 0 x 1 10
A 2 0 y 9 90
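A quick way to confirm that a result has the intended shape is to sum percent within each species/number/treatment combination; every group should total 100. The sketch below types in the four sample rows from the desired output by hand (the data frame name `out` is only illustrative) and checks them with base R's `aggregate()`.

```r
# The four sample rows from the desired output, typed in by hand
out <- data.frame(
  species   = c("A", "A", "A", "A"),
  number    = c(1, 1, 2, 2),
  treatment = c(0, 0, 0, 0),
  variable  = c("x", "y", "x", "y"),
  value     = c(40, 60, 1, 9),
  percent   = c(40, 60, 10, 90)
)

# Sum percent within each species/number/treatment combination;
# a correct result has every group total equal to 100
totals <- aggregate(percent ~ species + number + treatment, data = out, FUN = sum)
print(totals)
```

The same check can be run on the output of any of the answers below by swapping `out` for that result.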
Answer 0 (score: 3)
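Here is an answer using tidyr:
library(tidyr)
library(dplyr)
# spread() turns the x/y rows into two columns, one row per species/number/treatment
df %>% spread(variable, value) %>%
  mutate(percent.x = 100 * x / (x + y),
         percent.y = 100 * y / (x + y))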
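In current tidyr, `spread()` is superseded by `pivot_wider()`. A minimal sketch of the same wide-format approach, assuming tidyr 1.0 or later; the data frame is rebuilt with `rep()` in the same layout as the question's `df`, and the fixed seed is an assumption added only so the random values are reproducible:

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed for reproducible random values
# Same layout as the question's df
df <- data.frame(
  species   = rep(rep(c("A", "B"), each = 4), times = 2),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# One row per species/number/treatment, with x and y as separate columns
wide <- df %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  mutate(percent.x = 100 * x / (x + y),
         percent.y = 100 * y / (x + y))
print(wide)
```

The wide layout makes the x/y pairing explicit, but it only works while every group has exactly one x and one y; the grouped `mutate()` approach has no such restriction.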
And here is a dplyr-only solution:
df %>% group_by(number, treatment, species) %>%
mutate(percent = 100 * value / sum(value))
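Newer dplyr releases (1.1.0 and later) also accept a per-operation `.by` argument, which groups only for that one `mutate()` call and returns an ungrouped result, so no `ungroup()` is needed afterwards. A sketch under that version assumption, again rebuilding `df` with `rep()` and an illustrative fixed seed:

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

set.seed(42)  # assumption: fixed seed for reproducible random values
# Same layout as the question's df
df <- data.frame(
  species   = rep(rep(c("A", "B"), each = 4), times = 2),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Percent of value within each species/number/treatment combination
result <- df %>%
  mutate(percent = 100 * value / sum(value),
         .by = c(species, number, treatment))
print(result)
```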
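Your problem is that you are calling group_by() on the wrong variable. Since you want the percentages to be defined within each (number, treatment, species) combination but to vary across variable, you should group_by() the former, not the latter.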
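The same grouping idea works without any packages. A base R sketch using `ave()`, which applies `FUN` within each combination of the grouping vectors and returns a vector aligned with the original rows; the `rep()` rebuild of `df` and the fixed seed are assumptions for a self-contained example:

```r
set.seed(42)  # assumption: fixed seed for reproducible random values
# Same layout as the question's df
df <- data.frame(
  species   = rep(rep(c("A", "B"), each = 4), times = 2),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Group by species/number/treatment (not by variable) and
# express each value as a percent of its group's total
df$percent <- ave(df$value, df$species, df$number, df$treatment,
                  FUN = function(v) 100 * v / sum(v))
head(df, 4)
```

Passing `variable` as the only grouping vector instead would reproduce the original mistake: all x rows would share one total.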
Answer 1 (score: 1)
Is this what you are looking for? I'm using the data.table package:
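library(data.table)
DT <- as.data.table(df)
# Total value per species/number/treatment/variable
DT_output <- DT[, list(value = sum(value)), by = c('species', 'number', 'treatment', 'variable')]
# Total value per species/number/treatment (the denominator)
DT_temp <- DT[, list(sum = sum(value)), by = c('species', 'number', 'treatment')]
# Attach the group totals; the merge result must be assigned back to DT_output
DT_output <- merge(DT_output, DT_temp, by = c('species', 'number', 'treatment'))
DT_output[, percent := 100 * value / sum]
setorder(DT_output, species, treatment, number, variable)
DT_output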
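When, as in the question's data, each species/number/treatment/variable combination appears only once, the aggregation and merge steps can be collapsed into a single grouped `:=` that adds the column by reference. A sketch under that assumption, rebuilding the data with `rep()` and an illustrative fixed seed:

```r
library(data.table)

set.seed(42)  # assumption: fixed seed for reproducible random values
# Same layout as the question's df, built directly as a data.table
DT <- data.table(
  species   = rep(rep(c("A", "B"), each = 4), times = 2),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# sum(value) is evaluated per group given in `by`; := adds the column in place
DT[, percent := 100 * value / sum(value), by = .(species, number, treatment)]
setorder(DT, species, treatment, number, variable)
print(DT)
```

If the raw data could contain duplicate rows per variable, the two-step aggregate-then-merge version above is the safer choice.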