Using dplyr to calculate percentages in a data frame with multiple species, treatments and variables

Time: 2016-02-16 12:32:23

Tags: r dataframe dplyr plyr

Question

Create a new column containing percentages

Data

 df<- data.frame(
     species   = c ("A","A","A","A","B","B","B","B","A","A","A","A","B","B","B","B"),
     number    = c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2),
     treatment = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),
     variable  = c ("x","y","x","y","x","y","x","y","x","y","x","y","x","y","x","y"),
     value = sample(1:16)
    )

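Problem

I want to calculate percentages within each species for a given number and treatment. The values of variables x and y (the first two rows) should sum to 100%.

I tried using dplyr: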

result <- df%>%
    group_by(variable) %>%
    mutate(percent = value*100/sum(value))

test<-subset(result,variable=="x")
sum(test[,6]) # sums to 100%

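"test" is wrong, because it gives the percentage of each x value relative to all x values across both species and both treatments.

Desired output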

 species number treatment variable value    percent
    A      1         0        x     40         40
    A      1         0        y     60         60
    A      2         0        x      1         10
    A      2         0        y      9         90

2 answers:

Answer 0: (score: 3)

Here is an answer using tidyr:

library(tidyr)
library(dplyr)

# Put x and y in separate columns, then express each as a percentage of x + y
df %>% spread(variable, value) %>% 
        mutate(percent.x = 100 * x / (x+y), 
               percent.y = 100 * y / (x+y)) 

Here is also a dplyr-only solution:

df %>% group_by(number, treatment, species) %>% 
        mutate(percent = 100 * value / sum(value)) 

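Your problem is that you are calling group_by() on the wrong variable. Since you want the percentages to be defined within each (number, treatment, species) combination while varying across variable, you should group_by() the former, not the latter.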
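As a sanity check on the grouping described above, here is a minimal sketch that rebuilds the question's data and confirms that the percentages total 100 within every (species, number, treatment) group. The `set.seed()` call and the name `check` are my own additions for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Rebuild the example data; set.seed() is an assumption added for reproducibility
set.seed(1)
df <- data.frame(
  species   = rep(c("A", "B", "A", "B"), each = 4),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Percent of each value within its (species, number, treatment) group
result <- df %>%
  group_by(species, number, treatment) %>%
  mutate(percent = 100 * value / sum(value)) %>%
  ungroup()

# Hypothetical check table: one row per group, holding the group's percent total
check <- result %>%
  group_by(species, number, treatment) %>%
  summarise(total = sum(percent), .groups = "drop")

print(check)
```

Every row of `check` should show a total of 100 (up to floating-point rounding). Grouping by `variable` instead would give totals that add up to 100 only across the whole data set.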

Answer 1: (score: 1)

Is this what you are looking for? I am using the data.table package:

library(data.table)
DT <- as.data.table(df)

DT_output <- DT[,list(value=sum(value)),by=c('species', 'number', 'treatment', 'variable')]
DT_temp <- DT[,list(sum=sum(value)),by=c('species', 'number', 'treatment' )]

DT_output <- merge(DT_output, DT_temp, by = c('species', 'number', 'treatment'))

DT_output[, percent := 100 * value / sum]

setorder(DT_output, species,treatment,number,variable)
DT_output
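The aggregate-and-merge steps above can probably be collapsed into a single grouped assignment. The following is a sketch rather than the answerer's code: it assumes each (species, number, treatment, variable) combination appears only once, as in the example data, so no prior aggregation is needed. The `set.seed()` call is my addition for reproducibility.

```r
library(data.table)

# Rebuild the example data as a data.table; the seed is an assumption
set.seed(1)
DT <- data.table(
  species   = rep(c("A", "B", "A", "B"), each = 4),
  number    = rep(c(1, 1, 2, 2), times = 4),
  treatment = rep(c(0, 1), each = 8),
  variable  = rep(c("x", "y"), times = 8),
  value     = sample(1:16)
)

# Add the percent column by reference, computed within each group
DT[, percent := 100 * value / sum(value), by = .(species, number, treatment)]

# Same ordering as in the answer above
setorder(DT, species, treatment, number, variable)
DT
```

Because `:=` modifies `DT` in place, no intermediate tables or merge are required. If the data contained repeated combinations, the aggregation step from the answer would still be needed first.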