Apply a condition to a subset of df columns and return the sum and % of another column

Time: 2015-04-21 14:43:55

Tags: r dataframe plyr

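I have run into a frustrating problem that I should be able to solve but have not managed to. Given the data frame below, I want to return, for each week W1 to W4, the sum of the "amount" column and that sum as a percentage, based on a simple condition. The data set also contains NAs, which need to be ignored in the calculation.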

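I tried to write a function with two arguments that first picks out the IDs above my threshold (100) and then does the division. Here is my laughable effort.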

myfxn=function(x, y, na.rm=TRUE) {
  count=x>100
  with(count,100*(sum(y,na.rm=na.rm)/sum(!is.na(y)))) 
}

zz=as.data.frame(sapply(exampledata[3:6], myfxn, y=exampledata[2]))

exampledata <- structure(list(ID = 1:10, amount = c(200L, 100L, 300L, 400L, 
500L, 200L, 200L, 250L, 150L, 300L), W1 = c(150L, NA, 192L, 143L, 
158L, 187L, 173L, NA, 123L, NA), W2 = c(198L, 36L, 86L, 47L, 
38L, 109L, 196L, 17L, 188L, NA), W3 = c(50L, 36L, 70L, NA, 45L, 
164L, 82L, 169L, 113L, 89L), W4 = c(124L, 18L, 133L, NA, 162L, 
23L, 65L, 153L, 145L, 173L)), .Names = c("ID", "amount", "W1", 
"W2", "W3", "W4"), class = "data.frame", row.names = c(NA, -10L
))

Ideally, the return value would be a df with 4 rows (W1:W4) and 2 columns (the amount and the amount as a %). Thanks for your help!

1 Answer:

Answer 0: (score: 1)

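Here is a solution. It is a bit verbose, but it works. A faster solution would involve more complex code and/or other packages, whereas this one is simple and uses only dplyr / tidyr / magrittr. I hope I understood the question correctly: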

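library(tidyr)
library(magrittr)
library(dplyr)

# Assumption: df is the question's data frame (called exampledata there)
df <- exampledata

# Sum of amount per week over rows whose weekly value exceeds 100
# (filter() also drops rows where the comparison evaluates to NA)
gather(df, Week, Value, 3:6) %>% filter(Value > 100) %>%
    group_by(Week) %>% summarise(Sum.amounts.per.week.over100 = sum(amount)) ->
    t.week.over100

# Sum of amount per week over all rows with a non-missing weekly value
gather(df, Week, Value, 3:6) %>%
    group_by(Week) %>% filter(!is.na(Value)) %>%
    summarise(Sum.amounts.per.week.total = sum(amount)) -> t.week.total

# Join the two summaries and compute the percentage
t.week <- merge(t.week.over100, t.week.total, by = "Week")
t.week$percent <- t.week$Sum.amounts.per.week.over100/t.week$Sum.amounts.per.week.total * 100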

If you want the percentage rounded:

t.week$percent <- round(t.week$percent)
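
For comparison, the same per-week calculation can probably be done in base R alone, in the shape the question was originally aiming for (one function applied over the week columns with sapply). This is only a minimal sketch: the helper name week_summary and its threshold argument are hypothetical, and it assumes the percentage is taken relative to the amounts of the rows whose weekly value is not NA, which is the same denominator used in the code above.

```r
# Data from the question, rebuilt with data.frame() so the sketch is self-contained
exampledata <- data.frame(
  ID = 1:10,
  amount = c(200L, 100L, 300L, 400L, 500L, 200L, 200L, 250L, 150L, 300L),
  W1 = c(150L, NA, 192L, 143L, 158L, 187L, 173L, NA, 123L, NA),
  W2 = c(198L, 36L, 86L, 47L, 38L, 109L, 196L, 17L, 188L, NA),
  W3 = c(50L, 36L, 70L, NA, 45L, 164L, 82L, 169L, 113L, 89L),
  W4 = c(124L, 18L, 133L, NA, 162L, 23L, 65L, 153L, 145L, 173L)
)

# Hypothetical helper: summarise one week column against the amount column
week_summary <- function(week, amount, threshold = 100) {
  observed <- !is.na(week)              # rows with a value for this week
  over <- observed & week > threshold   # rows above the threshold (NA counts as FALSE)
  amount_over <- sum(amount[over])
  c(amount = amount_over,
    percent = 100 * amount_over / sum(amount[observed]))
}

# Apply to W1:W4; sapply returns a 2 x 4 matrix, so transpose to get one row per week
res <- t(sapply(exampledata[3:6], week_summary, amount = exampledata$amount))
result <- data.frame(Week = rownames(res), res, row.names = NULL)
print(result)
```

If the percentage should instead be relative to the total of all amounts, NA weeks included, only the denominator sum(amount[observed]) would need to change to sum(amount).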

I strongly recommend looking at a few tutorials on tidyr / dplyr and magrittr, especially the first two, for example:

intro to dplyr

intro to tidyr

intro to magrittr