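I've run into a frustrating problem that I should be able to solve but haven't managed to. Given the data frame below, I want to return, for each week W1 to W4, the sum of the "amount" column and that sum as a percentage, based on a simple condition. The data set also contains NAs that need to be ignored in the calculation.
I tried to write a function with two arguments that first picks out the IDs above my threshold (100) and then does the division. Here is my laughable attempt.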
myfxn = function(x, y, na.rm = TRUE) {
  count = x > 100
  with(count, 100 * (sum(y, na.rm = na.rm) / sum(!is.na(y))))
}
zz = as.data.frame(sapply(exampledata[3:6], myfxn, y = exampledata[2]))
structure(list(ID = 1:10, amount = c(200L, 100L, 300L, 400L,
500L, 200L, 200L, 250L, 150L, 300L), W1 = c(150L, NA, 192L, 143L,
158L, 187L, 173L, NA, 123L, NA), W2 = c(198L, 36L, 86L, 47L,
38L, 109L, 196L, 17L, 188L, NA), W3 = c(50L, 36L, 70L, NA, 45L,
164L, 82L, 169L, 113L, 89L), W4 = c(124L, 18L, 133L, NA, 162L,
23L, 65L, 153L, 145L, 173L)), .Names = c("ID", "amount", "W1",
"W2", "W3", "W4"), class = "data.frame", row.names = c(NA, -10L
))
Ideally, the result would be a df with 4 rows (W1:W4) and 2 columns (amount and amount as a %). Thanks for your help!
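Answer 0 (score: 1)
Here is a solution. It is a bit verbose, but it works. A faster solution would involve more complex code and/or other packages, but this one is simple and uses only dplyr / tidyr / magrittr. I hope I understood the question correctly: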
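library(tidyr)
library(magrittr)
library(dplyr)

# Assumption: df is the data frame from the question (called exampledata there).
# Per week, sum "amount" over the rows whose weekly value is above 100.
gather(df, Week, Value, 3:6) %>%
  filter(Value > 100) %>%
  group_by(Week) %>%
  summarise(Sum.amounts.per.week.over100 = sum(amount)) -> t.week.over100

# Per week, sum "amount" over all rows whose weekly value is not NA.
gather(df, Week, Value, 3:6) %>%
  group_by(Week) %>%
  filter(!is.na(Value)) %>%
  summarise(Sum.amounts.per.week.total = sum(amount)) -> t.week.total

# Join the two summaries by week and compute the percentage.
t.week <- merge(t.week.over100, t.week.total, by = "Week")
t.week$percent <- t.week$Sum.amounts.per.week.over100 / t.week$Sum.amounts.per.week.total * 100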
If you want the percentage rounded:
t.week$percent <- round(t.week$percent)
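For comparison, the same per-week figures can probably be obtained in base R without any extra packages. This is only a sketch: it assumes the percentage means "amount summed over rows above 100, divided by amount summed over rows with a non-NA value for that week", as in the code above. The helper name week_summary and its threshold argument are made up for illustration.

```r
# Example data from the question, rebuilt with data.frame()
exampledata <- data.frame(
  ID     = 1:10,
  amount = c(200L, 100L, 300L, 400L, 500L, 200L, 200L, 250L, 150L, 300L),
  W1 = c(150L, NA, 192L, 143L, 158L, 187L, 173L, NA, 123L, NA),
  W2 = c(198L, 36L, 86L, 47L, 38L, 109L, 196L, 17L, 188L, NA),
  W3 = c(50L, 36L, 70L, NA, 45L, 164L, 82L, 169L, 113L, 89L),
  W4 = c(124L, 18L, 133L, NA, 162L, 23L, 65L, 153L, 145L, 173L)
)

# Hypothetical helper: summarise one week column against the amount column
week_summary <- function(week, amount, threshold = 100) {
  observed <- !is.na(week)                 # rows with a value for this week
  over     <- observed & week > threshold  # rows above the threshold (NA excluded)
  amt_over <- sum(amount[over])
  c(amount = amt_over, percent = 100 * amt_over / sum(amount[observed]))
}

# sapply() returns a 2 x 4 matrix; transpose it to get one row per week
result <- as.data.frame(
  t(sapply(exampledata[3:6], week_summary, amount = exampledata$amount))
)
print(result)
```

The result has four rows (W1 to W4) and two columns (amount and percent), which is the shape asked for in the question. Treating NA as "not above the threshold" is done explicitly with the observed mask, so no na.rm argument is needed.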
I strongly recommend looking at a few tutorials on tidyr / dplyr and magrittr, especially the first two, for example: