Using summarise_each with a custom function that depends on the month of the data being summarised

Time: 2015-11-13 01:00:58

Tags: r dplyr

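I have a table with more than 150 variables recorded daily over 5 years. For each month, I want to create a summary holding the daily average of each variable. However, if the month is January, May, July, September, November, or December, I want to divide the sum of all values by the count minus 1 instead of by the count.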

dplyr's summarise_each looks like the right tool for this. However, I have not managed to integrate a custom function into the funs argument:

by.ym <- training %>% filter(Day.W!=1) %>% group_by(training, year=year(Date), month=month(Date))

testb <- summarise_each(by.ym[,-c(1:3)], 
                        funs(. / (if (month %in% c(1, 5, 7, 9, 11, 12)) {
                          sum(.)/(nrow(.)-1)
                        } else mean(.))
                        ))

The error message is:

Error: expecting a single value
In addition: Warning messages:
1: In if (c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,  :
  the condition has length > 1 and only the first element will be used
2: In if (c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,  :
  the condition has length > 1 and only the first element will be used

1 Answer:

Answer 0 (score: 1)

Putting the suggestions from the comments together, and using iris as test data:

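library(dplyr)
library(tidyr)

# bevel = 1 marks the months whose sum is divided by (count - 1)
multipliers <- data.frame(
  month = 1:12,
  bevel = c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1)
)

iris %>%
  select(-Species) %>%
  # iris has no dates, so assign months 1..12 cyclically as stand-in data
  mutate(month = rep(1:12, length.out = n())) %>%
  # reshape to long form: one row per (month, variable, value)
  gather(variable, value, -month) %>%
  left_join(multipliers, by = "month") %>%
  group_by(month, variable) %>%
  # bevel is constant within a month, so first(bevel) is a single value
  summarize(value = sum(value) / (n() - first(bevel))) %>%
  spread(variable, value)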
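
If you would rather stay in wide form, something closer to the original summarise call may also work. The following is only a sketch, assuming dplyr 1.0 or later (where across() replaces summarise_each/funs). The names short_months and adjusted_mean are hypothetical helpers introduced here, not part of the question or of dplyr. Because the month is constant within each group, the helper looks at its first element only, which keeps the if condition at length one and avoids the warning shown in the question.

```r
library(dplyr)

# Hypothetical helper data: months whose sum is divided by (count - 1).
short_months <- c(1, 5, 7, 9, 11, 12)

# Hypothetical helper: mean of x, or sum(x) / (n - 1) for the listed months.
adjusted_mean <- function(x, month) {
  # month is constant within a group, so its first element is enough;
  # this keeps the `if` condition at length one.
  if (month[1] %in% short_months) {
    sum(x) / (length(x) - 1)
  } else {
    mean(x)
  }
}

result <- iris %>%
  select(-Species) %>%
  # iris has no dates, so assign months 1..12 cyclically as stand-in data
  mutate(month = rep(1:12, length.out = n())) %>%
  group_by(month) %>%
  # across(everything()) skips the grouping column; `month` inside the
  # lambda refers to the current group's month values
  summarise(across(everything(), ~ adjusted_mean(.x, month)))
```

Note that a group with a single row in one of the listed months would divide by zero, so real data may need a guard for that case.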