R dplyr: summarise based on a condition

Asked: 2015-08-20 08:53:15

Tags: r dplyr group-summaries

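I have a data set of items downloaded from a website, based on reports we generate. The idea is to remove reports that are no longer needed, based on the number of downloads. The logic basically counts all reports downloaded over the last year, checks whether they fall outside two absolute deviations around the median for the current year, and checks whether the report has been downloaded within the last 4 weeks and, if so, how many times.

I have the code below, which does not work, and I wonder if anyone can help. It gives me this error for the n_recent_downloads part:

Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables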

reports <- c("Report_A","Report_B","Report_C","Report_D","Report_A","Report_A","Report_A","Report_D","Report_D","Report_D")
Week_no <- c(36,36,33,32,20,18,36,30,29,27)

New.Downloads <- data.frame (Report1 = reports, DL.Week =  Week_no)


test <- New.Downloads %>%
  group_by(report1) %>%
  summarise(n_downloads = n(),
        n_recent_downloads = ifelse(sum((as.integer(DL.Week) >= (as.integer(max(DL.Week))) - 4),value,0)))

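1 Answer:

Answer 0 (score: 1)

Providing a reproducible example would make life a lot easier. Nevertheless, I have modified your code to do what I think you were trying to achieve.

I split it into two parts so you can see what is happening. I moved the ifelse statement into a mutate call, which gives: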

library(dplyr)

New.Downloads <- data.frame(
  Report1 = c("Report_A","Report_B","Report_C","Report_D","Report_A","Report_A","Report_A","Report_D","Report_D","Report_D"), 
  DL.Week = as.numeric(c(36,36,33,32,20,18,36,30,29,27))
)

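test <- New.Downloads %>%
  group_by(Report1) %>%
  mutate(
    median = median(DL.Week),   # per-report median download week
    mad = 2 * mad(DL.Week),     # two median absolute deviations per report
    # set weeks outside median +/- 2 * mad to 0, keep the rest unchanged
    check = ifelse(DL.Week > median + mad | DL.Week < median - mad, 0, DL.Week)
  )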

test

Source: local data frame [10 x 5]
Groups: Report1

    Report1 DL.Week median     mad check
1  Report_A      36   28.0 23.7216    36
2  Report_B      36   36.0  0.0000    36
3  Report_C      33   33.0  0.0000    33
4  Report_D      32   29.5  4.4478    32
5  Report_A      20   28.0 23.7216    20
6  Report_A      18   28.0 23.7216    18
7  Report_A      36   28.0 23.7216    36
8  Report_D      30   29.5  4.4478    30
9  Report_D      29   29.5  4.4478    29
10 Report_D      27   29.5  4.4478    27

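Note that in your example no value is classified as extreme under the median ± 2 * mad condition, so the check values are identical to DL.Week.

You can then chain summarise onto the end of this to get the sum.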
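To see where the mad column comes from, here is a small worked sketch for Report_A in base R. It relies on the fact that `mad()` scales the raw median absolute deviation by its default constant of 1.4826; the variable names (`x`, `raw_mad`, `two_mad`) are only illustrative.

```r
# Download weeks for Report_A, taken from the example data
x <- c(36, 20, 18, 36)

med <- median(x)                    # centre of the group
raw_mad <- median(abs(x - med))     # unscaled median absolute deviation
scaled_mad <- 1.4826 * raw_mad      # what mad(x) returns with its default constant

two_mad <- 2 * mad(x)               # the value stored in the mad column

# Bounds used by the check column; every Report_A week lies inside them
lower <- med - two_mad
upper <- med + two_mad
inside <- x >= lower & x <= upper
```

Because every week falls between `lower` and `upper`, none of the Report_A rows is set to 0.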

test %>%
  summarise(
    n_recent_downloads = sum(check)
  )

Source: local data frame [4 x 2]

   Report1 n_recent_downloads
1 Report_A                110
2 Report_B                 36
3 Report_C                 33
4 Report_D                118
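Note that sum(check) adds up week numbers rather than counting downloads. If what you actually want is the number of downloads in the last 4 weeks, a possible sketch is to sum a logical condition directly inside summarise. This assumes "recent" means within 4 weeks of the latest week in the whole data set; `latest_week` and `recent` are names introduced here for illustration only.

```r
library(dplyr)

New.Downloads <- data.frame(
  Report1 = c("Report_A", "Report_B", "Report_C", "Report_D", "Report_A",
              "Report_A", "Report_A", "Report_D", "Report_D", "Report_D"),
  DL.Week = c(36, 36, 33, 32, 20, 18, 36, 30, 29, 27)
)

# Assumption: the reference point is the latest week across all reports,
# not the latest week within each report
latest_week <- max(New.Downloads$DL.Week)

recent <- New.Downloads %>%
  group_by(Report1) %>%
  summarise(
    n_downloads = n(),
    # summing a logical vector counts the rows where the condition is TRUE
    n_recent_downloads = sum(DL.Week >= latest_week - 4)
  )

recent
```

Using max(DL.Week) inside summarise instead would compare each report against its own latest week, which may or may not be what you intend.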