R: Rolling/sliding window and distinct count in R over a sliding number of days

Time: 2016-12-16 11:46:01

Tags: r data.table dplyr

I have the following dataset:

set.seed(1)
transaction_date <- sample(seq(as.Date('2016/01/01'), as.Date('2016/02/01'), by="day"), 24)
set.seed(1)
df <- data.frame("categ" = paste0("Categ",rep(1:2,12)), "prod" = sample(paste0("Prod",rep(seq(1:3),8))), customer_id = paste0("customer ",seq(1:24)),transaction_date=transaction_date)
df_ordered <- df[order(df$categ,df$prod,df$transaction_date,df$customer_id),]
df_ordered

    categ  prod customer_id transaction_date
1  Categ1 Prod1  customer 1       2016-01-09
3  Categ1 Prod1  customer 3       2016-01-18
19 Categ1 Prod1 customer 19       2016-01-28
7  Categ1 Prod1  customer 7       2016-01-29
5  Categ1 Prod2  customer 5       2016-01-06
23 Categ1 Prod2 customer 23       2016-01-07
13 Categ1 Prod2 customer 13       2016-01-14
9  Categ1 Prod2  customer 9       2016-01-16
15 Categ1 Prod2 customer 15       2016-01-20
21 Categ1 Prod2 customer 21       2016-01-24
11 Categ1 Prod3 customer 11       2016-01-05
17 Categ1 Prod3 customer 17       2016-01-31
10 Categ2 Prod1 customer 10       2016-01-02
20 Categ2 Prod1 customer 20       2016-01-11
24 Categ2 Prod1 customer 24       2016-01-23
16 Categ2 Prod1 customer 16       2016-02-01
12 Categ2 Prod2 customer 12       2016-01-04
4  Categ2 Prod2  customer 4       2016-01-27
22 Categ2 Prod3 customer 22       2016-01-03
14 Categ2 Prod3 customer 14       2016-01-08
2  Categ2 Prod3  customer 2       2016-01-12
18 Categ2 Prod3 customer 18       2016-01-15
8  Categ2 Prod3  customer 8       2016-01-17
6  Categ2 Prod3  customer 6       2016-01-25

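I am able to count the unique customers within 12 days of the first (minimum) observed transaction_date of each group defined by categ and prod.

What I am looking for is a sliding window covering the 12 days before the current transaction date, with a count of all transactions that fall in that window. Below is the output I am trying to create. I would like to avoid for loops for this task.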

[image: expected output]

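1 answer:

Answer 0: (score: 3)

This can be achieved using dplyr together with rollapply from zoo. First, we fill in all missing dates for every group, using expand.grid and merge, so that each group has a continuous daily series. Then we group by category and product, arrange by date, and apply a rolling window to the values in customer_id. The function applied at each step takes the length of the vector of unique values after removing NAs. Finally, we filter out the added dates again, that is, the rows where customer_id is not available.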
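To make the windowing step concrete, here is a minimal sketch on a made-up vector of customer codes, one slot per day with NA on days without a transaction. The vector `ids` and the helper name `n_distinct_non_na` are illustrative assumptions, not part of the data above, and the window width is shortened to 3 so the result is easy to check by hand. It also shows how the `align` argument decides whether the window looks forward from the current day or back from it.

```r
library(zoo)

# Toy series (assumed for illustration): one slot per day, NA = no transaction
ids <- c(1, NA, 2, 1, NA, 3)

# Count distinct non-missing values in a window
n_distinct_non_na <- function(x) length(unique(x[!is.na(x)]))

# 3-day window that starts at the current day and runs forward
fwd <- rollapply(ids, width = 3, FUN = n_distinct_non_na,
                 partial = TRUE, align = "left")

# 3-day window that ends at the current day and looks back
back <- rollapply(ids, width = 3, FUN = n_distinct_non_na,
                  partial = TRUE, align = "right")

print(fwd)
print(back)
```

With `partial=TRUE`, the windows at the edges of the series are simply shorter instead of being dropped, so the result has one value per input position.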

library(dplyr)
library(zoo)

set.seed(1)
transaction_date <- sample(seq(as.Date('2016/01/01'), as.Date('2016/02/01'), by="day"), 24)
set.seed(1)
df <- data.frame("categ" = paste0("Categ",rep(1:2,12)), "prod" = sample(paste0("Prod",rep(seq(1:3),8))), customer_id = paste0("customer ",seq(1:24)),transaction_date=transaction_date)

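# Every (categ, prod, date) combination over the observed date range
all_combinations <- expand.grid(categ=unique(df$categ), 
        prod=unique(df$prod), 
        transaction_date=seq(min(df$transaction_date), max(df$transaction_date), by="day"))

# Full outer join: the padded dates get customer_id = NA
df <- merge(df, all_combinations, by=c('categ','prod','transaction_date'), all=TRUE)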

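# Count distinct customers in a 12-row window per group (12 days, since every date is present).
# Note: align='left' makes each window start at the current date and run forward;
# align='right' would make it end at the current date and look back.
res <- df %>% 
       group_by(categ, prod) %>% 
       arrange(transaction_date) %>% 
       mutate(ucust=rollapply(customer_id, width=12, FUN=function(x) length(unique(x[!is.na(x)])), partial=TRUE, align='left')) %>%
       filter(!is.na(customer_id))  # drop the padded dates again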
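
As a rough alternative sketch, not part of the original answer, the same kind of count can be computed by comparing dates directly within each group, which avoids padding the data with every calendar date. It uses only base R. The helper name `count_prior_window`, the made-up `toy` data, and the choice of a backward-looking window that includes the current day are all assumptions made for illustration.

```r
# Hypothetical helper: for each row, count the distinct ids whose date falls
# in the `days`-day window ending on that row's date (current day included).
count_prior_window <- function(dates, ids, days = 12) {
  vapply(seq_along(dates), function(i) {
    in_window <- dates > dates[i] - days & dates <= dates[i]
    length(unique(ids[in_window]))
  }, integer(1))
}

# Small made-up example: two groups with a few transactions each
toy <- data.frame(
  grp  = c("A", "A", "A", "B", "B"),
  cust = c("c1", "c2", "c1", "c3", "c4"),
  date = as.Date(c("2016-01-01", "2016-01-05", "2016-01-20",
                   "2016-01-02", "2016-01-30")),
  stringsAsFactors = FALSE
)

# Apply the helper within each group; ave() passes the row indices of one
# group at a time and puts the results back in the original row order
toy$ucust <- ave(seq_len(nrow(toy)), toy$grp, FUN = function(idx) {
  count_prior_window(toy$date[idx], toy$cust[idx])
})

print(toy)
```

Each row is compared against every other row of its group, so the cost grows quadratically with group size; for small groups like the ones in the question this is unlikely to matter.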