Extract event types from the last 21-day window

Time: 2015-05-25 18:09:48

Tags: r dplyr zoo

My data frame looks like this. The two rightmost columns are the ones I want to create.

**Name      ActivityType     ActivityDate   Email(last 21 days)  Webinar(last 21 days)**
John       Email            1/1/2014        NA                   NA   
John       Webinar          1/5/2014        NA                   NA
John       Sale             1/20/2014       Yes                  Yes
John       Webinar          3/25/2014       NA                   NA
John       Sale             4/1/2014        No                   Yes
John       Sale             7/1/2014        No                   No
Tom        Email            1/1/2015        NA                   NA   
Tom        Webinar          1/5/2015        NA                   NA
Tom        Sale             1/20/2015       Yes                  Yes
Tom        Webinar          3/25/2015       NA                   NA
Tom        Sale             4/1/2015        No                   Yes
Tom        Sale             7/1/2015        No                   No
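
To make the expected columns concrete, here is a small base R check of the day gaps behind John's 4/1/2014 sale. It is only an illustrative sketch: the helper `d()` and the variable names are made up for this example, and it assumes an activity counts when it falls 0 to 21 days before the sale.

```r
# Parse the m/d/Y dates used in the table (d() is a hypothetical helper)
d <- function(x) as.Date(x, format = "%m/%d/%Y")

sale    <- d("4/1/2014")
email   <- d("1/1/2014")
webinar <- d("3/25/2014")

# Days elapsed between each activity and the sale
gap_email   <- as.integer(sale - email)     # 90
gap_webinar <- as.integer(sale - webinar)   # 7

# Assumption: an activity counts if it happened 0 to 21 days before the sale
email_flag   <- gap_email   >= 0 & gap_email   <= 21   # FALSE -> "No"
webinar_flag <- gap_webinar >= 0 & gap_webinar <= 21   # TRUE  -> "Yes"
```

This matches the row for John's 4/1/2014 sale: No for Email, Yes for Webinar.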

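I simply want to create a Yes/No variable indicating, for each "Sale" transaction, whether there was an Email or a Webinar in the preceding 21 days. I was thinking of using dplyr along these lines (mock code):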

custlife %>% 
group_by(Name) %>% 
 mutate(Email(last21days)=lag(ifelse(ActivityType = "Email" & ActivityDate of email within (activity date of sale - 21),Yes,No)).

I am not sure how to implement this. Please help. Thank you very much!

2 answers:

Answer 0: (score: 5)

Here is a possible data.table solution. I create 2 temporary data sets, one for Sale and one for the remaining activity types, then join them over a rolling window of 21 days while using by = .EACHI in order to check the conditions in each join. Finally, I join the result back to the original data set.

Convert the date column to Date class and key the data by Name and ActivityDate (for the final/rolling join)

library(data.table)
setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)

Create 2 temporary data sets, one for sales and one for every other activity

Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]

Join onto the sales temporary data set over a rolling window of 21 days while checking the conditions

# roll = -21: each activity is matched to the next sale at most 21 days later
Saletemp[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")), 
                        Webinar21 = as.logical(which(i.ActivityType == "Webinar"))), 
         roll = -21, by = .EACHI]

Join everything back to the original data set
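
The code for the final step, joining the flags back onto the original data, does not appear here. A minimal sketch of what it could look like, assuming Saletemp carries the Email21 and Webinar21 columns after the rolling join and using a data.table update join with `on =`. The toy rows and flag values below are made up for illustration and are not taken from the answer.

```r
library(data.table)

# Toy stand-ins (illustrative values only): `df` is the full activity log,
# `Saletemp` holds one row per sale with the two flags already filled in.
df <- data.table(
  Name         = c("John", "John", "John"),
  ActivityType = c("Email", "Sale", "Sale"),
  ActivityDate = as.IDate(c("2014-01-01", "2014-01-20", "2014-07-01"))
)
Saletemp <- data.table(
  Name         = c("John", "John"),
  ActivityDate = as.IDate(c("2014-01-20", "2014-07-01")),
  Email21      = c(TRUE, NA),
  Webinar21    = c(NA, NA)
)

# Update join: for rows of df that match Saletemp on Name and ActivityDate,
# copy the flags over by reference; unmatched (non-sale) rows stay NA.
df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21),
   on = .(Name, ActivityDate)]
```

Spelling out the join columns with `on =` keeps the sketch independent of whether the two tables still carry a key.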

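Answer 1: (score: 2)

Here is another option with base R:

df is first split according to Name; then, within each subset, for each Sale, check whether there was an Email (Webinar) in the 21 days before the sale. Finally, the list is unsplit according to Name. You only need to replace FALSE with No and TRUE with Yes.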

df_split <- split(df, df$Name)

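df_split <- lapply(df_split, function(tab) {
  i_s <- which(tab$ActivityType == "Sale")  # row indices of the sales
  # TRUE if an activity of `type` falls within [sale date - 21, sale date]
  in_window <- function(type) vapply(i_s, function(i) {
    d <- tab$ActivityDate[tab$ActivityType == type]
    any(d >= tab$ActivityDate[i] - 21 & d <= tab$ActivityDate[i])
  }, logical(1))
  tab$Email21 <- NA                         # non-sale rows stay NA
  tab$Webinar21 <- NA
  tab$Email21[i_s] <- in_window("Email")
  tab$Webinar21[i_s] <- in_window("Webinar")
  tab
})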
df_res <- unsplit(df_split, df$Name)

df_res
#   Name ActivityType ActivityDate Email21 Webinar21
#1  John        Email   2014-01-01      NA        NA
#2  John      Webinar   2014-01-05      NA        NA
#3  John         Sale   2014-01-20    TRUE      TRUE
#4  John      Webinar   2014-03-25      NA        NA
#5  John         Sale   2014-04-01   FALSE      TRUE
#6  John         Sale   2014-07-01   FALSE     FALSE
#7   Tom        Email   2015-01-01      NA        NA
#8   Tom      Webinar   2015-01-05      NA        NA
#9   Tom         Sale   2015-01-20    TRUE      TRUE
#10  Tom      Webinar   2015-03-25      NA        NA
#11  Tom         Sale   2015-04-01   FALSE      TRUE
#12  Tom         Sale   2015-07-01   FALSE     FALSE
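
For the last step mentioned in this answer, turning the logical flags into Yes/No, a minimal base R sketch follows. The helper name `to_yes_no` is hypothetical, and it assumes NA should stay NA for the non-sale rows.

```r
# Hypothetical helper: map TRUE/FALSE to "Yes"/"No", leaving NA untouched
to_yes_no <- function(x) ifelse(x, "Yes", "No")

# Email21 values for John, as printed in the result
flags  <- c(NA, NA, TRUE, NA, FALSE, FALSE)
labels <- to_yes_no(flags)
```

Applied to the result, this would be `df_res$Email21 <- to_yes_no(df_res$Email21)`, and likewise for Webinar21.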

Data

df <- structure(list(Name = c("John", "John", "John", "John", "John", 
"John", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom"), ActivityType = c("Email", 
"Webinar", "Sale", "Webinar", "Sale", "Sale", "Email", "Webinar", 
"Sale", "Webinar", "Sale", "Sale"), ActivityDate = structure(c(16071, 
16075, 16090, 16154, 16161, 16252, 16436, 16440, 16455, 16519, 
16526, 16617), class = "Date")), .Names = c("Name", "ActivityType", 
"ActivityDate"), row.names = c(NA, -12L), index = structure(integer(0), ActivityType = c(1L, 
7L, 3L, 5L, 6L, 9L, 11L, 12L, 2L, 4L, 8L, 10L)), class = "data.frame")