Last sale date and next sale date by group (Name) using data.table

Time: 2015-11-19 22:46:48

Tags: r data.table

Here is my data frame:

df <- read.table(text='

Name      ActivityType      ActivityDate      LastSaleDate    NextSaleDate     
John       Email            1/1/2014             NA              2/1/2014
John       Sale             2/1/2014             NA              3/1/2014
John       Sale             3/1/2014             2/1/2014        NA        
John       Seminar          4/1/2014             3/1/2014        NA  
John       Webinar          5/1/2014             3/1/2014        NA  
Tom        Email            1/1/2014             NA              2/1/2015
Tom        Sale             2/1/2015             NA              3/1/2015
Tom        Sale             3/1/2015             2/1/2015        NA  
Tom        Seminar          4/1/2015             3/1/2015        NA  
Tom        Webinar          5/1/2015             3/1/2015        NA
                                                                      ', header=T)

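I am trying to derive the two rightmost columns with data.table. For every row, I look at the rows where ActivityType == "Sale" and want the ActivityDate of the previous sale and of the next sale within the same Name. A related dplyr solution (it only fills LastSaleDate) would be: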

library(dplyr)
require(zoo)

df %>% 
  group_by(Name) %>% 
  mutate(LastSaleDate=na.locf(lag(ifelse(ActivityType=="Sale",ActivityDate,NA)),na.rm=FALSE))

Thanks a lot for your help.

2 answers:

Answer 0: (score: 3)

I will use library(data.table). First, let's drop the two rightmost columns and convert ActivityDate to class Date.

dt <- as.data.table(read.table(text='

Name      ActivityType      ActivityDate      LastSaleDate    NextSaleDate     
John       Email            1/1/2014             NA              2/1/2014
John       Sale             2/1/2014             NA              3/1/2014
John       Sale             3/1/2014             2/1/2014        NA        
John       Seminar          4/1/2014             3/1/2014        NA  
John       Webinar          5/1/2014             3/1/2014        NA  
Tom        Email            1/1/2014             NA              2/1/2015
Tom        Sale             2/1/2015             NA              3/1/2015
Tom        Sale             3/1/2015             2/1/2015        NA  
Tom        Seminar          4/1/2015             3/1/2015        NA  
Tom        Webinar          5/1/2015             3/1/2015        NA
                                                                      ', header=T))

dt[, c('ActivityDate', 'LastSaleDate', 'NextSaleDate') := list(as.Date(ActivityDate, format = '%d/%m/%Y'), NULL, NULL)]
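The dates in the sample are ambiguous (day/month or month/day), so the format string decides what the rest of the answer computes. A minimal sketch of the difference, using one value from the table; the variable names are only for illustration:

```r
# "%d/%m/%Y" reads the first number as the day, "%m/%d/%Y" reads it as the month.
d_dmy <- as.Date("2/1/2014", format = "%d/%m/%Y")  # 2 January 2014
d_mdy <- as.Date("2/1/2014", format = "%m/%d/%Y")  # 1 February 2014

print(d_dmy)
print(d_mdy)
```

The output shown further down (2014-01-02, 2014-01-03, ...) corresponds to the day/month reading used in this answer.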

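Next, join the table with its own Sale rows to get every activity/sale combination per Name, and compute the difference in days between each activity and each sale: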

setkeyv(dt, 'Name')
dt2 <- dt[dt[ActivityType == 'Sale'], allow.cartesian = TRUE]
dt2[, DateDiff := as.numeric(ActivityDate - i.ActivityDate)]

This gives:

    Name ActivityType ActivityDate i.ActivityType i.ActivityDate DateDiff
 1: John        Email   2014-01-01           Sale     2014-01-02       -1
 2: John         Sale   2014-01-02           Sale     2014-01-02        0
 3: John         Sale   2014-01-03           Sale     2014-01-02        1
 4: John      Seminar   2014-01-04           Sale     2014-01-02        2
 5: John      Webinar   2014-01-05           Sale     2014-01-02        3
 6: John        Email   2014-01-01           Sale     2014-01-03       -2
 7: John         Sale   2014-01-02           Sale     2014-01-03       -1
 8: John         Sale   2014-01-03           Sale     2014-01-03        0
 9: John      Seminar   2014-01-04           Sale     2014-01-03        1
10: John      Webinar   2014-01-05           Sale     2014-01-03        2
11:  Tom        Email   2014-01-01           Sale     2015-01-02     -366
12:  Tom         Sale   2015-01-02           Sale     2015-01-02        0
13:  Tom         Sale   2015-01-03           Sale     2015-01-02        1
14:  Tom      Seminar   2015-01-04           Sale     2015-01-02        2
15:  Tom      Webinar   2015-01-05           Sale     2015-01-02        3
16:  Tom        Email   2014-01-01           Sale     2015-01-03     -367
17:  Tom         Sale   2015-01-02           Sale     2015-01-03       -1
18:  Tom         Sale   2015-01-03           Sale     2015-01-03        0
19:  Tom      Seminar   2015-01-04           Sale     2015-01-03        1
20:  Tom      Webinar   2015-01-05           Sale     2015-01-03        2

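Now, once you sort with dt2 <- dt2[order(Name, ActivityDate, DateDiff)], you can get the previous and next sale dates as follows (a positive DateDiff means the sale happened before the activity, a negative one means it happens after):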

dt2[, list(ActivityType = ActivityType[1],
           LastSaleDate = head(i.ActivityDate[DateDiff > 0], 1),
           NextSaleDate = tail(i.ActivityDate[DateDiff < 0], 1)),
     by = list(Name, ActivityDate)]
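The steps of this answer can be put together into one script. This is a sketch, not the answerer's exact code: the toy table is rebuilt by hand with the dates already parsed as day/month/year, and the trailing `[1]` is an addition that turns an empty match into NA, since a zero-length item in j may otherwise trigger a data.table warning.

```r
library(data.table)

# Hypothetical toy data mirroring the question's table (dates read as day/month/year).
dt <- data.table(
  Name         = rep(c("John", "Tom"), each = 5),
  ActivityType = rep(c("Email", "Sale", "Sale", "Seminar", "Webinar"), 2),
  ActivityDate = as.Date(c("2014-01-01", "2014-01-02", "2014-01-03",
                           "2014-01-04", "2014-01-05",
                           "2014-01-01", "2015-01-02", "2015-01-03",
                           "2015-01-04", "2015-01-05"))
)

# Pair every activity with every sale of the same person.
setkeyv(dt, "Name")
dt2 <- dt[dt[ActivityType == "Sale"], allow.cartesian = TRUE]
dt2[, DateDiff := as.numeric(ActivityDate - i.ActivityDate)]
dt2 <- dt2[order(Name, ActivityDate, DateDiff)]

# Smallest positive DateDiff = most recent earlier sale;
# negative DateDiff closest to zero = nearest later sale.
# The trailing [1] yields NA instead of a zero-length vector when nothing matches.
res <- dt2[, list(ActivityType = ActivityType[1],
                  LastSaleDate = head(i.ActivityDate[DateDiff > 0], 1)[1],
                  NextSaleDate = tail(i.ActivityDate[DateDiff < 0], 1)[1]),
           by = list(Name, ActivityDate)]
print(res)
```

Because the join is a per-person cross product, dt2 grows with (activities x sales) per Name, which is fine for small groups but can get large for people with many sales.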

Answer 1: (score: 2)

This seems to work, but it is rather messy:

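# DT is assumed to be the question's data as a data.table, e.g. DT <- as.data.table(df),
# with rows already sorted by ActivityDate within each Name.
DT[,c("LastSaleDate", "NextSaleDate") := {
  w   = which(ActivityType=="Sale")               # row positions of sales in this group
  lst = rep(c(NA, w ), diff(c(0, w, .N  )) )      # position of the previous sale, per row
  nxt = rep(c(w , NA), diff(c(1, w, .N+1)) )      # position of the next sale, per row
  list(ActivityDate[lst], ActivityDate[nxt])
}, by=Name]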


    Name ActivityType ActivityDate LastSaleDate NextSaleDate
 1: John        Email     1/1/2014           NA     2/1/2014
 2: John         Sale     2/1/2014           NA     3/1/2014
 3: John         Sale     3/1/2014     2/1/2014           NA
 4: John      Seminar     4/1/2014     3/1/2014           NA
 5: John      Webinar     5/1/2014     3/1/2014           NA
 6:  Tom        Email     1/1/2014           NA     2/1/2015
 7:  Tom         Sale     2/1/2015           NA     3/1/2015
 8:  Tom         Sale     3/1/2015     2/1/2015           NA
 9:  Tom      Seminar     4/1/2015     3/1/2015           NA
10:  Tom      Webinar     5/1/2015     3/1/2015           NA
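To see why the rep()/diff() construction in this answer works, it may help to run it on a single group in plain R. The sketch below uses hypothetical vectors `type` and `date` for one person and `n` in place of `.N`; it assumes the rows are already in date order.

```r
# One group: rows 2 and 3 are sales, out of n = 5 rows.
type <- c("Email", "Sale", "Sale", "Seminar", "Webinar")
date <- c("1/1/2014", "2/1/2014", "3/1/2014", "4/1/2014", "5/1/2014")
n    <- length(type)

w <- which(type == "Sale")           # positions of the sales: 2 3

# Run lengths between consecutive sales, counted so that a sale row itself
# still points at the sale *before* it.
lst_runs <- diff(c(0, w, n))         # 2 1 2
lst <- rep(c(NA, w), lst_runs)       # NA NA 2 3 3

# Same idea shifted by one, so that a sale row points at the sale *after* it.
nxt_runs <- diff(c(1, w, n + 1))     # 1 1 3
nxt <- rep(c(w, NA), nxt_runs)       # 2 3 NA NA NA

# Indexing with NA yields NA, which gives the missing dates for free.
last_sale <- date[lst]
next_sale <- date[nxt]
print(data.frame(type, date, last_sale, next_sale))
```

The two run-length vectors always sum to n, so each row gets exactly one index; the by=Name in the answer repeats this per person.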