Here is my data frame:
df <- read.table(text='
Name ActivityType ActivityDate LastSaleDate NextSaleDate
John Email 1/1/2014 NA 2/1/2014
John Sale 2/1/2014 NA 3/1/2014
John Sale 3/1/2014 2/1/2014 NA
John Seminar 4/1/2014 3/1/2014 NA
John Webinar 5/1/2014 3/1/2014 NA
Tom Email 1/1/2014 NA 2/1/2015
Tom Sale 2/1/2015 NA 3/1/2015
Tom Sale 3/1/2015 2/1/2015 NA
Tom Seminar 4/1/2015 3/1/2015 NA
Tom Webinar 5/1/2015 3/1/2015 NA
', header=T)
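I am trying to derive the two rightmost columns with data.table. I look at the rows where ActivityType == "Sale" and, for each row, want the ActivityDate of the previous and of the next Sale activity within the same Name. The corresponding dplyr solution (for LastSaleDate only) would be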
library(dplyr)
require(zoo)
df %>%
group_by(Name) %>%
mutate(LastSaleDate=na.locf(lag(ifelse(ActivityType=="Sale",ActivityDate,NA)),na.rm=FALSE))
Thanks a lot for your help.
Answer 0 (score: 3)
I will use library(data.table). First, let's drop the two rightmost columns and convert ActivityDate to the Date class.
dt <- as.data.table(read.table(text='
Name ActivityType ActivityDate LastSaleDate NextSaleDate
John Email 1/1/2014 NA 2/1/2014
John Sale 2/1/2014 NA 3/1/2014
John Sale 3/1/2014 2/1/2014 NA
John Seminar 4/1/2014 3/1/2014 NA
John Webinar 5/1/2014 3/1/2014 NA
Tom Email 1/1/2014 NA 2/1/2015
Tom Sale 2/1/2015 NA 3/1/2015
Tom Sale 3/1/2015 2/1/2015 NA
Tom Seminar 4/1/2015 3/1/2015 NA
Tom Webinar 5/1/2015 3/1/2015 NA
', header=T))
dt[, c('ActivityDate', 'LastSaleDate', 'NextSaleDate') := list(as.Date(ActivityDate, format = '%d/%m/%Y'), NULL, NULL)]
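A quick aside on the format string, since dates like 2/1/2014 are ambiguous. The small sketch below (base R only, with example strings taken from the data) shows how the reading changes with the format. This answer assumes day/month/year; if your data is month/day/year, swap in '%m/%d/%Y'.

```r
# The same string parsed under two different assumptions about field order
x <- "2/1/2014"

as_dmy <- as.Date(x, format = "%d/%m/%Y")  # day/month/year, as assumed in this answer
as_mdy <- as.Date(x, format = "%m/%d/%Y")  # month/day/year, the other common reading

format(as_dmy)  # "2014-01-02"
format(as_mdy)  # "2014-02-01"
```

Either reading keeps the rows of the example in the same order, so the approach below works the same way; only the DateDiff values change.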
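Next, join the table with its own Sale rows to get every activity/sale combination per Name, and compute the difference in days between each activity and each sale: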
setkeyv(dt, 'Name')
dt2 <- dt[dt[ActivityType == 'Sale'], allow.cartesian = TRUE]
dt2[, DateDiff := as.numeric(ActivityDate - i.ActivityDate)]
This gives:
    Name ActivityType ActivityDate i.ActivityType i.ActivityDate DateDiff
 1: John        Email   2014-01-01           Sale     2014-01-02       -1
 2: John         Sale   2014-01-02           Sale     2014-01-02        0
 3: John         Sale   2014-01-03           Sale     2014-01-02        1
 4: John      Seminar   2014-01-04           Sale     2014-01-02        2
 5: John      Webinar   2014-01-05           Sale     2014-01-02        3
 6: John        Email   2014-01-01           Sale     2014-01-03       -2
 7: John         Sale   2014-01-02           Sale     2014-01-03       -1
 8: John         Sale   2014-01-03           Sale     2014-01-03        0
 9: John      Seminar   2014-01-04           Sale     2014-01-03        1
10: John      Webinar   2014-01-05           Sale     2014-01-03        2
11:  Tom        Email   2014-01-01           Sale     2015-01-02     -366
12:  Tom         Sale   2015-01-02           Sale     2015-01-02        0
13:  Tom         Sale   2015-01-03           Sale     2015-01-02        1
14:  Tom      Seminar   2015-01-04           Sale     2015-01-02        2
15:  Tom      Webinar   2015-01-05           Sale     2015-01-02        3
16:  Tom        Email   2014-01-01           Sale     2015-01-03     -367
17:  Tom         Sale   2015-01-02           Sale     2015-01-03       -1
18:  Tom         Sale   2015-01-03           Sale     2015-01-03        0
19:  Tom      Seminar   2015-01-04           Sale     2015-01-03        1
20:  Tom      Webinar   2015-01-05           Sale     2015-01-03        2
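Now, once you have sorted with dt2 <- dt2[order(Name, ActivityDate, DateDiff)], you can get the previous and next sale dates like this (a positive DateDiff means the sale happened before the activity, a negative one means it happened after):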
dt2[, list(ActivityType = ActivityType[1],
LastSaleDate = head(i.ActivityDate[DateDiff > 0], 1),
NextSaleDate = tail(i.ActivityDate[DateDiff < 0], 1)),
by = list(Name, ActivityDate)]
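To see why head() and tail() pick the right sale after sorting on DateDiff, here is a minimal base R sketch for a single activity. The sale dates are made up for illustration (they are not from the question's data), and the lowercase variable names are mine.

```r
# One activity paired with four hypothetical sale dates (two earlier, two later)
activity  <- as.Date("2014-01-04")
sales     <- as.Date(c("2014-01-02", "2014-01-03", "2014-01-06", "2014-01-08"))
date_diff <- as.numeric(activity - sales)  # 2 1 -2 -4

# Sort ascending on the difference, as order(..., DateDiff) does within a group
ord       <- order(date_diff)
sales     <- sales[ord]                    # 2014-01-08 2014-01-06 2014-01-03 2014-01-02
date_diff <- date_diff[ord]                # -4 -2 1 2

# Smallest positive difference comes first among the positives: most recent earlier sale
last_sale <- head(sales[date_diff > 0], 1) # 2014-01-03
# Negative difference closest to zero comes last among the negatives: nearest later sale
next_sale <- tail(sales[date_diff < 0], 1) # 2014-01-06
```

Rows with DateDiff == 0 (a sale matched with itself) are excluded by both strict comparisons, which is why a Sale row does not report itself as its own last or next sale.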
Answer 1 (score: 2)
This seems to work, but it is rather messy:
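# Assumes library(data.table) is loaded and DT <- as.data.table(df)
DT[, c("LastSaleDate", "NextSaleDate") := {
  w   = which(ActivityType == "Sale")          # row positions of the sales within the group
  lst = rep(c(NA, w), diff(c(0, w, .N)))       # position of the most recent earlier sale
  nxt = rep(c(w, NA), diff(c(1, w, .N + 1)))   # position of the nearest later sale
  list(ActivityDate[lst], ActivityDate[nxt])
}, by = Name]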
    Name ActivityType ActivityDate LastSaleDate NextSaleDate
 1: John        Email     1/1/2014           NA     2/1/2014
 2: John         Sale     2/1/2014           NA     3/1/2014
 3: John         Sale     3/1/2014     2/1/2014           NA
 4: John      Seminar     4/1/2014     3/1/2014           NA
 5: John      Webinar     5/1/2014     3/1/2014           NA
 6:  Tom        Email     1/1/2014           NA     2/1/2015
 7:  Tom         Sale     2/1/2015           NA     3/1/2015
 8:  Tom         Sale     3/1/2015     2/1/2015           NA
 9:  Tom      Seminar     4/1/2015     3/1/2015           NA
10:  Tom      Webinar     5/1/2015     3/1/2015           NA
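Since the rep/diff construction is a bit cryptic, here is a sketch that walks through it in base R for a single group (John's rows, typed in by hand). The variable n stands in for data.table's .N; everything else uses the same names as above.

```r
# One group: five activities in row order, with sales in rows 2 and 3
ActivityType <- c("Email", "Sale", "Sale", "Seminar", "Webinar")
ActivityDate <- c("1/1/2014", "2/1/2014", "3/1/2014", "4/1/2014", "5/1/2014")
n <- length(ActivityType)            # plays the role of .N

w <- which(ActivityType == "Sale")   # 2 3

# How many rows share each "most recent earlier sale":
# rows 1-2 have none, row 3 has the sale in row 2, rows 4-5 have the sale in row 3
diff(c(0, w, n))                             # 2 1 2
lst <- rep(c(NA, w), diff(c(0, w, n)))       # NA NA 2 3 3

# How many rows share each "nearest later sale":
# row 1 has the sale in row 2, row 2 has the sale in row 3, rows 3-5 have none
diff(c(1, w, n + 1))                         # 1 1 3
nxt <- rep(c(w, NA), diff(c(1, w, n + 1)))   # 2 3 NA NA NA

# Indexing with NA yields NA, so rows without a matching sale come out as NA
LastSaleDate <- ActivityDate[lst]    # NA NA "2/1/2014" "3/1/2014" "3/1/2014"
NextSaleDate <- ActivityDate[nxt]    # "2/1/2014" "3/1/2014" NA NA NA
```

Note that this relies on the rows already being in date order within each Name, as they are in the question's data; it works on row positions and never compares the dates themselves.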