How do I update a data.table
by reference using a rolling join?
Here I build a table of commercials and a table of sales.
# commercials
commercials<-data.table(commercialID=c("C1","C2","C3","C4"), commercialDate=as.Date(c("2014-1-1","2014-4-1","2014-7-1","2014-9-15")))
commercials[, rollDate:=commercialDate] #Add a column, rollDate equal to commercialDate
setkey(commercials, "rollDate")
commercials
commercialID commercialDate rollDate
1: C1 2014-01-01 2014-01-01
2: C2 2014-04-01 2014-04-01
3: C3 2014-07-01 2014-07-01
4: C4 2014-09-15 2014-09-15
# sales (A sale before all commercials, a sale after commercial1 and a sale after commercial2)
sales <- data.table(saleID=c("S0", "S1", "S2"), saleDate=as.Date(c("2010-12-31", "2014-2-1", "2014-5-1")))
sales[, rollDate:=saleDate]
setkey(sales, "rollDate")
sales
saleID saleDate rollDate
1: S0 2010-12-31 2010-12-31
2: S1 2014-02-01 2014-02-01
3: S2 2014-05-01 2014-05-01
I'd like to associate each commercial with the sale immediately before it and the sale immediately after it (inclusive of a sale on the same date as the commercial).
Attemp1
commercials[sales, roll=-Inf, `:=`(saleIDBefore=saleID, saleDateBefore=saleDate)]
commercials[sales, roll=Inf, `:=`(saleIDAfter=saleID, saleDateAfter=saleDate)]
commercials
commercialID commercialDate rollDate saleIDBefore saleDateBefore saleIDAfter saleDateAfter
1: C1 2014-01-01 2014-01-01 S0 2010-12-31 S1 2014-02-01
2: C2 2014-04-01 2014-04-01 S1 2014-02-01 S2 2014-05-01
3: C3 2014-07-01 2014-07-01 S2 2014-05-01 NA <NA>
4: C4 2014-09-15 2014-09-15 NA <NA> NA <NA>
FAIL. Notice the NA under saleIDBefore.
Correct (but suboptimal) solution
commercials <- sales[commercials, roll=Inf]
setnames(commercials, c("saleID", "saleDate"), c("saleIDBefore", "saleDateBefore"))
commercials <- sales[commercials, roll=-Inf]
setnames(commercials, c("saleID", "saleDate"), c("saleIDAfter", "saleDateAfter"))
commercials
saleIDAfter saleDateAfter rollDate saleIDBefore saleDateBefore commercialID commercialDate
1: S1 2014-02-01 2014-01-01 S0 2010-12-31 C1 2014-01-01
2: S2 2014-05-01 2014-04-01 S1 2014-02-01 C2 2014-04-01
3: NA <NA> 2014-07-01 S2 2014-05-01 C3 2014-07-01
4: NA <NA> 2014-09-15 S2 2014-05-01 C4 2014-09-15
But I'd like to do this using update by reference :=
if it's possible.
答案 0 :(得分:5)
Use non-equi joins:
commercials[sales, on = .(rollDate >= rollDate),
`:=`(saleIDBefore=saleID, saleDateBefore=saleDate)][
sales, on = .(rollDate <= rollDate), mult = 'last',
`:=`(saleIDAfter=saleID, saleDateAfter=saleDate)][]
# commercialID commercialDate rollDate saleIDBefore saleDateBefore saleIDAfter saleDateAfter
#1: C1 2014-01-01 2014-01-01 S0 2010-12-31 S1 2014-02-01
#2: C2 2014-04-01 2014-04-01 S1 2014-02-01 S2 2014-05-01
#3: C3 2014-07-01 2014-07-01 S2 2014-05-01 NA <NA>
#4: C4 2014-09-15 2014-09-15 S2 2014-05-01 NA <NA>