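I have a sales database in which each sale carries a customer ID number, so an ID can appear several times if the customer has made multiple transactions. For each transaction, I would like to see how many transactions that customer had already taken part in up to that moment. For example: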
customer X has 3 transactions
TRANSACTION 3 ON 04/17
TRANSACTION 2 ON 02/17
TRANSACTION 1 ON 11/16
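For each transaction, I want to see how many earlier transactions this customer made within the preceding 4 months, so I want something like this: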
customer X has 3 transactions
TRANSACTION 3 ON 04/17 - 1 previous transaction (transaction 1 doesn't count because it is older than 4 months)
TRANSACTION 2 ON 02/17 - 1 previous transaction
TRANSACTION 1 ON 11/16 - 0 previous transactions
My code is:
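# assumes rows are grouped by customer, newest transaction first
db$PREVIOUSTRANSACTIONS <- 0L
for (i in seq_len(nrow(db))) {
  j <- i + 1
  # stop at the end of the data or when the next customer starts
  while (j <= nrow(db) && db$COSTUMERID[i] == db$COSTUMERID[j]) {
    gap <- as.numeric(difftime(db$date[i], db$date[j], units = "days"))
    if (gap > 0 && gap < 120) {
      db$PREVIOUSTRANSACTIONS[i] <- db$PREVIOUSTRANSACTIONS[i] + 1L
    }
    j <- j + 1
  }
}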
Answer 0 (score: 0)
I would recommend the foverlaps function from the data.table package. It lets you join on date ranges without writing a loop:
library(data.table)

# example data: a single customer; with several customers the join
# would also need to match on customer.id
dt <- data.table(
  customer.id = c("X", "X", "X"),
  transaction.id = c(3L, 2L, 1L),
  date = as.Date(c("2017-04-01", "2017-02-01", "2016-11-01"))
)
# look-back window of roughly 4 months (30.5 * 4 = 122 days) ending at the transaction date
dt[, previos.range.start := date - 30.5 * 4]
dt[, previos.range.end := date]
dt.previous <- copy(dt)
# foverlaps needs an interval on both sides, so use the zero-length interval [date, date]
dt[, date.copy := date]
setkeyv(dt.previous, c("previos.range.start", "previos.range.end"))
# join previous transactions to the dataset using date ranges
res <- foverlaps(dt, dt.previous, by.x = c("date", "date.copy"),
                 by.y = c("previos.range.start", "previos.range.end"))
# aggregate back to transaction level and count previous transactions,
# excluding the match of each transaction with itself
res <- res[order(transaction.id), .(
  previous.transactions = sum(transaction.id != i.transaction.id),
  date = min(date)
),
by = transaction.id]
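As a cross-check on small data, the same count can be sketched in base R without any join. This is only an illustration under stated assumptions: the `sales` data frame, the second customer "Y" and the `window.days` name are made up for the example, and the window is taken as 122 days to mirror the `30.5 * 4` used above. Unlike the overlap join, it filters on the customer ID explicitly and counts only strictly earlier transactions, as in the question's `> 0` condition.

```r
# Hypothetical example data: customer X as in the question, plus a made-up customer Y
sales <- data.frame(
  customer.id    = c("X", "X", "X", "Y", "Y"),
  transaction.id = c(3L, 2L, 1L, 5L, 4L),
  date = as.Date(c("2017-04-01", "2017-02-01", "2016-11-01",
                   "2017-03-15", "2017-03-01"))
)

window.days <- 122  # assumption: "4 months" taken as 30.5 * 4 days

# For each row, count the same customer's transactions that are strictly
# earlier and at most window.days older than the current one
sales$previous.transactions <- vapply(seq_len(nrow(sales)), function(i) {
  same.customer <- sales$customer.id == sales$customer.id[i]
  gap <- as.numeric(sales$date[i] - sales$date)  # days between this sale and every sale
  sum(same.customer & gap > 0 & gap <= window.days)
}, integer(1))

print(sales[order(sales$customer.id, sales$date), ])
```

This compares every row with every other row, so it is quadratic in the number of sales. It is fine for checking results on a sample, but a keyed join such as foverlaps is likely the better fit for a large database.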