Optimizing WHILE and FOR loops in R

Time: 2018-06-14 21:56:47

Tags: r performance for-loop while-loop

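So I have a sales database in which every sale carries a customer ID number, so a customer with several transactions can appear several times. For each transaction, I want to see how many transactions that customer had been involved in up to that moment, for example,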

customer X has 3 transactions
TRANSACTION 3 ON 04/17
TRANSACTION 2 ON 02/17
TRANSACTION 1 ON 11/16

For each transaction, I want to count only this customer's earlier transactions from the preceding 4 months, so I want something like this:

customer X has 3 transactions
TRANSACTION 3 ON 04/17 - 1 previous transaction (transaction 1 doesn't count because it is more than 4 months older)
TRANSACTION 2 ON 02/17 - 1 previous transaction
TRANSACTION 1 ON 11/16 - 0 previous transactions
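The counting rule can be checked on this example with a small base-R sketch. It assumes the first day of each month as the transaction date (only month and year are given) and uses the same `0 < gap < 120` days rule as the loop below; `count_previous` is a hypothetical helper name.

```r
# Assumed dates for customer X (the question gives only month/year),
# ordered newest first as in the listing above
dates <- as.Date(c("2017-04-01", "2017-02-01", "2016-11-01"))

# Hypothetical helper: count earlier transactions that are fewer than
# `window` days older than d, i.e. 0 < gap < window with gap in days
count_previous <- function(d, all_dates, window = 120) {
  gap <- as.numeric(d - all_dates)
  sum(gap > 0 & gap < window)
}

previous <- vapply(dates, count_previous, integer(1), all_dates = dates)
data.frame(transaction = 3:1, date = dates, previous = previous)
```

This gives 1, 1 and 0 previous transactions, matching the desired output: the 2016-11-01 transaction is 151 days older than the 2017-04-01 one, so it falls outside the window.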

My code is:

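# assumes db$date is of class Date and db is sorted by COSTUMERID,
# then by date with the newest transaction first
db$PREVIOUSTRANSACTIONS <- 0
n <- nrow(db)
for (i in seq_len(n)) {
  j <- i + 1
  # walk over the following rows that belong to the same customer
  while (j <= n && db$COSTUMERID[i] == db$COSTUMERID[j]) {
    gap <- as.numeric(db$date[i] - db$date[j])  # days between the two transactions
    if (gap > 0 && gap < 120) {
      db$PREVIOUSTRANSACTIONS[i] <- db$PREVIOUSTRANSACTIONS[i] + 1
    }
    j <- j + 1  # advance on every iteration, not only in the else branch
  }
}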
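For every transaction, the nested loop visits each later row of the same customer and assigns into `db` one element at a time. A loop-free alternative in base R, not taken from the original post and offered only as a sketch, sorts each customer's dates once and counts the window with `findInterval()` (the `left.open` argument needs R 3.3.0 or later). The sample data and the helper name `previous_in_window` are hypothetical; the window is the loop's 120 days.

```r
# Hypothetical sample data using the question's column names
db <- data.frame(
  COSTUMERID = c("X", "X", "X", "Y", "Y"),
  date = as.Date(c("2017-04-01", "2017-02-01", "2016-11-01",
                   "2017-03-01", "2017-01-15"))
)

# Hypothetical helper: for one customer's dates (as day numbers), count the
# earlier transactions with 0 < gap < window days, without an inner loop
previous_in_window <- function(days, window = 120) {
  sorted <- sort(days)
  # transactions strictly before each date (gap > 0)
  before <- findInterval(days, sorted, left.open = TRUE)
  # transactions at least `window` days older (gap >= window)
  too_old <- findInterval(days - window, sorted)
  before - too_old
}

# ave() applies the helper to each customer's subset and keeps the row order
db$PREVIOUSTRANSACTIONS <- ave(as.numeric(db$date), db$COSTUMERID,
                               FUN = previous_in_window)
db
```

Because the helper sorts the dates itself, this version does not depend on the row order of `db`, unlike the loop.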

1 answer:

Answer 0 (score: 0)

I would recommend the foverlaps function from the data.table package.

It lets you join on date ranges without any loops:

library(data.table)

dt <- data.table(
  customer.id = c("X", "X", "X"),
  transaction.id = c(3L, 2L, 1L), 
  date = as.Date(c("2017-04-01", "2017-02-01", "2016-11-01"))
)

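# each transaction's look-back window: about 4 months (4 * 30.5 = 122 days) before its date
dt[, previos.range.start := date - 30.5 * 4]
dt[, previos.range.end := date]

# y table: the same transactions with their windows, keyed as foverlaps() requires
dt.previous <- copy(dt)
# x table: each transaction as a zero-length interval [date, date]
dt[, date.copy := date]
setkeyv(dt.previous, c("previos.range.start", "previos.range.end"))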

# join previous transactions to the dataset using date ranges
res <- foverlaps(dt, dt.previous, by.x = c("date", "date.copy"), 
                 by.y = c("previos.range.start", "previos.range.end"))  
# aggregate back to transaction granularity level and calculate previous transactions metric
res <- res[order(transaction.id), .(
                previous.transactions = sum(ifelse(transaction.id == i.transaction.id, 0L, 1L)),
                date = min(date)
              ),  
              by = transaction.id]
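The example above has a single customer, so the join never has to separate customers. With several customers, one option is to add `customer.id` as an exact-match column in front of the two interval columns: `foverlaps()` treats the last two columns of `by.x`/`by.y` as the interval and any earlier ones as equi-join columns, and the first columns of y's key must match `by.y`. This is a sketch under that assumption, not part of the original answer; customer Y and the `range.start`/`range.end` column names are hypothetical.

```r
library(data.table)

# Hypothetical data: customer X as in the answer, plus a second customer Y
dt <- data.table(
  customer.id    = c("X", "X", "X", "Y", "Y"),
  transaction.id = c(3L, 2L, 1L, 2L, 1L),
  date = as.Date(c("2017-04-01", "2017-02-01", "2016-11-01",
                   "2017-03-01", "2017-01-15"))
)

# Look-back window of each transaction: [date - 122, date] (about 4 months)
dt[, `:=`(range.start = date - 30.5 * 4, range.end = date)]

# y: every transaction with its window, keyed customer first, then interval
dt.previous <- copy(dt)
setkeyv(dt.previous, c("customer.id", "range.start", "range.end"))

# x: every transaction as a zero-length interval [date, date]
dt[, date.copy := date]

# customer.id must match exactly; x's date must fall inside y's window
res <- foverlaps(dt, dt.previous,
                 by.x = c("customer.id", "date", "date.copy"),
                 by.y = c("customer.id", "range.start", "range.end"))

# per y transaction, count the matched x transactions other than itself
res <- res[, .(previous.transactions = sum(transaction.id != i.transaction.id),
               date = date[1L]),
           keyby = .(customer.id, transaction.id)]
res
```

Without `customer.id` in the join, the window of Y's 2017-03-01 transaction would also pick up X's transactions from 2017-02-01 and 2016-11-01.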