Finding a vectorized way to replace a for loop that computes between rows

Time: 2013-08-18 12:24:19

Tags: r

I'm trying to find a vectorized approach to replace the following code, which takes a long time to run:

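z$timeDelta <- NA  # the column must exist before single elements are assigned
for (i in 2:nrow(z)) {
  if (z$customerID[i] == z$customerID[i-1]) {
    z$timeDelta[i] <- as.numeric(difftime(z$time[i], z$time[i-1], units = "days"))
  } else {
    z$timeDelta[i] <- NA  # first row of a new customer
  }
}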

I've looked at various apply-family snippets but haven't found anything useful.

Here is some sample data:

customerID    time
    1         2013-04-17 15:30:00 IDT
    1         2013-05-19 11:32:00 IDT
    1         2013-05-20 10:14:00 IDT
    2         2013-03-14 18:41:00 IST
    2         2013-04-24 09:52:00 IDT
    2         2013-04-24 17:08:00 IDT
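For anyone who wants to run the answers below, the sample can be rebuilt as a data frame. This is a minimal sketch that assumes IST/IDT mean Israel Standard/Daylight Time, so the times are parsed in the "Asia/Jerusalem" zone instead of relying on R to interpret the abbreviations. The zone matters here: Israel switched from IST to IDT between 2013-03-14 and 2013-04-24, so the gap for customer 2's second row is about 40.59 days in Israel time but about 40.63 days when the same clock times are read as UTC.

```r
stamps <- c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
            "2013-05-20 10:14:00", "2013-03-14 18:41:00",
            "2013-04-24 09:52:00", "2013-04-24 17:08:00")

# Sample data from the question; "Asia/Jerusalem" is an assumption about IST/IDT
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(stamps, tz = "Asia/Jerusalem")
)
str(z)

# For comparison: the same wall-clock times read as UTC, which has no DST switch
time_utc <- as.POSIXct(stamps, tz = "UTC")
as.numeric(difftime(z$time[5], z$time[4], units = "days"))      # about 40.59
as.numeric(difftime(time_utc[5], time_utc[4], units = "days"))  # about 40.63
```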

I'd like to get the following output:

customerID    time                        timeDelta*
    1         2013-04-17 15:30:00 IDT     NA
    1         2013-05-19 11:32:00 IDT     31.83 
    1         2013-05-20 10:14:00 IDT     0.94 
    2         2013-03-14 18:41:00 IST     NA
    2         2013-04-24 09:52:00 IDT     40.59
    2         2013-04-24 17:08:00 IDT     0.3 

*I'd prefer the time differences in days.
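Why the unit needs care: subtracting two date-times in R returns a difftime whose unit is chosen automatically from the size of the gap, so the same expression can give hours for one pair and days for another. A small sketch with three timestamps from the sample, read as UTC for simplicity:

```r
t0 <- as.POSIXct("2013-04-17 15:30:00", tz = "UTC")
t1 <- as.POSIXct("2013-05-19 11:32:00", tz = "UTC")
t2 <- as.POSIXct("2013-05-20 10:14:00", tz = "UTC")

short_gap <- t2 - t1
long_gap  <- t1 - t0
units(short_gap)  # "hours": the gap is under one day
units(long_gap)   # "days": the gap is over one day

# Asking for days explicitly gives the same unit whatever the gap size
as.numeric(short_gap, units = "days")         # 0.9458333
as.numeric(difftime(t1, t0, units = "days"))  # 31.83472
```

Dividing a difftime by 24 is therefore only correct when R happened to pick hours; converting with units = "days" removes that dependency.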

4 answers:

Answer 0 (score: 10)

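z$timeDelta <- NA
# compare each row's customerID with the previous row's; express the gap in days
z$timeDelta[-1] <- ifelse(tail(z$customerID, -1) == head(z$customerID, -1),
                          as.numeric(diff(z$time), units = "days"), NA)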

Or a shorter version:

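z$timeDelta <- NA
# !diff() is TRUE where the ID did not change (works for numeric customerID only)
z$timeDelta[-1] <- ifelse(!diff(z$customerID), as.numeric(diff(z$time), units = "days"), NA)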
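Both versions rely on the same alignment: dropping the first element (tail(x, -1)) and the last element (head(x, -1)) lines each row up with the row before it, which is what the loop's i and i-1 do. A toy sketch of the resulting mask on the sample IDs; it assumes the rows are already sorted by customer and then by time, and the !diff() shortcut additionally needs a numeric ID (diff() fails on character IDs).

```r
id <- c(1, 1, 1, 2, 2, 2)

# Element k compares row k+1 with row k
same_as_prev <- tail(id, -1) == head(id, -1)
same_as_prev  # TRUE TRUE FALSE TRUE TRUE

# diff() is 0 where the ID repeats, so negating it gives the same mask
!diff(id)     # TRUE TRUE FALSE TRUE TRUE
```

The mask has one element fewer than the data, which is why the answer assigns into z$timeDelta[-1] and leaves the first row as NA.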

Answer 1 (score: 2)

This should work for you:

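# split by customer, prepend NA to each group's differences (in days), then bind back
do.call(rbind, lapply(split(mydf, mydf$customerID), function(df)
    within(df, timeDelta <- c(NA, as.numeric(diff(time), units = "days")))))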
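A different technique, which keeps the original row order and row names instead of regrouping through split() and rbind(), is base R's ave(): it applies a function within each group and returns a vector aligned with the input. A sketch on the sample data, assuming the "Asia/Jerusalem" zone for IST/IDT and rows sorted by time within each customer:

```r
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "Asia/Jerusalem")
)

# Work on seconds since the epoch so every difference has the same unit
secs <- as.numeric(z$time)

# Within each customer: NA for the first row, then successive differences
delta_secs <- ave(secs, z$customerID, FUN = function(s) c(NA, diff(s)))

z$timeDelta <- delta_secs / 86400  # seconds -> days
z
```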

Result:

    customerID                time  timeDelta
1.1          1 2013-04-17 15:30:00         NA
1.2          1 2013-05-19 11:32:00 31.8347222
1.3          1 2013-05-20 10:14:00  0.9458333
2.4          2 2013-03-14 18:41:00         NA
2.5          2 2013-04-24 09:52:00 40.5909722
2.6          2 2013-04-24 17:08:00  0.3027778

Answer 2 (score: 2)

This works:

## z <- read.table(text="customerID    time
##     1         2013-04-17.15:30:00.IDT
##     1         2013-05-19.11:32:00.IDT
##     1         2013-05-20.10:14:00.IDT
##     2         2013-03-14.18:41:00.IST
##     2         2013-04-24.09:52:00.IDT
##     2         2013-04-24.17:08:00.IDT", header=TRUE)
## 
## z$time <- as.POSIXct(gsub("\\.", " ", z$time))


do.call(rbind, lapply(split(z, z$customerID), function(x) {
    x$timeDelta <- c(NA, round(as.numeric(diff(x$time), units = "days"), 2))
    x
}))

##     customerID                time timeDelta
## 1.1          1 2013-04-17 15:30:00        NA
## 1.2          1 2013-05-19 11:32:00     31.83
## 1.3          1 2013-05-20 10:14:00      0.95
## 2.4          2 2013-03-14 18:41:00        NA
## 2.5          2 2013-04-24 09:52:00     40.63
## 2.6          2 2013-04-24 17:08:00      0.30

Answer 3 (score: 1)

With the help of firstobs() from the doBy package:

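library(doBy)  # provides firstobs()
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))  # row-to-row gaps in days
z$timeDelta[firstobs(z$customerID)] <- NA  # blank out the first row of each customer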
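If the doBy dependency is not wanted, base R's duplicated() can mark the first row of each customer instead: !duplicated(x) is TRUE at the first occurrence of each value. A sketch on the sample data, again assuming the "Asia/Jerusalem" zone and rows grouped by customer and sorted by time:

```r
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "Asia/Jerusalem")
)

# Row-to-row differences over the whole table, in days
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))

# A customer's first row must not be compared with the previous customer's last row
z$timeDelta[!duplicated(z$customerID)] <- NA
z
```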