I am trying to find a vectorized approach to replace the following code (which takes a long time to run):
for (i in 2:nrow(z)) {
  if (z$customerID[i] == z$customerID[i-1]) {
    z$timeDelta[i] <- z$time[i] - z$time[i-1]
  } else {
    z$timeDelta[i] <- NA
  }
}
I looked at various apply-style snippets but did not find anything useful.
Here is some sample data:
customerID time
1 2013-04-17 15:30:00 IDT
1 2013-05-19 11:32:00 IDT
1 2013-05-20 10:14:00 IDT
2 2013-03-14 18:41:00 IST
2 2013-04-24 09:52:00 IDT
2 2013-04-24 17:08:00 IDT
I would like to get the following output:
customerID time timeDelta*
1 2013-04-17 15:30:00 IDT NA
1 2013-05-19 11:32:00 IDT 31.83
1 2013-05-20 10:14:00 IDT 0.94
2 2013-03-14 18:41:00 IST NA
2 2013-04-24 09:52:00 IDT 40.59
2 2013-04-24 17:08:00 IDT 0.3
*I would prefer the time difference to be in days.
Answer 0 (score: 10)
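z$timeDelta <- NA
# diff() picks its units automatically, so convert to days explicitly rather than dividing by 24
z$timeDelta[-1] <- ifelse(tail(z$customerID,-1) == head(z$customerID,-1), as.numeric(diff(z$time), units = "days"), NA)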
Or a shorter version (this relies on customerID being numeric):
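z$timeDelta <- NA
# !diff(...) is TRUE where the customerID is unchanged from the previous row
z$timeDelta[-1] <- ifelse(!diff(z$customerID), as.numeric(diff(z$time), units = "days"), NA)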
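Because `diff()` on date-times returns a `difftime` whose units R chooses automatically, a division by 24 gives days only when those units happen to be hours. The following is a minimal self-contained sketch of the same idea that asks for days explicitly. The `tz = "UTC"` setting is an assumption made here for reproducibility (the question's data uses IST/IDT), so the gap that spans the daylight-saving change may differ slightly from the question's expected value.

```r
# Sample data from the question; tz = "UTC" is an assumption, not part of the original
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# Differences between consecutive rows, converted explicitly to days
delta_days <- as.numeric(diff(z$time), units = "days")

# TRUE where a row belongs to the same customer as the row before it
same_customer <- diff(z$customerID) == 0

# The first row has no predecessor, so it is always NA
z$timeDelta <- c(NA, ifelse(same_customer, delta_days, NA))
print(z)
```

Like the loop in the question, this assumes the rows are already sorted by customer and then by time.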
Answer 1 (score: 2)
This should work for you:
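# diff() picks its units automatically, so convert to days explicitly rather than dividing by 24
do.call(rbind, lapply(split(mydf, mydf$customerID), function(df)
  within(df, timeDelta <- c(NA, as.numeric(diff(time), units = "days")))))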
Result:
customerID time timeDelta
1.1 1 2013-04-17 15:30:00 NA
1.2 1 2013-05-19 11:32:00 31.8347222
1.3 1 2013-05-20 10:14:00 0.9458333
2.4 2 2013-03-14 18:41:00 NA
2.5 2 2013-04-24 09:52:00 40.5909722
2.6 2 2013-04-24 17:08:00 0.3027778
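The split/rbind pattern regroups the rows and renames them (`1.1`, `1.2`, ...). If the original row order and row names should be kept, one alternative is base R's `ave()`, which applies a function per group and writes the results back in place. This is a swapped-in technique rather than the method used in this answer, and the sketch below assumes `tz = "UTC"` for the sample data and that rows are sorted by time within each customer.

```r
# Sample data from the question; tz = "UTC" is an assumption, not part of the original
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# as.numeric() on POSIXct gives seconds since the epoch, so diff() is in seconds.
# ave() runs the function once per customerID and keeps the original row order.
seconds_delta <- ave(as.numeric(z$time), z$customerID,
                     FUN = function(s) c(NA, diff(s)))

# 86400 seconds per day
z$timeDelta <- seconds_delta / 86400
print(z)
```

Working in seconds sidesteps the automatic unit selection of `difftime` entirely.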
Answer 2 (score: 2)
This works:
## z <- read.table(text="customerID time
## 1 2013-04-17.15:30:00.IDT
## 1 2013-05-19.11:32:00.IDT
## 1 2013-05-20.10:14:00.IDT
## 2 2013-03-14.18:41:00.IST
## 2 2013-04-24.09:52:00.IDT
## 2 2013-04-24.17:08:00.IDT", header=TRUE)
##
## mydf$time <- z$time <- as.POSIXlt(gsub("\\.", " ", z$time))
do.call(rbind, lapply(split(z, z$customerID), function(x) {
x$timeDelta <- c(NA, round(as.numeric(diff(x$time), units = "days"), 2))
x
}))
## customerID time timeDelta
## 1.1 1 2013-04-17 15:30:00 NA
## 1.2 1 2013-05-19 11:32:00 31.83
## 1.3 1 2013-05-20 10:14:00 0.95
## 2.4 2 2013-03-14 18:41:00 NA
## 2.5 2 2013-04-24 09:52:00 40.63
## 2.6 2 2013-04-24 17:08:00 0.30
Answer 3 (score: 1)
With the help of firstobs from the doBy package:
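library(doBy)
# Difference to the previous row in days; the first row has no predecessor
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))
# Blank out the first row of each customer
z$timeDelta[firstobs(z$customerID)] <- NA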
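If adding a package dependency is not desirable, the same masking step can be sketched in base R with `duplicated()`, which is `FALSE` at the first occurrence of each value. This is an illustrative alternative and not part of the answer. It assumes that all rows of a customer are contiguous, and it uses `tz = "UTC"` for the sample data as an assumption.

```r
# Sample data from the question; tz = "UTC" is an assumption, not part of the original
z <- data.frame(
  customerID = c(1, 1, 1, 2, 2, 2),
  time = as.POSIXct(c("2013-04-17 15:30:00", "2013-05-19 11:32:00",
                      "2013-05-20 10:14:00", "2013-03-14 18:41:00",
                      "2013-04-24 09:52:00", "2013-04-24 17:08:00"),
                    tz = "UTC")
)

# Difference to the previous row, in days, for every row
z$timeDelta <- c(NA, as.numeric(diff(z$time), units = "days"))

# !duplicated() marks the first row of each customerID; those rows get NA
z$timeDelta[!duplicated(z$customerID)] <- NA
print(z)
```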