I have a string of categories like this:
categoryVector <- c("1_100_1_2_3")
I also have the timestamp corresponding to each category:
timeVector <- c("2013-03-07 05:16:50,617_2013-03-07 05:19:24,984_2013-03-07 05:21:06,002_2013-03-07 05:21:06,833_2013-03-07 05:21:10,713")
I want to compute the time spent in category 1 and in category 2:
Time spent in category 1: (time at 100 - time at first 1) + (time at 2 - time at second 1)
Time spent in category 2: time at 3 - time at 2
I need to repeat these calculations for 200K+ records. Is there an efficient way to do this in R?
Answer 0 (score: 0)
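# One timestamp per row; sep="," splits the milliseconds off into column V2
inp <- read.table(text=gsub("_", "\n", timeVector), sep=",")
inp$V1 <- as.POSIXct(inp$V1)
# One category per row
inp2 <- read.table(text=gsub("_", "\n", categoryVector))
# Seconds until the next timestamp (whole seconds only, V2 is ignored); NA for the last row
inp$diffs <- c(as.numeric(difftime(inp$V1[-1], inp$V1[-nrow(inp)], units="secs")), NA)
inp <- cbind(inp, inp2)
inp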
#                    V1  V2 diffs  V1
# 1 2013-03-07 05:16:50 617   154   1
# 2 2013-03-07 05:19:24 984   102 100
# 3 2013-03-07 05:21:06   2     0   1
# 4 2013-03-07 05:21:06 833     4   2
# 5 2013-03-07 05:21:10 713    NA   3
# should probably rename those columns
tapply(inp$diffs, inp[,4], sum, na.rm=TRUE)
#   1   2   3 100
# 154   4   0 102
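The steps above handle a single record and drop the milliseconds. As a rough sketch only, the same idea can be wrapped in a function and applied to each record in turn. The helper name `time_per_category` is hypothetical (it is not part of the original answer), and the sketch assumes every record uses the same `_` separator and `,milliseconds` suffix as the example. It has not been timed on 200K records.

```r
# Hypothetical helper (not from the original answer): seconds spent in each
# category for ONE record, keeping the milliseconds after the comma.
time_per_category <- function(categories, times) {
  cats   <- strsplit(categories, "_", fixed = TRUE)[[1]]
  stamps <- strsplit(times, "_", fixed = TRUE)[[1]]
  # Turn the decimal comma into a dot so "%OS" can read fractional seconds
  stamps <- sub(",", ".", stamps, fixed = TRUE)
  secs <- as.numeric(as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"))
  # Time in a category = gap to the next timestamp; the last entry has no gap
  tapply(diff(secs), cats[-length(cats)], sum)
}

categoryVector <- c("1_100_1_2_3")
timeVector <- c("2013-03-07 05:16:50,617_2013-03-07 05:19:24,984_2013-03-07 05:21:06,002_2013-03-07 05:21:06,833_2013-03-07 05:21:10,713")

# One result per record; with 200K+ records the two vectors are simply longer
result <- Map(time_per_category, categoryVector, timeVector)
result[[1]]
# category "1":   154.367 + 0.831 = 155.198 seconds
# category "100": 101.018 seconds
# category "2":   3.880 seconds
```

Unlike the `tapply` call above, this sketch leaves out the final category (3 here), because no later timestamp exists to measure it against.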