I have a dataset with a dateTime column. I need to count the number of distinct 4-hour stretches for each unique ID. This is what I have so far...
library(data.table)
library(lubridate)
# Fake data
myID <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
timeStamp1 <- c("2017-08-01 00:01:00", "2017-08-01 00:02:00", "2017-08-01 00:03:00", "2017-08-01 00:04:00",
"2017-08-01 03:00:00", "2017-08-01 05:00:00", "2017-08-01 05:01:00", "2017-08-01 05:02:00",
"2017-08-01 01:00:00", "2017-08-01 04:00:00", "2017-08-01 04:59:00", "2017-08-01 05:00:01",
"2017-08-01 08:00:00", "2017-08-01 09:01:00", "2017-08-01 13:01:00", "2017-08-01 13:02:00")
df1 <- data.frame(myID, timeStamp1)
dt1 <- setDT(df1)
# Convert to date type
dt1 <- dt1[, BTS := ymd_hms(timeStamp1)]
# Order by myID and then timestamp
dt1 <- dt1[order(myID, BTS)]
# Create lagged time
dt1 <- dt1[, l_BTS := shift(BTS), by = myID]
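# Create span variable: gap to the previous observation, in hours
# (explicit units avoid difftime's automatic choice of secs/mins/hours)
dt1 <- dt1[, spans1 := abs(as.numeric(difftime(BTS, l_BTS, units = "hours")))]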
I think this involves some combination of difftime() and/or as.duration() and/or cumsum(), but I keep digging myself into a deeper hole. The desired output looks like this:
I thought this would produce the result I want, but I am clearly doing something wrong here:
# Count distinct transits by 4 hour blocks
dt1 <- dt1[, tFlag := c(FALSE, diff(as.Date(BTS))) > .1666667, by = myID]
dt1 <- dt1[, t_Count := cumsum(tFlag), by = myID]
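One likely reason the attempt above never sets a flag: as.Date() drops the time of day, so every timestamp on 2017-08-01 collapses to the same date and diff() returns 0 days. The following is a minimal sketch, assuming the intent is to flag an observation that follows a gap of more than 4 hours; the names ts, gap_days and gap_hours are illustrative and not from the original post.

```r
# Three timestamps taken from myID 2 in the sample data, all on the same day
ts <- as.POSIXct(c("2017-08-01 01:00:00", "2017-08-01 04:00:00",
                   "2017-08-01 09:01:00"), tz = "UTC")

# as.Date() discards the time of day, so the day-level differences are all 0
gap_days <- as.numeric(diff(as.Date(ts)))

# Measuring the gaps in hours keeps the time-of-day information
gap_hours <- as.numeric(difftime(ts[-1], ts[-length(ts)], units = "hours"))

# Flag observations that follow a gap of more than 4 hours
tFlag <- c(FALSE, gap_hours > 4)
```

Here gap_days is all zeros, while gap_hours separates the 3-hour gap from the gap of just over 5 hours, so only the last observation is flagged.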
Answer 0 (score: 2)
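I'm not sure I understand you correctly, but if you need the difference between the earliest and the latest timestamp for each myID group, you can do the following: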
library(tidyverse)
dt1 %>%
  group_by(myID) %>%
  summarise(min = min(BTS),
            max = max(BTS)) %>%
  mutate(delta = difftime(max, min, units = "hours")/4,
         transits = as.numeric(floor(difftime(max, min, units = "hours")/4)))
# A tibble: 2 x 5
 myID  min                  max                  delta             transits
<dbl>  <dttm>               <dttm>               <time>               <dbl>
    1  2017-08-01 00:01:00  2017-08-01 05:02:00  1.25416666666667         1
    2  2017-08-01 01:00:00  2017-08-01 13:02:00  3.00833333333333         3
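Note that floor(span / 4) counts how many full 4-hour periods fit between the first and last timestamp, which is not necessarily the same as the number of distinct 4-hour stretches. As an alternative reading of the question, here is a rough data.table sketch that starts a new block whenever an observation falls more than 4 hours after the first observation of the current block. This interpretation is an assumption, since the desired output is not shown, and block_id is a hypothetical helper, not part of data.table or lubridate.

```r
library(data.table)

# Hypothetical helper: assigns a block number to each (sorted) timestamp.
# A new block starts when an observation is more than `width_hours` after
# the first observation of the current block.
block_id <- function(ts, width_hours = 4) {
  secs  <- as.numeric(ts)          # seconds since the epoch
  width <- width_hours * 3600
  id <- integer(length(secs))
  start <- secs[1]
  current <- 1L
  for (i in seq_along(secs)) {
    if (secs[i] - start > width) { # strictly more than the block width
      current <- current + 1L
      start <- secs[i]
    }
    id[i] <- current
  }
  id
}

# Sample data from the question
dt <- data.table(
  myID = rep(1:2, each = 8),
  BTS = as.POSIXct(c(
    "2017-08-01 00:01:00", "2017-08-01 00:02:00", "2017-08-01 00:03:00", "2017-08-01 00:04:00",
    "2017-08-01 03:00:00", "2017-08-01 05:00:00", "2017-08-01 05:01:00", "2017-08-01 05:02:00",
    "2017-08-01 01:00:00", "2017-08-01 04:00:00", "2017-08-01 04:59:00", "2017-08-01 05:00:01",
    "2017-08-01 08:00:00", "2017-08-01 09:01:00", "2017-08-01 13:01:00", "2017-08-01 13:02:00"),
    tz = "UTC")
)

setorder(dt, myID, BTS)                    # the helper assumes sorted timestamps
dt[, block := block_id(BTS), by = myID]    # block number per observation
blocks <- dt[, .(n_blocks = uniqueN(block)), by = myID]
```

On the sample data this yields 2 blocks for myID 1 and 4 blocks for myID 2. A gap of exactly 4 hours (09:01:00 to 13:01:00) stays in the current block because the comparison is strict; use >= instead if the boundary should open a new block.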