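I have a data frame containing animal IDs and timestamps (this is simplified GPS data). The df is sorted by date/time. I want to create a column that assigns a trip number. A new trip starts whenever the gap between one timestamp and the next is greater than 28800 seconds (8 hours).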
#some sample data
timestamp <- as.POSIXct(c("18/01/2020 06:43:38", "18/01/2020 06:44:14", "18/01/2020 16:45:07" ,"18/01/2020 16:46:07"), tz = "UTC", format = "%d/%m/%Y %H:%M:%S")
data <- data.frame("ID" = c("a","b","c","d"), "timestamp" = timestamp)
#ORIGINAL DATAFRAME
# ID timestamp
#1 a 2020-01-18 06:43:38
#2 b 2020-01-18 06:44:14
#3 c 2020-01-18 16:45:07
#4 d 2020-01-18 16:46:07
data$interval <- difftime(data$timestamp, dplyr::lag(data$timestamp, n = 1L), units = "secs") # time difference between consecutive points, in seconds
data$trip <- c(1,1,2,2) # THIS IS THE LINE I NEED HELP WITH
#DATAFRAME I WANT IN THE END
#ID timestamp interval trip
#1 a 2020-01-18 06:43:38 NA secs 1
#2 b 2020-01-18 06:44:14 36 secs 1
#3 c 2020-01-18 16:45:07 36053 secs 2
#4 d 2020-01-18 16:46:07 60 secs 2
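To make the rule concrete: 28800 seconds is 8 hours, and in the sample data only the gap between rows 2 and 3 (36053 s, roughly 10 hours) exceeds it. A minimal base-R check that reuses the sample timestamps; the name `max_gap` is illustrative only:

```r
# Sample timestamps from the question
timestamp <- as.POSIXct(c("18/01/2020 06:43:38", "18/01/2020 06:44:14",
                          "18/01/2020 16:45:07", "18/01/2020 16:46:07"),
                        tz = "UTC", format = "%d/%m/%Y %H:%M:%S")

max_gap <- 8 * 60 * 60  # 28800 seconds = 8 hours (illustrative name)

# Gaps between consecutive timestamps, explicitly in seconds
gaps <- as.numeric(diff(timestamp), units = "secs")
gaps            # 36 36053 60
gaps > max_gap  # FALSE TRUE FALSE: only the third row starts a new trip
```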
Alternatively, it would also work to have the data split into subsets, one per trip (see the example below).
$`1`
ID timestamp interval
1 a 2020-01-18 06:43:38 NA secs
2 b 2020-01-18 06:44:14 36 secs
$`2`
ID timestamp interval
3 c 2020-01-18 16:45:07 36053 secs
4 d 2020-01-18 16:46:07 60 secs
I'm struggling to explain myself; I hope this makes sense!
Answer 0 (score: 2)
Another way to do this in data.table:
library(data.table)
setDT(data)[, interval := difftime(timestamp, shift(timestamp), units = "secs")][
, trip := 1 + cumsum(ifelse(is.na(interval > 28800), 0, interval > 28800))][]
#> ID timestamp interval trip
#> 1: a 2020-01-18 06:43:38 NA secs 1
#> 2: b 2020-01-18 06:44:14 36 secs 1
#> 3: c 2020-01-18 16:45:07 36053 secs 2
#> 4: d 2020-01-18 16:46:07 60 secs 2
split(data, by=c("trip"), keep.by = FALSE)
#> $`1`
#> ID timestamp interval
#> 1: a 2020-01-18 06:43:38 NA secs
#> 2: b 2020-01-18 06:44:14 36 secs
#>
#> $`2`
#> ID timestamp interval
#> 1: c 2020-01-18 16:45:07 36053 secs
#> 2: d 2020-01-18 16:46:07 60 secs
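The `ifelse(is.na(...))` guard in this answer matters because the first row has no previous timestamp, so its comparison is `NA`, and `cumsum()` turns every value from the first `NA` onward into `NA`. A small base-R illustration with a hand-written flag vector (`exceeds` is an illustrative stand-in for `interval > 28800`):

```r
# Stand-in for `interval > 28800`; the first row has no previous fix, hence NA
exceeds <- c(NA, FALSE, TRUE, FALSE)

cumsum(exceeds)  # NA NA NA NA: the leading NA propagates through the running sum

# Treat the missing first comparison as "no break", as the answer does
trip <- 1 + cumsum(ifelse(is.na(exceeds), 0, exceeds))
trip  # 1 1 2 2
```

`replace(exceeds, is.na(exceeds), FALSE)` would be an equivalent way to neutralise the `NA` before taking the cumulative sum.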
Answer 1 (score: 1)
You can use diff and cumsum:
data$interval <- c(NA, as.numeric(diff(data$timestamp), units = "secs")) # force seconds; diff() chooses its units automatically
data$trips <- cumsum(c(TRUE, data$interval[-1] >28800))
data
# ID timestamp interval trips
#1 a 2020-01-18 06:43:38 NA 1
#2 b 2020-01-18 06:44:14 36 1
#3 c 2020-01-18 16:45:07 36053 2
#4 d 2020-01-18 16:46:07 60 2
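One caveat with `diff()` here: on `POSIXct` it returns a `difftime` whose units are picked automatically from the smallest gap, so the bare numbers are seconds only when that smallest gap is under a minute (as it happens to be in the sample). A sketch with made-up timestamps 3 and 11 hours apart shows why forcing `units = "secs"` before comparing against 28800 is safer:

```r
# Hypothetical timestamps: gaps of 3 hours and 11 hours
times <- as.POSIXct(c("2020-01-18 06:00:00", "2020-01-18 09:00:00",
                      "2020-01-18 20:00:00"), tz = "UTC")

d <- diff(times)
units(d)                        # "hours": chosen automatically
as.numeric(d)                   # 3 11
as.numeric(d, units = "secs")   # 10800 39600

# Comparing the auto-unit numbers to 28800 misses the 11-hour gap
trips_wrong <- cumsum(c(TRUE, as.numeric(d) > 28800))                  # 1 1 1
trips_right <- cumsum(c(TRUE, as.numeric(d, units = "secs") > 28800))  # 1 1 2
```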
You can then use split to divide the data by trips:
split(data, data$trips)
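`split()` returns a named list with one data frame per trip, so per-trip summaries can be computed with `sapply()`. For instance, a sketch of each trip's duration in seconds, rebuilding the sample data with the trip numbers from above (`by_trip` and `durations` are illustrative names):

```r
timestamp <- as.POSIXct(c("18/01/2020 06:43:38", "18/01/2020 06:44:14",
                          "18/01/2020 16:45:07", "18/01/2020 16:46:07"),
                        tz = "UTC", format = "%d/%m/%Y %H:%M:%S")
data <- data.frame(ID = c("a", "b", "c", "d"), timestamp = timestamp,
                   trips = c(1, 1, 2, 2))

by_trip <- split(data, data$trips)
names(by_trip)  # "1" "2"

# Time between the first and last timestamp of each trip
durations <- sapply(by_trip, function(d)
  as.numeric(difftime(max(d$timestamp), min(d$timestamp), units = "secs")))
durations  # trip 1: 36 s, trip 2: 60 s
```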
The same logic in dplyr:
library(dplyr)
data %>%
  mutate(interval = difftime(timestamp, lag(timestamp), units = "secs"),
         trips = cumsum(c(TRUE, interval[-1] > 28800)))
# To split the data into a list with one tibble per trip, append:
# %>% group_split(trips)
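A note on the `difftime()` call in this pipeline: the function's third positional parameter is `tz`, not `units`, so `"secs"` has to be passed by name. A quick base-R check:

```r
# difftime's signature: the third positional parameter is the time zone
names(formals(difftime))  # "time1" "time2" "tz" "units"

t1 <- as.POSIXct("2020-01-18 06:43:38", tz = "UTC")
t2 <- as.POSIXct("2020-01-18 06:44:14", tz = "UTC")

# Naming the argument makes the unit explicit
gap <- difftime(t2, t1, units = "secs")
as.numeric(gap)  # 36
```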