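I have a dataset with the columns datetime (start) and datetime_end. After processing the data, I want to break each interval down minute by minute. Suppose I have this interval: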
datetime datetime_end id disc
2019-03-19 12:47:28 2019-03-19 12:50:37 5-3 start
I want to break it down into minutes:
datetime id disc
2019-03-19 12:48:00 5-3 start
2019-03-19 12:49:00 5-3 start
2019-03-19 12:50:00 5-3 start
2019-03-19 12:51:00 5-3 start
Here is a dummy data frame:
df1 <- data.frame(stringsAsFactors = FALSE,
  datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
               "2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
               "2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z", "2019-03-19T13:05:21Z",
               "2019-03-19T13:04:37Z", "2019-03-19T12:47:28Z",
               "2019-03-19T12:46:42Z"),
  id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
         "5-3"),
  disc = c("car", "stop", "start", "stop", "start", "stop", "start",
           "stop", "start", "stop")
)
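I tried using the lubridate::interval function to create an interval object (the travel interval), but I am struggling to break it down into one row per minute (as shown above). If anyone knows a solution, I would be very grateful.
Here is my script: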
library(tidyverse)
library(lubridate)
df <- df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime),
         # Create an interval object.
         Travel_Interval =
           lubridate::interval(start = datetime, end = datetime_end)) %>%
  filter(!is.na(Travel_Interval)) %>%
  # select(-Travel_Interval)
  select(datetime, datetime_end, id, disc, Travel_Interval) %>%
  filter(disc == "start")
Answer 0 (score: 2)
For this, I would use purrr::map2().
# Take df1, convert the datetime column to date-time format, sort by datetime,
# add datetime_end as the lead of datetime, and filter out records with no
# recorded datetime_end. Then create the column 'minute' by using purrr::map2
# to iterate over each datetime / datetime_end pair, building a sequence of
# timestamps in one-minute steps from the "minute ceiling" of datetime to the
# "minute ceiling" of datetime_end. Because the resulting column is a list,
# the data has to be unnested at the end.
df <- df1 %>%
  mutate(datetime = as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime, n = 1L)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate(minute = purrr::map2(datetime, datetime_end, function(start, stop) {
    seq.POSIXt(from = ceiling_date(start, 'minute'),
               to = ceiling_date(stop, 'minute'),
               by = 'min')
  })) %>%
  # name the list column explicitly; a bare unnest() is deprecated in tidyr >= 1.0
  unnest(minute)
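As a quick check, the same map2-and-unnest idea can be run on the single example interval from the question. This is only a minimal sketch: the one-row tibble named example is made up here from the values in the question, and the pipeline keeps just the columns shown in the desired output.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)

# One-row example built from the interval given in the question (illustrative only)
example <- tibble(
  datetime     = ymd_hms("2019-03-19 12:47:28"),
  datetime_end = ymd_hms("2019-03-19 12:50:37"),
  id           = "5-3",
  disc         = "start"
)

result <- example %>%
  # one sequence of minute stamps per row, from ceiling(start) to ceiling(end)
  mutate(minute = map2(datetime, datetime_end,
                       ~ seq(ceiling_date(.x, "minute"),
                             ceiling_date(.y, "minute"),
                             by = "min"))) %>%
  # expand the list column into one row per minute
  unnest(minute) %>%
  select(minute, id, disc)

print(result)
```

The result should contain one row for each of 12:48, 12:49, 12:50 and 12:51, matching the layout asked for in the question.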
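Note, however, that because you are effectively cutting the timestamps into minute intervals with a rounding method (ceiling, in this case), you have to decide how to handle the boundary cases. For example, the last row of the first disc == "stop" run ends with minute == 2019-03-19 12:48:00, and the first row of the following disc == "start" run also begins with minute == 2019-03-19 12:48:00.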
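One possible way to deal with that shared boundary minute is sketched below. The helper minute_stamps and its include_end argument are hypothetical names introduced only for this illustration; they are not part of lubridate. With include_end = TRUE the helper mirrors the ceiling-to-ceiling sequence used above, and with include_end = FALSE it drops the final stamp so that adjacent intervals no longer overlap. Which behaviour is right depends on how the minutes are meant to be attributed.

```r
library(lubridate)

# Hypothetical helper (an assumption for illustration, not from the answer):
# minute stamps for a single interval.
minute_stamps <- function(start, stop, include_end = TRUE) {
  first <- ceiling_date(start, "minute")
  last  <- ceiling_date(stop, "minute")
  # optionally leave the closing minute to the next interval
  if (!include_end) last <- last - minutes(1)
  # nothing left to emit: return an empty date-time vector
  if (last < first) return(first[0])
  seq(first, last, by = "min")
}

t1 <- ymd_hms("2019-03-19 12:46:42")  # "stop" row
t2 <- ymd_hms("2019-03-19 12:47:28")  # "start" row
t3 <- ymd_hms("2019-03-19 13:04:37")  # next "stop" row

stop_run  <- minute_stamps(t1, t2)
start_run <- minute_stamps(t2, t3)

# Both runs contain the boundary minute
shared <- tail(stop_run, 1) == head(start_run, 1)
print(shared)

# Dropping the closing minute of the earlier run removes the overlap
stop_run_open <- minute_stamps(t1, t2, include_end = FALSE)
print(stop_run_open)
```

Note that applying include_end = FALSE to every interval would also remove the last minute of the example in the question (12:51:00), so it is a trade-off rather than a drop-in fix.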
Answer 1 (score: 1)
df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate_at(vars(contains("datetime")), ~ round_date(.x + seconds(30), unit = "minute")) %>%
  mutate(diff = time_length(interval(datetime, datetime_end), unit = "minutes")) %>%
  mutate(time = map2(datetime, diff, ~ .x + minutes(seq(0, .y)))) %>%
  unnest(time)
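Since I was already working on it, I just wanted to post this, even though there are already good answers here. It uses the lubridate functions time_length and interval to build the sequence.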
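The code above rounds with round_date(.x + seconds(30), unit = "minute") rather than ceiling_date(). The sketch below compares the two on a few timestamps; the third value, which falls exactly on a minute, is an assumed test case that does not occur in the question's data. As far as I can tell, the two approaches agree whenever the seconds are non-zero and differ only on exact minute boundaries, because lubridate documents that round_date() rounds halfway cases up, while ceiling_date() leaves a date-time already on the boundary unchanged.

```r
library(lubridate)

x <- ymd_hms(c("2019-03-19 12:47:28",
               "2019-03-19 12:50:37",
               "2019-03-19 12:47:00"))  # exact-minute case added for illustration

# Approach used in this answer: shift by 30 seconds, then round to the minute
shifted_round <- round_date(x + seconds(30), unit = "minute")

# Approach used in the other answer: take the minute ceiling directly
ceiling_only <- ceiling_date(x, unit = "minute")

comparison <- data.frame(x, shifted_round, ceiling_only)
print(comparison)
```

If timestamps that land exactly on a minute matter for your data, it may be worth checking which of the two behaviours you actually want.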