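Using lubridate to find the fraction of each hour spent on an activity

Time: 2018-03-11 00:30:11

Tags: r dplyr time-series lubridate

I have a data set of start and end times for an activity, covering a whole year. I want to split each day into 24 one-hour intervals, then calculate and plot how much of each hour a person spends on the activity. I have already converted the times with lubridate's mdy_hm().

Suppose this sample data is in a data frame called df: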

start_time      end_time        duration
8/14/15 23:36   8/15/15 5:38    359
8/15/15 14:50   8/15/15 15:25    35
8/15/15 22:43   8/16/15 2:41    236
8/16/15 3:12    8/16/15 6:16    181
8/16/15 16:52   8/16/15 17:58    66
8/16/15 23:21   8/16/15 23:47    26
8/17/15 0:04    8/17/15 2:02    118
8/17/15 8:31    8/17/15 9:45     74
8/17/15 11:06   8/17/15 13:46   159

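How can I find the proportion of each hour spent on the activity across the whole year? I will then plot the result. I tried extracting the hour with hour(), using group_by() on the time variable, and taking the mean of duration inside summarize(), but I am not sure about the logic.

Thanks for your help.

1 Answer:

Answer 0: (score: 1)

group_by(...) %>% summarise(...) works best when your data is in a "tidy" format, where each row is one observation of the thing you want to aggregate. In your case, an observation is one minute of activity within a given date and hour. We can generate these minute-by-minute observations as a list column, expand that column into a long data frame with tidyr::unnest(), and then count the rows of that data frame: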

library(dplyr)
library(lubridate)
library(tidyr)
library(ggplot2)

df <-
    tibble(
        start_time = c("8/14/15 23:36","8/15/15 14:50","8/15/15 22:43",
                       "8/16/15 3:12","8/16/15 16:52","8/16/15 23:21",
                       "8/17/15 0:04","8/17/15 8:31","8/17/15 11:06"),
        end_time   = c("8/15/15 5:38","8/15/15 15:25","8/16/15 2:41",
                       "8/16/15 6:16","8/16/15 17:58","8/16/15 23:47",
                       "8/17/15 2:02","8/17/15 9:45","8/17/15 13:46")
    ) %>%
    # Parse both columns as date-times (funs() is deprecated; across() replaces it)
    mutate(across(c(start_time, end_time), mdy_hm))

worked_hours <- df %>%
    # First, build a long data frame with one row per minute of activity.
    # rowwise() evaluates seq() separately for each start/end pair.
    rowwise() %>%
    mutate(min = list(seq(from = start_time, to = end_time - minutes(1), by = "1 min"))) %>%
    ungroup() %>%
    unnest(min) %>%

    # Then aggregate the long data frame: count rows (i.e. minutes) per date and hour
    select(min) %>%
    mutate(date = date(min), hour = factor(hour(min), levels = 0:23)) %>%
    count(date, hour) %>%
    # Express the minutes as a percentage of the hour
    mutate(prop = n / 60 * 100)

worked_hours %>%
    # Drop any leftover grouping (complete() cannot fill in a grouping column),
    # then use tidyr::complete() to add the date/hour combinations with no activity
    ungroup() %>%
    complete(date, hour, fill = list(n = 0, prop = 0)) %>%
    ggplot(aes(x = hour, y = prop)) +
    geom_col() +
    facet_wrap(~ date, ncol = 1)
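
The plot above shows one panel per date. The question also asks for a single figure per hour of day across the whole period, so here is a minimal sketch of that extra aggregation step. The table worked_hours_demo and its values are made up for illustration and only mimic the date, hour and n columns of worked_hours. The denominator is also an assumption: it uses the number of distinct dates present in the data, so days with no activity at all are not counted. For a full year you may prefer to divide by 365 instead.

```r
library(dplyr)

# Hypothetical stand-in for worked_hours: minutes of activity per date and hour
worked_hours_demo <- tibble(
    date = as.Date(c("2015-08-14", "2015-08-15", "2015-08-15")),
    hour = factor(c(23, 0, 14), levels = 0:23),
    n    = c(24, 60, 10)
)

# Assumption: the study period is the set of dates that appear in the data
n_days <- n_distinct(worked_hours_demo$date)

hourly_share <- worked_hours_demo %>%
    ungroup() %>%
    # .drop = FALSE keeps hours with no activity as rows with a total of 0
    group_by(hour, .drop = FALSE) %>%
    summarise(total_min = sum(n), .groups = "drop") %>%
    # Percentage of the available minutes in that hour slot over all days
    mutate(prop = total_min / (60 * n_days) * 100)
```

With the real data, replacing worked_hours_demo with worked_hours should give one row per hour of day, which can be passed to ggplot(aes(x = hour, y = prop)) + geom_col() in the same way as the per-date plot.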