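I have a dataset covering a full year of start and end times for a certain activity. I want to split the day into 24 one-hour intervals, then calculate and plot the time each person spends per hour. I have already converted the times using lubridate's mdy_hm().
Suppose the data frame df contains this sample data: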
start_time       end_time         duration
8/14/15 23:36    8/15/15 5:38     359
8/15/15 14:50    8/15/15 15:25    35
8/15/15 22:43    8/16/15 2:41     236
8/16/15 3:12     8/16/15 6:16     181
8/16/15 16:52    8/16/15 17:58    66
8/16/15 23:21    8/16/15 23:47    26
8/17/15 0:04     8/17/15 2:02     118
8/17/15 8:31     8/17/15 9:45     74
8/17/15 11:06    8/17/15 13:46    159
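How can I find the proportion of each hour spent on the activity over the whole year? I will then plot the result. I have tried extracting the hour with hour(), using group_by() on the time variable, and taking the mean of duration inside summarize(), but I am not sure about the logic.
Thanks for your help.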
Answer 0 (score: 1)
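group_by(...) %>% summarise(...) works best when your data is in a "tidy" format, where each row is one observation of the thing you want to aggregate. In your case, an observation is one minute of activity within a given date and hour. We can generate these minute-by-minute observations as a list column, use tidyr::unnest() to expand it into a long data frame, and then count the rows of that data frame per date and hour.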
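To make the idea concrete, here is a minimal sketch for a single interval (the first row of the sample data). The names minute_starts and minutes_per_hour are made up for this illustration, and it uses base R's table() rather than dplyr just to keep it short.

```r
library(lubridate)

# First interval from the sample data
start_time <- mdy_hm("8/14/15 23:36")
end_time   <- mdy_hm("8/15/15 5:38")

# One element per minute the activity was running.
# The minute that begins at end_time is excluded.
minute_starts <- seq(start_time, end_time - minutes(1), by = "1 min")

# Count how many of those minutes fall into each date and clock hour
minutes_per_hour <- table(format(minute_starts, "%Y-%m-%d %H"))
print(minutes_per_hour)
```

The partial hours at either end (23:00 on 8/14 and 05:00 on 8/15) get fewer than 60 minutes, and every hour in between gets the full 60. The pipeline that follows applies the same expansion to every row of df.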
library(dplyr)
library(lubridate)
library(tidyr)
library(ggplot2)
df <-
  tibble(
    start_time = c("8/14/15 23:36","8/15/15 14:50","8/15/15 22:43",
                   "8/16/15 3:12","8/16/15 16:52","8/16/15 23:21",
                   "8/17/15 0:04","8/17/15 8:31","8/17/15 11:06"),
    end_time = c("8/15/15 5:38","8/15/15 15:25","8/16/15 2:41",
                 "8/16/15 6:16","8/16/15 17:58","8/16/15 23:47",
                 "8/17/15 2:02","8/17/15 9:45","8/17/15 13:46")
  ) %>%
  # Parse the "m/d/y h:m" strings into date-times
  # (across() replaces the deprecated mutate_at() + funs())
  mutate(across(c(start_time, end_time), mdy_hm))
worked_hours <- df %>%
  # First, make a long df with one row per minute of activity
  # (rowwise() builds one sequence per interval, even if two rows are identical)
  rowwise() %>%
  mutate(min = list(seq(from = start_time, to = end_time - minutes(1), by = "1 min"))) %>%
  ungroup() %>%
  unnest(min) %>%
  # Aggregate over the long df (count number of rows, i.e. minutes per date, hour)
  select(min) %>%
  mutate(date = date(min), hour = factor(hour(min), levels = 0:23)) %>%
  count(date, hour) %>%
  # Express the minutes as a percentage of the hour
  mutate(prop = n / 60 * 100)
worked_hours %>%
  # Drop any leftover grouping: complete() cannot fill in a grouping column
  ungroup() %>%
  # Use tidyr::complete to fill in unobserved date/hour combinations with zeros
  complete(date, hour, fill = list(n = 0, prop = 0)) %>%
  ggplot(aes(x = hour, y = prop)) +
  geom_col() +
  facet_wrap(~ date, ncol = 1)
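The plot above shows one panel per date. If what you want is a single figure per hour of the day across the whole period, one possible sketch is to sum the minutes over all dates and divide by the minutes available in that hour slot. The names intervals, minutes_long, n_days and hourly_share are made up for this example, and the denominator assumes every calendar day from the first start to the last end counts, including days with no activity. Adjust that if your definition differs.

```r
library(dplyr)
library(lubridate)
library(tidyr)

# Sample intervals from the question, parsed with mdy_hm()
intervals <- tibble(
  start_time = mdy_hm(c("8/14/15 23:36", "8/15/15 14:50", "8/15/15 22:43",
                        "8/16/15 3:12", "8/16/15 16:52", "8/16/15 23:21",
                        "8/17/15 0:04", "8/17/15 8:31", "8/17/15 11:06")),
  end_time = mdy_hm(c("8/15/15 5:38", "8/15/15 15:25", "8/16/15 2:41",
                      "8/16/15 6:16", "8/16/15 17:58", "8/16/15 23:47",
                      "8/17/15 2:02", "8/17/15 9:45", "8/17/15 13:46"))
)

# One row per minute of activity, as in the answer above
minutes_long <- intervals %>%
  rowwise() %>%
  mutate(minute_start = list(seq(start_time, end_time - minutes(1), by = "1 min"))) %>%
  ungroup() %>%
  unnest(minute_start)

# Assumption: every calendar day between the first start and the last end counts
n_days <- as.integer(difftime(as_date(max(intervals$end_time)),
                              as_date(min(intervals$start_time)),
                              units = "days")) + 1L

hourly_share <- minutes_long %>%
  mutate(hour = factor(hour(minute_start), levels = 0:23)) %>%
  # .drop = FALSE keeps hours with no activity as zero-count rows
  count(hour, name = "minutes", .drop = FALSE) %>%
  # Share of each hour-of-day slot that was spent on the activity
  mutate(prop = minutes / (60 * n_days))
```

hourly_share has one row per hour of the day (0 to 23), so it can go straight into ggplot(aes(x = hour, y = prop)) + geom_col(). With a real data frame that has a person column, adding that column to count() and to the plot's facets would give the per-person breakdown.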