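There are plenty of good ways to aggregate timestamp-based data into weeks, but I am really struggling to aggregate over less than a full week. I have been googling for days and racking my brain, and all I have found are some clunky, ugly loop-based approaches. There must be an elegant solution using the tidyverse.
Let's say I have bird sightings recorded with timestamps. Two columns: timestamp, bird name.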
Aggregating the counts per week is easy, like this:
birds_per_week <- data %>% group_by(week = cut(timestamp, "week", start.on.monday = TRUE)) %>% summarise(n())
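What I am struggling with is getting counts for an incomplete week. Say today is Monday, 10 am: I want the weekly counts of everything between Monday 10 am and Wednesday noon. That is a window of 2 days and 2 hours. In my problem the end point is always Wednesday noon, but the start point varies.
Answer 0: (score: 1)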
library(lubridate)
library(tidyverse)
df1 <- data.frame(timestamp = structure(c(1540505400, 1539802080, 1538778660, 1538417640, 1538691660,
1538790780, 1538705100, 1539614520, 1539893280, 1539455520, 1540343580,
1540178220, 1538628960, 1539533280, 1539572700, 1538823480, 1538967480,
1538468400, 1540425600, 1539809880), class = c("POSIXct", "POSIXt"
), tzone = ""))
First, break out the day and hour parts:
df1$day <- weekdays(df1$timestamp)
df1$hour <- hour(df1$timestamp)
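Then filter down to our three days, and exclude the time before the Monday start and after the Wednesday end:
df1 <- df1 %>%
  filter(day %in% c("Monday", "Tuesday", "Wednesday")) %>%
  filter(!(day == "Monday" & hour < 10)) %>%    # drop Monday before 10:00
  filter(!(day == "Wednesday" & hour >= 12))    # drop Wednesday from 12:00 onward
df1$week <- week(df1$timestamp)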
Then use week as your group:
df1 %>% group_by(week) %>% summarize(count = n())
# A tibble: 3 x 2
week count
<int> <int>
1 40 2
2 42 1
3 43 1
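One caveat with the filter above is that weekdays() returns day names in the current locale, so the comparison with "Monday" can fail on a non-English system. A minimal sketch of a locale-independent variant follows, assuming it is acceptable to measure each sighting as hours elapsed since Monday 00:00 of its week. The sightings data frame, its values, and the names week_start and hours_in are hypothetical and only there for illustration; the window is treated as Monday 10:00 (inclusive) to Wednesday 12:00 (exclusive).

```r
library(lubridate)
library(dplyr)

# Hypothetical sample data, fixed to UTC so the result is reproducible
sightings <- data.frame(
  timestamp = ymd_hm(c(
    "2018-10-01 09:30",  # Monday before 10:00       -> outside the window
    "2018-10-01 18:14",  # Monday evening            -> inside
    "2018-10-02 08:20",  # Tuesday                   -> inside
    "2018-10-03 12:00",  # Wednesday noon sharp      -> outside (end is exclusive)
    "2018-10-17 11:59",  # Wednesday just before noon -> inside
    "2018-10-18 07:00"   # Thursday                  -> outside
  ), tz = "UTC")
)

window_counts <- sightings %>%
  mutate(
    # Monday 00:00 of the week each sighting falls in
    week_start = floor_date(timestamp, "week", week_start = 1),
    # Hours elapsed since that Monday 00:00
    hours_in   = as.numeric(difftime(timestamp, week_start, units = "hours"))
  ) %>%
  # Monday 10:00 is hour 10 of the week; Wednesday 12:00 is hour 60
  filter(hours_in >= 10, hours_in < 60) %>%
  count(week_start)

print(window_counts)
```

Changing the lower bound (10 here) moves the start of the window, which seems to match the "start point varies" requirement in the question.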
Answer 1: (score: 0)
One approach seems to be to tag each row with its "next Wednesday noon" and then count by that value.
library(lubridate); library(dplyr)
times_to_test <- data.frame(times = seq.POSIXt(from = ymd_h(2018102400),
                                               to = ymd_h(2018110123), by = "hour"))
times_to_test %>%
  # For checking, it helps to see which days are Wednesdays
  mutate(weekday = wday(times, label = TRUE)) %>%
  # Wednesday noon is 3.5 days (84 hours) into a week that starts on Sunday,
  # which is the lubridate default for floor_date()
  mutate(next_Wed_noon = floor_date(times + dhours(84), "1 week") +
           dhours(84)) %>%
  count(next_Wed_noon)
# A tibble: 3 x 2
next_Wed_noon n
<dttm> <int>
1 2018-10-24 12:00:00 12
2 2018-10-31 12:00:00 168
3 2018-11-07 12:00:00 36
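The counts above cover the full seven days leading up to each Wednesday noon. To restrict them to the shorter window from the question, one option is to also compute how many hours each row lies before its cutoff and keep only the rows within the window length. This is a sketch under assumptions, not part of the original answer: the sightings data, window_hours, and hours_before are hypothetical names, and 50 hours corresponds to the Monday 10:00 start (2 days 2 hours before Wednesday noon).

```r
library(lubridate)
library(dplyr)

# Window length in hours: Monday 10:00 to Wednesday 12:00 = 2 days 2 hours
window_hours <- 50

# Hypothetical sample data, fixed to UTC so the result is reproducible
sightings <- data.frame(
  timestamp = ymd_hm(c(
    "2018-10-22 09:00",  # Monday, 51 h before the cutoff       -> dropped
    "2018-10-22 10:00",  # Monday, exactly 50 h before          -> kept
    "2018-10-23 15:30",  # Tuesday                              -> kept
    "2018-10-24 12:00",  # Wednesday noon: assigned to the next
                         # cutoff, 168 h away                   -> dropped
    "2018-10-30 08:00"   # Tuesday of the following week        -> kept
  ), tz = "UTC")
)

counts <- sightings %>%
  mutate(
    # Same construction as above, with the Sunday week start made explicit
    next_Wed_noon = floor_date(timestamp + dhours(84), "week", week_start = 7) +
      dhours(84),
    # Distance from each sighting to its cutoff
    hours_before = as.numeric(difftime(next_Wed_noon, timestamp, units = "hours"))
  ) %>%
  filter(hours_before <= window_hours) %>%
  count(next_Wed_noon)

print(counts)
```

Because the start point is just a number of hours before the fixed Wednesday noon cutoff, a different start only requires a different window_hours value.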