How to aggregate partial weeks in R

Time: 2018-10-31 03:33:47

Tags: r count aggregate partial

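There are lots of great ways to aggregate timestamp-based data into weeks. But I am really struggling with aggregating over less than a full week. I have been Googling for days and racking my brain, and all I have come up with are some clunky, ugly loops. There must be an elegant solution using the tidyverse.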

Let's say I have bird sightings recorded with timestamps. Two columns: timestamp, bird name.

Aggregating counts by week is easy, like this:
birds_per_week<- data %>%  group_by(week = cut(timestamp, "week", start.on.monday = TRUE)) %>%   summarise(n())

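What I am struggling with is getting counts for an incomplete week. Say it is Monday at 10 am, and I want the count for every week between Monday 10 am and Wednesday noon. That is a window of 2 days and 2 hours. In my problem the end point is always Wednesday noon, but the starting point varies.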

2 Answers:

Answer 0: (score: 1)

library(lubridate)
library(tidyverse)

df1 <- data.frame(timestamp = structure(c(1540505400, 1539802080, 1538778660, 1538417640, 1538691660, 
1538790780, 1538705100, 1539614520, 1539893280, 1539455520, 1540343580, 
1540178220, 1538628960, 1539533280, 1539572700, 1538823480, 1538967480, 
1538468400, 1540425600, 1539809880), class = c("POSIXct", "POSIXt"
), tzone = ""))

First break out the day and hour parts:

df1$day <- weekdays(df1$timestamp)
df1$hour <- hour(df1$timestamp)

Then filter to our three days, and exclude the hours before the Monday start and after the Wednesday end:

df1 <- df1 %>% filter(day %in% c("Monday", "Tuesday", "Wednesday")) %>% 
  filter(!(day == "Monday" & hour < 10)) %>% 
  # hour >= 12 (not > 12) so that Wednesday 12:00-12:59 is excluded as well
  filter(!(day == "Wednesday" & hour >= 12))

df1$week <- week(df1$timestamp)

Then use week as your group:

df1 %>% group_by(week) %>% summarize(count = n())

# A tibble: 3 x 2
   week count
  <int> <int>
1    40     2
2    42     1
3    43     1
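
Since the question says the starting point varies, the same filtering idea can be wrapped in a function. This is only a sketch under some assumptions: `count_window`, `start_day`, `start_hour` and the sample `sightings` data are hypothetical names made up for illustration. It uses `wday(..., week_start = 1)` instead of `weekdays()` so the result does not depend on the locale, and it groups by the Monday that starts each week rather than by `week()`, which counts 7-day blocks from January 1.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: count rows per week that fall in the window
# [start_day at start_hour, Wednesday 12:00). start_day uses Monday = 1.
count_window <- function(df, start_day = 1, start_hour = 10) {
  df %>%
    mutate(
      dow = wday(timestamp, week_start = 1),          # Monday = 1 ... Sunday = 7
      # hours elapsed since Monday 00:00 of the same week
      hours_in = (dow - 1) * 24 + hour(timestamp) +
        minute(timestamp) / 60 + second(timestamp) / 3600,
      week_start = floor_date(timestamp, "week", week_start = 1)
    ) %>%
    filter(hours_in >= (start_day - 1) * 24 + start_hour,
           hours_in < 2 * 24 + 12) %>%                # Wednesday 12:00
    count(week_start)
}

# Made-up sample data in UTC, for illustration only
sightings <- data.frame(
  timestamp = ymd_hm(c("2018-10-01 18:14", "2018-10-02 08:20",
                       "2018-10-03 12:30", "2018-10-15 03:05",
                       "2018-10-15 14:42"))
)

res <- count_window(sightings, start_day = 1, start_hour = 10)
res
```

With this sample data, the Wednesday 12:30 row and the Monday 03:05 row fall outside the window, so the two remaining weeks get counts of 2 and 1.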

Answer 1: (score: 0)

One approach would be to tag each row with its "next Wednesday noon" and then count by that.

library(lubridate); library(dplyr)

times_to_test <- data.frame(times = seq.POSIXt(from = ymd_h(2018102400),
                            to   = ymd_h(2018110123), by = "hour"))

times_to_test %>%
  # For checking, helps to see which days are wednesdays
  mutate(weekday = wday(times, label = TRUE)) %>%
  # Wednesday noon is 3.5 days (84 hours) into the week
  mutate(next_Wed_noon = floor_date(times + dhours(84), "1 week") + 
           dhours(84)) %>%
  count(next_Wed_noon)

# A tibble: 3 x 2
  next_Wed_noon           n
  <dttm>              <int>
1 2018-10-24 12:00:00    12
2 2018-10-31 12:00:00   168
3 2018-11-07 12:00:00    36
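
To restrict the count to a partial week, one option is to keep only the rows that fall within a fixed number of hours before their `next_Wed_noon`. The following is a sketch, not part of the original answer: `window_hours` and the sample `sightings` data are assumptions for illustration, with 50 hours standing in for the Monday 10 am to Wednesday noon window. It also assumes lubridate's default Sunday week start, which is what makes the 84-hour offset land on Wednesday noon.

```r
library(dplyr)
library(lubridate)

# Assumption: Monday 10:00 to Wednesday 12:00 is 2 days 2 hours = 50 hours.
# Change this value to move the start of the window.
window_hours <- 50

# Made-up sample data in UTC; 2018-10-29 is a Monday
sightings <- data.frame(
  times = ymd_hm(c("2018-10-29 09:00", "2018-10-29 10:00",
                   "2018-10-30 15:30", "2018-10-31 11:59",
                   "2018-10-31 12:00"))
)

counts <- sightings %>%
  # Tag each row with the Wednesday noon that closes its window
  mutate(next_Wed_noon = floor_date(times + dhours(84), "1 week") +
           dhours(84)) %>%
  # Keep rows no earlier than window_hours before that Wednesday noon
  filter(times >= next_Wed_noon - dhours(window_hours)) %>%
  count(next_Wed_noon)

counts
```

Here the Monday 09:00 row is dropped because it precedes the window, and the row at exactly Wednesday 12:00 is tagged with the following week's Wednesday noon and falls outside that window too, leaving a single group with 3 rows.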