Group_by 2 days in a data frame

Time: 2017-09-08 03:30:20

Tags: r dataframe group-by

In R, I have this data frame (flights_48).

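I want to first group the rows with group_by so that each "group" covers a 48-hour (2-day) period. I think the first group would contain the data from 2013-01-01 to 01-03, and so on. Then I want to calculate the sum of the total_delay column for each "group", i.e. for each 2-day period.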

Currently, I have:

flights_48 %>%
  group_by(year, month, day) %>%   # one group per calendar day
  summarise(tot = sum(total_delay, na.rm = TRUE))

structure(list(year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L
), month = c(1L, 1L, 1L, 1L, 1L, 1L), day = c(1L, 1L, 1L, 1L, 
1L, 1L), dep_time = c(517L, 533L, 542L, 544L, 554L, 554L), sched_dep_time = c(515L, 
529L, 540L, 545L, 600L, 558L), dep_delay = c(2, 4, 2, -1, -6, 
-4), arr_time = c(830L, 850L, 923L, 1004L, 812L, 740L), sched_arr_time = c(819L, 
830L, 850L, 1022L, 837L, 728L), arr_delay = c(11, 20, 33, -18, 
-25, 12), carrier = c("UA", "UA", "AA", "B6", "DL", "UA"), flight = c(1545L, 
1714L, 1141L, 725L, 461L, 1696L), tailnum = c("N14228", "N24211", 
"N619AA", "N804JB", "N668DN", "N39463"), origin = c("EWR", "LGA", 
"JFK", "JFK", "LGA", "EWR"), dest = c("IAH", "IAH", "MIA", "BQN", 
"ATL", "ORD"), air_time = c(227, 227, 160, 183, 116, 150), distance = c(1400, 
1416, 1089, 1576, 762, 719), hour = c(5, 5, 5, 5, 6, 5), minute = c(15, 
29, 40, 45, 0, 58), time_hour = structure(c(1357016400, 1357016400, 
1357016400, 1357016400, 1357020000, 1357016400), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), total_delay = c(13, 24, 35, -19, -31, 
8)), .Names = c("year", "month", "day", "dep_time", "sched_dep_time", 
"dep_delay", "arr_time", "sched_arr_time", "arr_delay", "carrier", 
"flight", "tailnum", "origin", "dest", "air_time", "distance", 
"hour", "minute", "time_hour", "total_delay"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))


1 answer:

Answer 0 (score: 0)

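You can group by the integer division of the day to get a "daygroup", so that days 1 and 2 of a month form one group, days 3 and 4 the next, and so on. I made some small example data. This assumes you still want to break at month boundaries, even if that means a day ends up on its own (e.g. day 31). If you don't want that, you can first compute the day of the year.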

library(tidyverse)

# Small example data: July and August 2013, five rows per day
flights_48 <- tibble(year = 2013, month = rep(7:8, each = 155),
                     day = rep(1:31, each = 5, times = 2),
                     total_delay = rep(c(5, 8, 10, 20), length.out = 310))

flights_48 %>%
  mutate(daygroup = (day - 1) %/% 2) %>%   # days 1-2 -> 0, days 3-4 -> 1, ..., day 31 -> 15
  group_by(year, month, daygroup) %>%
  summarise(tot = sum(total_delay, na.rm = TRUE)) %>%
  as.data.frame()
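If you don't want the groups to restart at each month boundary, one option is to build a real Date with base R's as.Date(), count whole days since a chosen start date, and integer-divide that count by 2. This is a minimal sketch, assuming year, month and day are integer columns as in the question's data. The start date 2013-01-01 is an assumption taken from the question, and the example rows and the column names date, from and daygroup are hypothetical, chosen for illustration.

```r
library(dplyr)

# Hypothetical example rows that span the January/February boundary
flights_48 <- tibble(
  year = 2013L,
  month = c(1L, 1L, 1L, 1L, 2L, 2L),
  day = c(1L, 2L, 3L, 31L, 1L, 2L),
  total_delay = c(13, 24, NA, 5, 7, 1)
)

start <- as.Date("2013-01-01")  # assumed start of the first 2-day window

res <- flights_48 %>%
  mutate(
    date = as.Date(sprintf("%d-%02d-%02d", year, month, day)),
    # whole days since `start`, integer-divided by 2: one id per 2-day window
    daygroup = as.integer(date - start) %/% 2L
  ) %>%
  group_by(daygroup) %>%
  summarise(
    from = min(date),                        # first date seen in the window
    tot = sum(total_delay, na.rm = TRUE)
  )

print(res)
```

Here January 31 and February 1 land in the same window (daygroup 15), which the month-based daygroup above would keep apart.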

If you don't need the groups to break at midnight, you could also do something similar with the time_hour column (assuming it is POSIXct).
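A sketch of the time_hour variant, assuming time_hour is POSIXct in UTC as in the dput() output above: convert it to seconds since the epoch with as.numeric(), subtract the start of the first window, and integer-divide by the number of seconds in 48 hours. Anchoring the windows at 2013-01-01 00:00 UTC is an assumption, and the example timestamps and the names origin, window and window_start are hypothetical.

```r
library(dplyr)

# Hypothetical timestamps in UTC, same class as the question's time_hour
flights_48 <- tibble(
  time_hour = as.POSIXct(c("2013-01-01 05:00:00", "2013-01-02 23:00:00",
                           "2013-01-03 00:00:00", "2013-01-04 18:00:00",
                           "2013-01-05 06:00:00"), tz = "UTC"),
  total_delay = c(13, 24, 35, -19, 8)
)

origin <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")  # assumed first window start
window_secs <- 48 * 60 * 60                               # 48 hours in seconds

res <- flights_48 %>%
  mutate(
    # index of the 48-hour window each flight falls into (0, 1, 2, ...)
    window = (as.numeric(time_hour) - as.numeric(origin)) %/% window_secs,
    # POSIXct start time of that window, handy for labelling the output
    window_start = origin + window * window_secs
  ) %>%
  group_by(window, window_start) %>%
  summarise(tot = sum(total_delay, na.rm = TRUE), .groups = "drop")

print(res)
```

To start the first window at the earliest flight instead of at midnight, set origin to min(flights_48$time_hour). The .groups argument needs dplyr 1.0.0 or later; drop it on older versions.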