group_by function does not work together with another group_by

Time: 2020-01-30 03:19:20

Tags: r group-by dplyr

I have data in the following format:

STATION     CODE  DATE     HOUR hr_rain
SHIVAMOGGA  163 06/09/18    00   1.0
SHIVAMOGGA  163 06/09/18    04   1.0
SHIVAMOGGA  163 06/09/18    05   NA
SHIVAMOGGA  163 06/09/18    06   1.5
SHIVAMOGGA  163 06/09/18    07   2.5
SHIVAMOGGA  163 06/09/18    08   NA
SHIVAMOGGA  163 06/09/18    09   0.0
SHIVAMOGGA  163 06/09/18    10   0.5
SHIVAMOGGA  163 06/09/18    11   0.5
SHIVAMOGGA  163 06/09/18    12   NA
SHIVAMOGGA  163 06/09/18    13   NA
SHIVAMOGGA  163 06/09/18    14   0.5
SHIVAMOGGA  163 06/09/18    15   0.5
SHIVAMOGGA  163 06/09/18    16   0.5
SHIVAMOGGA  163 06/09/18    17   0.5
SHIVAMOGGA  163 06/09/18    18   0.5
SHIVAMOGGA  163 06/09/18    19   0.5
SHIVAMOGGA  163 06/10/19    03   0.5
SHIVAMOGGA  163 06/10/19    05   NA
SHIVAMOGGA  163 06/10/19    06   NA
SHIVAMOGGA  163 06/10/19    07   NA
SHIVAMOGGA  163 06/10/19    08   0.5
SHIVAMOGGA  163 06/10/19    09   0.0
SHIVAMOGGA  163 06/10/19    10   0.0

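Here, the rainfall parameter (hr_rain) is recorded as an hourly cumulative total, and I am interested in the rainfall for each individual hour. The measurement restarts every day at hour 09, and some observations are occasionally missing. I therefore tried to fill the NA values within groups that start at HOUR 09: runs of 3 or more consecutive NAs stay as they are, runs of at most 2 consecutive NAs are filled, and an NA at hour 08 takes the previous value.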

library(dplyr)
library(zoo)  # provides na.approx()

df1 <- df %>% 
  group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>% 
  mutate(hr_rain = na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE))
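To make the filling rule concrete, here is a minimal sketch on a made-up vector. The values in `x` are hypothetical and are not taken from the data above, and the sketch assumes the `zoo` package is installed. It shows how `maxgap = 2` and `rule = 2` interact for a single 09-to-08 group.

```r
library(zoo)

# Hypothetical cumulative readings for one measurement day (not the real data)
x <- c(0, 0.5, NA, NA, 0.5, NA, NA, NA, 1.0, NA)

# maxgap = 2: only runs of at most two consecutive NAs are filled
# rule = 2:   a short run at the end is filled with the nearest observed value
# na.rm = FALSE: the result keeps the same length as the input
filled <- na.approx(x, rule = 2, maxgap = 2, na.rm = FALSE)

print(filled)
# Positions 3-4 (run of two) are interpolated between 0.5 and 0.5,
# positions 6-8 (run of three) stay NA,
# position 10 (single trailing NA) carries the last observed value forward.
```

Because `na.approx()` is called inside `mutate()` on grouped data, this rule is applied separately to each group, so a trailing NA at hour 08 never borrows a value from the next day.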

Then, to compute the rainfall for each hour, I tried grouping df1 again:

hourly_df <- df1 %>% 
  group_by(STATION, CODE , grp = cumsum(HOUR == '09')) %>% 
  mutate(RAINFALL = hr_rain - lag(hr_rain, default = 0))

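But it does not work. It creates the first group, and then the second group continues until the end of the data frame. The result looks like this: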

STATION     CODE  DATE     HOUR hr_rain  NUM_NA  gp  grp  RAINFALL
SHIVAMOGGA  163 06/09/18    00   1.0       2      0   0     1
SHIVAMOGGA  163 06/09/18    04   1.0       2      0   0     0
SHIVAMOGGA  163 06/09/18    05   1.25      1      0   0     0.25
SHIVAMOGGA  163 06/09/18    06   1.5       1      0   0     0.25
SHIVAMOGGA  163 06/09/18    07   2.5       1      0   0     1
SHIVAMOGGA  163 06/09/18    08   2.5       1      0   0     0
SHIVAMOGGA  163 06/09/18    09   0.0       1      1   1     -2.5
SHIVAMOGGA  163 06/09/18    10   0.5       2      1   1     0.5
SHIVAMOGGA  163 06/09/18    11   0.5       2      1   1     0
SHIVAMOGGA  163 06/09/18    12   0.5       2      1   1     0
SHIVAMOGGA  163 06/09/18    13   0.5       2      1   1     0
SHIVAMOGGA  163 06/09/18    14   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    15   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    16   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    17   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    18   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    19   0.5       7      1   1     0
SHIVAMOGGA  163 06/10/19    03   0.5       7      1   1     0
SHIVAMOGGA  163 06/10/19    05   NA        3      1   1     NA
SHIVAMOGGA  163 06/10/19    06   NA        3      1   1     NA
SHIVAMOGGA  163 06/10/19    07   NA        3      1   1     NA
SHIVAMOGGA  163 06/10/19    08   0.5       1      1   1     0.5
SHIVAMOGGA  163 06/10/19    09   0.0       2      2   1     -0.5
SHIVAMOGGA  163 06/10/19    10   0.0       2      2   1     0

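At hour 09 I get negative values, whereas I want each day to start from the hr_rain value of that row (which is why I tried to create another grouping on hour 09). Thanks in advance for your help!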

1 answer:

Answer 0 (score: 1)

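Since the two groupings are identical, there is no need to compute them separately. You can group once and calculate both hr_rain and RAINFALL in the same mutate() call: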

library(dplyr)

df %>% 
  group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>% 
  mutate(hr_rain = zoo::na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE), 
         RAINFALL = hr_rain - lag(hr_rain, default = 0)) 
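As a small check of the idea, the sketch below uses a hypothetical toy data frame (`toy`, with made-up values) instead of the data from the question. It illustrates that with a single `group_by()` on the hour-09 counter, `lag(..., default = 0)` makes the first row of each group keep its own `hr_rain` value, so no negative difference appears at hour 09. The trailing `ungroup()` is an addition that is not part of the answer above.

```r
library(dplyr)

# Hypothetical toy data (not the asker's): one station, three 09-to-08 days
toy <- data.frame(
  STATION = "A",
  HOUR    = c("07", "08", "09", "10", "11", "09", "10"),
  hr_rain = c(1.0, 2.5, 0.0, 0.5, 0.5, 0.0, 1.0),
  stringsAsFactors = FALSE
)

hourly <- toy %>%
  # gr increases by one at every hour-09 row, marking a new measurement day
  group_by(STATION, gr = cumsum(HOUR == "09")) %>%
  # within each day, subtract the previous cumulative value;
  # default = 0 leaves the first row of a day equal to its own hr_rain
  mutate(RAINFALL = hr_rain - dplyr::lag(hr_rain, default = 0)) %>%
  # drop the grouping so later steps start from ungrouped data
  ungroup()

print(hourly)
```

Calling `ungroup()` at the end is a defensive choice: any later `group_by()` with a computed expression such as `cumsum(...)` then starts from ungrouped data, rather than from the groups left over from this step.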

Data

df <- structure(list(STATION = c("SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA"), CODE = c(163, 163, 163, 163, 163, 163, 163, 163, 
163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 
163, 163, 163), DATE = c("06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/10/19", "06/10/19", "06/10/19", "06/10/19", 
"06/10/19", "06/10/19", "06/10/19"), HOUR = c("00", "04", "05", 
"06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16", 
"17", "18", "19", "03", "05", "06", "07", "08", "09", "10"), 
hr_rain = c(1, 1, NA, 1.5, 2.5, NA, 0, 0.5, 0.5, NA, NA, 
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, NA, NA, NA, 0.5, 0, 0)), row.names = c(NA, 
-24L), class = "data.frame")