I have data in the following format:
STATION CODE DATE HOUR hr_rain
SHIVAMOGGA 163 06/09/18 00 1.0
SHIVAMOGGA 163 06/09/18 04 1.0
SHIVAMOGGA 163 06/09/18 05 NA
SHIVAMOGGA 163 06/09/18 06 1.5
SHIVAMOGGA 163 06/09/18 07 2.5
SHIVAMOGGA 163 06/09/18 08 NA
SHIVAMOGGA 163 06/09/18 09 0.0
SHIVAMOGGA 163 06/09/18 10 0.5
SHIVAMOGGA 163 06/09/18 11 0.5
SHIVAMOGGA 163 06/09/18 12 NA
SHIVAMOGGA 163 06/09/18 13 NA
SHIVAMOGGA 163 06/09/18 14 0.5
SHIVAMOGGA 163 06/09/18 15 0.5
SHIVAMOGGA 163 06/09/18 16 0.5
SHIVAMOGGA 163 06/09/18 17 0.5
SHIVAMOGGA 163 06/09/18 18 0.5
SHIVAMOGGA 163 06/09/18 19 0.5
SHIVAMOGGA 163 06/10/19 03 0.5
SHIVAMOGGA 163 06/10/19 05 NA
SHIVAMOGGA 163 06/10/19 06 NA
SHIVAMOGGA 163 06/10/19 07 NA
SHIVAMOGGA 163 06/10/19 08 0.5
SHIVAMOGGA 163 06/10/19 09 0.0
SHIVAMOGGA 163 06/10/19 10 0.0
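Here, hr_rain is an hourly cumulative rainfall value, but I am interested in the rainfall for each individual hour. The measurement restarts at hour 09 every day, and some observations are sometimes missing. I therefore tried to fill the NA values within groups that start at HOUR 09, using these rules: runs of 3 or more consecutive NAs stay as they are, runs of 1 or 2 consecutive NAs are filled by interpolation, and a missing value at hour 08 takes the previous value.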
library(dplyr)
library(zoo)
df1 <- df %>%
  group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>%
  mutate(hr_rain = na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE))
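To make the grouping and the fill rule concrete without dplyr or zoo, here is a base-R sketch on the sample hr_rain column. It swaps in ave() for group_by()/mutate(), and approx() plus rle() for zoo::na.approx(). The assumption (not verified against zoo's source) is that this matches na.approx(rule = 2, maxgap = 2) on data like this: linear interpolation by row position, leading/trailing NAs take the nearest observed value, and NA runs longer than maxgap are put back to NA. The helper fill_short_gaps() is hypothetical, not part of any package.

```r
# Sample columns from the question, in row order
HOUR <- c("00", "04", "05", "06", "07", "08", "09", "10", "11", "12", "13",
          "14", "15", "16", "17", "18", "19", "03", "05", "06", "07", "08",
          "09", "10")
hr_rain <- c(1, 1, NA, 1.5, 2.5, NA, 0, 0.5, 0.5, NA, NA, 0.5, 0.5, 0.5,
             0.5, 0.5, 0.5, 0.5, NA, NA, NA, 0.5, 0, 0)

# Each "09" row starts a new measurement day, so cumsum() of the logical
# vector gives 0 for rows before the first 09, then 1, 2, ...
gr <- cumsum(HOUR == "09")

# Hypothetical helper approximating zoo::na.approx(x, rule = 2, maxgap = 2):
# interpolate linearly by row position, carry the nearest value into
# leading/trailing NAs, then restore every NA run longer than maxgap.
fill_short_gaps <- function(x, maxgap = 2) {
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)  # too few observed points: leave unchanged
  idx <- seq_along(x)
  filled <- approx(idx[ok], x[ok], xout = idx, rule = 2)$y
  runs <- rle(is.na(x))
  too_long <- rep(runs$values & runs$lengths > maxgap, runs$lengths)
  filled[too_long] <- NA
  filled
}

# Apply the rule separately within each day group
hr_filled <- ave(hr_rain, gr, FUN = fill_short_gaps)
hr_filled
```

In the first group the NA at hour 05 becomes 1.25 and the NA at hour 08 becomes 2.5; the two-NA run at hours 12 and 13 is filled, while the three-NA run at hours 05 to 07 on 06/10/19 stays NA.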
To then compute the hourly rainfall, I tried grouping df1 again:
hourly_df <- df1 %>%
group_by(STATION, CODE , grp = cumsum(HOUR == '09')) %>%
mutate(RAINFALL = hr_rain - lag(hr_rain, default = 0))
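But it does not work: the first group is created correctly, but the second group then runs on to the end of the data frame instead of restarting at the next hour 09. The result looks like this: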
STATION CODE DATE HOUR hr_rain NUM_NA gp grp RAINFALL
SHIVAMOGGA 163 06/09/18 00 1.0 2 0 0 1
SHIVAMOGGA 163 06/09/18 04 1.0 2 0 0 0
SHIVAMOGGA 163 06/09/18 05 1.25 1 0 0 0.25
SHIVAMOGGA 163 06/09/18 06 1.5 1 0 0 0.25
SHIVAMOGGA 163 06/09/18 07 2.5 1 0 0 1
SHIVAMOGGA 163 06/09/18 08 2.5 1 0 0 0
SHIVAMOGGA 163 06/09/18 09 0.0 1 1 1 -2.5
SHIVAMOGGA 163 06/09/18 10 0.5 2 1 1 0.5
SHIVAMOGGA 163 06/09/18 11 0.5 2 1 1 0
SHIVAMOGGA 163 06/09/18 12 0.5 2 1 1 0
SHIVAMOGGA 163 06/09/18 13 0.5 2 1 1 0
SHIVAMOGGA 163 06/09/18 14 0.5 7 1 1 0
SHIVAMOGGA 163 06/09/18 15 0.5 7 1 1 0
SHIVAMOGGA 163 06/09/18 16 0.5 7 1 1 0
SHIVAMOGGA 163 06/09/18 17 0.5 7 1 1 0
SHIVAMOGGA 163 06/09/18 18 0.5 7 1 1 0
SHIVAMOGGA 163 06/09/18 19 0.5 7 1 1 0
SHIVAMOGGA 163 06/10/19 03 0.5 7 1 1 0
SHIVAMOGGA 163 06/10/19 05 NA 3 1 1 NA
SHIVAMOGGA 163 06/10/19 06 NA 3 1 1 NA
SHIVAMOGGA 163 06/10/19 07 NA 3 1 1 NA
SHIVAMOGGA 163 06/10/19 08 0.5 1 1 1 0.5
SHIVAMOGGA 163 06/10/19 09 0.0 2 2 1 -0.5
SHIVAMOGGA 163 06/10/19 10 0.0 2 2 1 0
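At hour 09 I get negative values. I want RAINFALL at hour 09 to start from that row's hr_rain value (which is why I tried to create another grouping by hour 09). Thanks in advance for your help!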
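The effect of the grouping on the differencing step can be sketched in base R on a short made-up series (the numbers are illustrative, not taken from the question's data). Differencing the cumulative totals across the day boundary makes the 09 row negative, because it subtracts the previous day's total; differencing within each day group keeps the 09 row equal to its own hr_rain, which is what lag(hr_rain, default = 0) inside a correctly grouped mutate() is meant to do. The helper hourly_from_cumulative() is hypothetical, and ave() stands in for group_by().

```r
# Made-up cumulative totals: the counter resets at hour 09
HOUR    <- c("07", "08", "09", "10", "11")
hr_rain <- c(2.0, 2.5, 0.5, 1.0, 1.0)
gr      <- cumsum(HOUR == "09")   # 0 0 1 1 1

# Hypothetical helper: current value minus previous value, treating the
# value before the first row as 0 (like dplyr::lag(x, default = 0))
hourly_from_cumulative <- function(x) x - c(0, x[-length(x)])

# Across the day boundary: the 09 row subtracts the previous day's total
hourly_from_cumulative(hr_rain)                  # 2.0 0.5 -2.0 0.5 0.0
# Within each day group: the 09 row keeps its own cumulative value
ave(hr_rain, gr, FUN = hourly_from_cumulative)   # 2.0 0.5 0.5 0.5 0.0
```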
Answer
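Since both groupings are the same, there is no need to compute them separately: you can combine the two steps and compute hr_rain and RAINFALL together in a single mutate(). Because lag() is then evaluated within each group that starts at hour 09, the 09 row gets the default of 0 as its previous value, so its RAINFALL equals its own hr_rain.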
library(dplyr)
df %>%
group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>%
mutate(hr_rain = zoo::na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE),
RAINFALL = hr_rain - lag(hr_rain, default = 0))
Data
df <- structure(list(STATION = c("SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA",
"SHIVAMOGGA"), CODE = c(163, 163, 163, 163, 163, 163, 163, 163,
163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163,
163, 163, 163), DATE = c("06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18",
"06/09/18", "06/09/18", "06/10/19", "06/10/19", "06/10/19", "06/10/19",
"06/10/19", "06/10/19", "06/10/19"), HOUR = c("00", "04", "05",
"06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "03", "05", "06", "07", "08", "09", "10"),
hr_rain = c(1, 1, NA, 1.5, 2.5, NA, 0, 0.5, 0.5, NA, NA,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, NA, NA, NA, 0.5, 0, 0)), row.names = c(NA,
-24L), class = "data.frame")