Calculate the sum of the previous three rows in an R data.table (by grid square)

Time: 2015-09-18 10:36:21

Tags: r filter data.table zoo

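I would like to calculate the rainfall over the past three days for each grid square and add it as a new column in my data.table. To be clear, I want to sum the rainfall of the current day and the two previous days, separately for each meteorological grid square.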

library(zoo)
library(data.table)


# making the data.table
rain           <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10)  # rainfall values to work with
square         <- c(1,1,1,1,1,1,1,1,1,2)               # the geographic grid square for the rainfall measurement
desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA )  # this is the result I'm looking for (the last NA as we are now on to the first day of the second grid square)
weather <- data.table(rain, square, desired_result)  # making the data.table

My attempt: this line used to work, but it no longer does.

weather[, rain_3 := filter(rain, rep(1, 2), sides = 1), by = list(square)]  

So I am trying another approach here:

# this next line gets the numbers right, but sums the following values, not the preceding ones.
weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2,0)), sum)

# here I add in the by weather$square, but still no success
weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2,0)), sum, by= list(weather$square))

I would greatly appreciate any insights or suggestions.

Many thanks!

5 answers:

Answer 0: (score: 20)

Here is a fast and efficient solution using a recent data.table version (v1.9.6+):

weather[, rain_3 := Reduce(`+`, shift(rain, 0:2)), by = square]
weather
#     rain square desired_result rain_3
#  1:   NA      1             NA     NA
#  2:   NA      1             NA     NA
#  3:   NA      1             NA     NA
#  4:    0      1             NA     NA
#  5:    0      1             NA     NA
#  6:    5      1              5      5
#  7:    1      1              6      6
#  8:    0      1              6      6
#  9:    3      1              4      4
# 10:   10      2             NA     NA

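The basic idea is to build lagged copies of the rain column with shift (lags 0, 1 and 2) and then add them together row by row. Because this runs by square, the lags never cross from one grid square into the next.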
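To make the intermediate step visible, here is a minimal sketch on a short NA-free vector (the vector and the names lags and rain_3 are made up for illustration, not taken from the answer). shift() with n = 0:2 returns a list of three vectors, and Reduce() adds them element by element.

```r
library(data.table)

# Illustrative values only (a single grid square, no missing data)
rain <- c(0, 0, 5, 1, 0, 3)

# shift() with several lags returns a list: lag 0, lag 1 and lag 2
lags <- shift(rain, n = 0:2)

# Add the three lagged vectors element by element
rain_3 <- Reduce(`+`, lags)

print(lags)
print(rain_3)
```

The first two entries of rain_3 are NA because lags 1 and 2 have no earlier value to pull from, which is the same reason the first rows of every group are NA in the answer above.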

Answer 1: (score: 2)

weather[, rain_3 := filter(rain, rep(1, 3), sides = 1), by = list(square)]  
#Error in filter(rain, rep(1, 3), sides = 1) : 
#  'filter' is longer than time series
weather[, rain_3 := if(.N > 2) filter(rain, rep(1, 3), sides = 1) else NA_real_, 
        by = square] 
#    rain square desired_result rain_3
# 1:   NA      1             NA     NA
# 2:   NA      1             NA     NA
# 3:   NA      1             NA     NA
# 4:    0      1             NA     NA
# 5:    0      1             NA     NA
# 6:    5      1              5      5
# 7:    1      1              6      6
# 8:    0      1              6      6
# 9:    3      1              4      4
#10:   10      2             NA     NA

Take care not to load dplyr, because it masks filter. If you do need dplyr, you can call stats::filter explicitly.
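A minimal sketch of the namespaced call, using the data from the question. The as.numeric() wrapper is an addition of mine: stats::filter returns a time-series object, and stripping that class keeps the new column a plain numeric vector.

```r
library(data.table)

# Data from the question
weather <- data.table(
  rain   = c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10),
  square = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
)

# stats::filter is called by its full name, so it does not matter
# whether dplyr (which exports its own filter) is attached.
weather[, rain_3 := if (.N > 2) {
  as.numeric(stats::filter(rain, rep(1, 3), sides = 1))
} else {
  NA_real_  # groups with fewer than three rows cannot fill a window
}, by = square]

print(weather)
```

With the namespace spelled out, the line behaves the same in a session where dplyr is attached.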

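Answer 2: (score: 2)

You almost had the answer yourself. rollsum (or rollapply in your case) gives you a vector of length N-2, so you only need to pad the missing leading cells with NA. It can be done as simply as this: roll <- c(NA, NA, rollsum(yourvector, k = 3))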
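As a rough illustration of the padding idea, the sketch below uses a short NA-free vector (made up for this example) and compares manual padding with zoo's right-aligned rollapplyr() and its fill argument, which is an alternative I am adding here and not part of the answer. zoo's rollsum is not robust to NAs in the input, so the example deliberately avoids them.

```r
library(zoo)

# Illustrative values only (one grid square, no missing data)
x <- c(0, 0, 5, 1, 0, 3)

# rollsum() drops the incomplete windows, so pad two leading NAs by hand
padded <- c(NA, NA, rollsum(x, k = 3))

# rollapplyr() is right-aligned (current value plus the two before it);
# fill = NA pads the incomplete windows automatically
right_aligned <- rollapplyr(x, width = 3, FUN = sum, fill = NA)

print(padded)
print(right_aligned)
```

Both vectors have the same length as x, so either can be assigned as a new column inside a by = square call.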

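This is how I did it. I am using roll_sum from the {RcppRoll} package because it is faster and handles NAs more easily. The by argument of data.table then lets you group the results by square.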

library(RcppRoll)
# NA_real_ keeps the column type numeric for groups with fewer than 3 rows
weather[, rain_3 := if (.N > 2) c(NA, NA, roll_sum(rain, n = 3)) else NA_real_, by = square]
weather

    rain square desired_result rain_3
 1:   NA      1             NA     NA
 2:   NA      1             NA     NA
 3:   NA      1             NA     NA
 4:    0      1             NA     NA
 5:    0      1             NA     NA
 6:    5      1              5      5
 7:    1      1              6      6
 8:    0      1              6      6
 9:    3      1              4      4
10:   10      2             NA     NA

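Answer 3: (score: 2)

Late to the party, but recent versions of the data.table package (1.12.8 for me) have the frollsum function, which does this more cleanly than the previous answers (and still very efficiently):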

library(data.table)

# making the data.table
rain           <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10)  # rainfall values to work with
square         <- c(1,1,1,1,1,1,1,1,1,2)               # the geographic grid square for the rainfall measurement
desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA )  # this is the result I'm looking for (the last NA as we are now on to the first day of the second grid square)
weather <- data.table(rain, square, desired_result)  # making the data.table

# using `frollsum`
weather[, rain3 := frollsum(rain, n = 3), by = square][]
#>     rain square desired_result rain3
#>  1:   NA      1             NA    NA
#>  2:   NA      1             NA    NA
#>  3:   NA      1             NA    NA
#>  4:    0      1             NA    NA
#>  5:    0      1             NA    NA
#>  6:    5      1              5     5
#>  7:    1      1              6     6
#>  8:    0      1              6     6
#>  9:    3      1              4     4
#> 10:   10      2             NA    NA

Created on 2020-07-09 by the reprex package (v0.3.0)

Answer 4: (score: 0)

A dplyr solution:

library(dplyr)
weather %>% 
  group_by(square) %>% 
  mutate(rain_3 = rain + lag(rain) + lag(rain, n = 2L))

Result:

Source: local data table [10 x 4]

    rain square desired_result rain_3
   (dbl)  (dbl)          (dbl) (dbl)
1     NA      1             NA    NA
2     NA      1             NA    NA
3     NA      1             NA    NA
4      0      1             NA    NA
5      0      1             NA    NA
6      5      1              5     5
7      1      1              6     6
8      0      1              6     6
9      3      1              4     4
10    10      2             NA    NA

If you want to assign rain_3 back to the dataset, you can use the %<>% operator from magrittr in the pipe.
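The original snippet for this step is not preserved, so the following is only a sketch of what such a call could look like, assuming a plain data.frame built from the question's data. %<>% is magrittr's compound assignment pipe; it has to be the first pipe in the chain, and it writes the final result back to weather. The trailing ungroup() is my addition.

```r
library(dplyr)
library(magrittr)

# Data from the question, as a plain data.frame for this sketch
weather <- data.frame(
  rain   = c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10),
  square = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
)

# %<>% pipes weather through the chain and assigns the result back to it
weather %<>%
  group_by(square) %>%
  mutate(rain_3 = rain + lag(rain) + lag(rain, n = 2L)) %>%
  ungroup()

print(weather)
```

This is equivalent to writing weather <- weather %>% ... with the same chain on the right-hand side.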