How to calculate the sum of values between a range of time in R?

Time: 2016-08-31 18:46:08

Tags: r date

I have a dataset with rainfall records since 2003. Another dataset contains the sampling dates from 2003 until now. I want to sum the amount of rain between consecutive sampling dates (see the object called date.per.year).

I found this but I want to use a vector of values (c1 = sum(rain in interval [X, Y[), c2 = sum(rain in interval [Y, Z[), c3 = sum(rain in interval [Z, A[), etc.)

date.per.year = structure(c(12110, 12460, 12815, 13196, 13564.5, 13930, 14321, 
                            14652, 15028, 15408, 15792, 16106), .Names = c("2003", "2004", 
                                                                           "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", 
                                                                           "2013", "2014"))

Imagine that the Date and rain data frame is this:

df = data.frame(Dates = seq(as.Date("2003/1/1"), 
                            as.Date("2015/1/1"), "days"), 
                rain = rnorm(length(seq(as.Date("2003/1/1"), as.Date("2015/1/1"), "days"))))

I also tried this, but it's not creating bins that are usable:

## create corresponding intervals
splits <- cut(date.per.year, median, breaks=date.per.year)

Warning message:
In split.default(df$rain, f = splits) :
  data length is not a multiple of split variable


## split df$rain into intervals and sum them
lapply(split(df$rain, f=splits), sum)

Or even this:

library(data.table)
DT <- data.table(df)
setkey(DT, rain, Dates)

DT[, sumSum := DT[ .(.BY[[1]], .d+(-5:-1) )][, sum(sum, na.rm=TRUE)] , by=list(date.per.year, .d=Dates)]
Error in `[.data.table`(DT, , `:=`(sumSum, DT[.(.BY[[1]], .d + (-5:-1))][,  : The items in the 'by' or 'keyby' list are length (12,4384). Each must be same length as rows in x or number of rows returned by i (4384).

DT

An illustration of what I want to do is below. Imagine that the red lines are the dates that are creating the ranges I want to sum (which is the date.per.year object). In the end, I should have 11 values of the sum of the different ranges. Is it possible to do this?

[Image: red lines marking the sampling dates that bound each range to sum]

1 answer:

Answer 0 (score: 1)

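You need to supply an origin to convert these numbers into Date objects. Otherwise you get an error telling you to do so. After that, cutting on this variable is straightforward.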

cuts <- as.Date(date.per.year, origin = as.Date("1970/1/1"))
binned <- cut(df$Dates, 
              breaks = cuts)
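As a quick sanity check on the conversion (a minimal sketch; the variable name `first_last` is only for illustration), the first and last values of date.per.year map to these calendar dates, and the fractional value 13564.5 prints as the day it falls in:

```r
# Day counts since 1970-01-01, taken from date.per.year
first_last <- as.Date(c(12110, 16106), origin = "1970-01-01")
format(first_last)
# "2003-02-27" "2014-02-05"

# A fractional day count keeps its fraction internally but prints as that day
half_day <- as.Date(13564.5, origin = "1970-01-01")
format(half_day)
# "2007-02-20"
```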

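N.B. The bins only span the range of the break points, so binned will be NA for the first and last few values of df$Dates, which fall outside that range.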
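A small sketch of how cut() treats the break points when given Dates, using made-up dates rather than the rainfall data. With the default right = FALSE, each bin includes its left break and excludes its right one, each bin is labelled by its left break, and anything outside the breaks becomes NA:

```r
# Hypothetical breaks and dates, chosen to sit on and around the boundaries
toy_breaks <- as.Date(c("2020-01-10", "2020-01-20", "2020-01-30"))
toy_dates  <- as.Date(c("2020-01-09", "2020-01-10", "2020-01-19",
                        "2020-01-20", "2020-01-30"))

toy_bins <- cut(toy_dates, breaks = toy_breaks)
as.character(toy_bins)
# NA "2020-01-10" "2020-01-10" "2020-01-20" NA
```

Three breaks give two bins, which is why the 12 values in date.per.year produce 11 levels.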

You will notice, for example, that the unique levels of this date factor are

 unique(binned)
 [1] <NA>       2003-02-27 2004-02-12 2005-02-01 2006-02-17
 [6] 2007-02-20 2008-02-21 2009-03-18 2010-02-12 2011-02-23
[11] 2012-03-09 2013-03-28
11 Levels: 2003-02-27 2004-02-12 2005-02-01 ... 2013-03-28

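There are thousands of Stack Overflow posts on grouped sums that can help you from here. For example, you could do

  library(dplyr)  # provides %>%, mutate, group_by and summarize
  df %>% mutate(binned = cut(Dates, breaks = cuts)) %>% 
    group_by(binned) %>% summarize(sum(rain))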

# A tibble: 12 x 2
       binned  sum(rain)
       <fctr>      <dbl>
1  2003-02-27   7.996658
2  2004-02-12 -11.950646
3  2005-02-01  30.443479
4  2006-02-17  19.687989
5  2007-02-20  -2.088648
6  2008-02-21  33.837560
7  2009-03-18  -5.039810
8  2010-02-12  -5.235960
9  2011-02-23  -9.806273
10 2012-03-09  -3.887545
11 2013-03-28  30.446548
12         NA  36.634249

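Keep in mind that the NA in row 12 represents the total rain before 2003-02-27 and from the last break point (2014-02-05) onward; the bin labelled 2013-03-28 runs up to that last break.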
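If only the 11 in-range sums are wanted, a base R alternative is sketched below. It is an assumption-laden illustration rather than part of the original answer: the rain column is replaced by a constant 1 per day (hypothetical toy data) so that each sum is simply the number of days in the bin and is easy to check. tapply() skips observations whose bin is NA, so the out-of-range days drop out on their own.

```r
# Break points from the question, as day counts since 1970-01-01
date.per.year <- c(12110, 12460, 12815, 13196, 13564.5, 13930, 14321,
                   14652, 15028, 15408, 15792, 16106)

# Toy data: one unit of "rain" per day (hypothetical, for easy checking)
df <- data.frame(Dates = seq(as.Date("2003-01-01"), as.Date("2015-01-01"),
                             by = "day"))
df$rain <- 1

cuts   <- as.Date(date.per.year, origin = "1970-01-01")
binned <- cut(df$Dates, breaks = cuts)

# One sum per bin; rows with binned == NA are ignored by tapply()
sums <- tapply(df$rain, binned, sum)
length(sums)
# 11
```

With real rainfall data, only the df$rain column changes; the binning and summing steps stay the same.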