I have one dataset that records the daily rainfall since 2003, and another that holds the sampling dates from 2003 until now. I want to sum the amount of rain that fell between consecutive sampling dates (see the object called date.per.year).
I found this, but I want to use a vector of break points, so that c1 = sum(rain in interval [X, Y[), c2 = sum(rain in interval [Y, Z[), c3 = sum(rain in interval [Z, A[), etc.
date.per.year = structure(c(12110, 12460, 12815, 13196, 13564.5, 13930, 14321,
14652, 15028, 15408, 15792, 16106), .Names = c("2003", "2004",
"2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012",
"2013", "2014"))
Imagine that the Date and rain data frame is this:
df = data.frame(Dates = seq(as.Date("2003/1/1"),
as.Date("2015/1/1"), "days"),
rain = rnorm(length(seq(as.Date("2003/1/1"), as.Date("2015/1/1"), "days"))))
I also tried this, but it's not creating bins that are usable:
## create corresponding intervals
splits <- cut(date.per.year, median, breaks=date.per.year)
## split df$rain into intervals and sum them
lapply(split(df$rain, f=splits), sum)
Warning message:
In split.default(df$rain, f = splits) :
data length is not a multiple of split variable
Or even this:
library(data.table)
DT <- data.table(df)
setkey(DT, rain, Dates)
DT[, sumSum := DT[ .(.BY[[1]], .d+(-5:-1) )][, sum(sum, na.rm=TRUE)] , by=list(date.per.year, .d=Dates)]
Error in `[.data.table`(DT, , `:=`(sumSum, DT[.(.BY[[1]], .d + (-5:-1))][, : The items in the 'by' or 'keyby' list are length (12,4384). Each must be same length as rows in x or number of rows returned by i (4384).
DT
An illustration of what I want to do is below. Imagine that the red lines are the dates that define the ranges I want to sum (that is, the date.per.year object). In the end, I should have 11 values, one sum per range. Is it possible to do this?
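Answer 0 (score: 1)
You need to supply an origin to convert those numbers into Date objects; otherwise you get an error telling you to do so. After that, cutting on this variable is straightforward.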
cuts <- as.Date(date.per.year, origin = as.Date("1970/1/1"))
binned <- cut(df$Dates,
breaks = cuts)
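N.B. cut() on dates uses bins that include the lower break and exclude the upper one, so the first and last few values of df$Dates (those before the first break, or on or after the last break) will be NA.
You will notice that, for example, the unique levels of this date factor are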
unique(binned)
[1] <NA> 2003-02-27 2004-02-12 2005-02-01 2006-02-17
[6] 2007-02-20 2008-02-21 2009-03-18 2010-02-12 2011-02-23
[11] 2012-03-09 2013-03-28
11 Levels: 2003-02-27 2004-02-12 2005-02-01 ... 2013-03-28
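To make the bin boundaries from the N.B. concrete, here is a minimal sketch on toy data. The break points and dates below are hypothetical values chosen for illustration; they are not taken from the question.

```r
# Hypothetical break points and dates, chosen only to probe the bin edges
breaks <- as.Date(c("2020-01-10", "2020-01-20", "2020-01-30"))
dates  <- as.Date(c("2020-01-05", "2020-01-10", "2020-01-19",
                    "2020-01-20", "2020-01-30"))

# cut() on Dates defaults to right = FALSE, so each bin is [lower, upper)
# and is labelled with its lower break
bins <- cut(dates, breaks = breaks)

# Dates before the first break, or on/after the last break, come out as NA
data.frame(dates, bins)
```

A date equal to a break starts a new bin, which matches the [X, Y[ intervals asked for in the question.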
As for grouped sums, there are thousands of Stack Overflow posts that can help you get there. For example, you could do
library(dplyr)  # provides %>%, mutate(), group_by() and summarize()
df %>% mutate(binned = cut(Dates, breaks = cuts)) %>%
  group_by(binned) %>% summarize(sum(rain))
# A tibble: 12 x 2
binned sum(rain)
<fctr> <dbl>
1 2003-02-27 7.996658
2 2004-02-12 -11.950646
3 2005-02-01 30.443479
4 2006-02-17 19.687989
5 2007-02-20 -2.088648
6 2008-02-21 33.837560
7 2009-03-18 -5.039810
8 2010-02-12 -5.235960
9 2011-02-23 -9.806273
10 2012-03-09 -3.887545
11 2013-03-28 30.446548
12 NA 36.634249
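Keep in mind that the NA in row 12 represents the total rain before 2003-02-27 and from the last break (2014-02-05) onward; the 2013-03-28 bin itself runs up to that last break.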
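If only the 11 in-range sums are wanted, a base R alternative is to use tapply(), which skips rows whose bin is NA. This is a sketch under assumptions: set.seed() and the name rain_sums are additions for illustration and do not appear in the original posts, so the numbers will not match the table above.

```r
set.seed(1)  # assumption: added only to make the simulated rain reproducible

# Sampling dates as day counts since 1970-01-01 (values from the question)
date.per.year <- c(12110, 12460, 12815, 13196, 13564.5, 13930, 14321,
                   14652, 15028, 15408, 15792, 16106)

# Simulated daily rain, built the same way as in the question
df <- data.frame(Dates = seq(as.Date("2003-01-01"), as.Date("2015-01-01"),
                             by = "day"))
df$rain <- rnorm(nrow(df))

# Convert the numbers to Dates and bin every day into [lower, upper) ranges
cuts   <- as.Date(date.per.year, origin = "1970-01-01")
binned <- cut(df$Dates, breaks = cuts)

# tapply() drops observations whose group is NA, leaving one sum per bin
rain_sums <- tapply(df$rain, binned, sum)
rain_sums
```

The result is a named numeric vector whose names are the lower break of each range, so a single range can be looked up with, for example, rain_sums["2003-02-27"].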