Sum over two-week intervals

Time: 2018-02-01 18:21:06

Tags: r

Suppose I have a daily rain data.frame like this:

df.meteoro = data.frame(Dates = seq(as.Date("2017/1/19"), as.Date("2018/1/18"), "days"),
                     rain = rnorm(length(seq(as.Date("2017/1/19"), as.Date("2018/1/18"), "days"))))

I'm trying to sum the accumulated rain over 14-day intervals with this code:

library(tidyverse)
library(lubridate)

df.rain <- df.meteoro %>% 
  mutate(TwoWeeks = round_date(Dates, "14 days")) %>%
  group_by(TwoWeeks) %>%
  summarise(sum_rain = sum(rain))

The problem is that the intervals don't start on 2017-01-19 but on 2017-01-15. I was expecting my output dates to be:

"2017-02-02" "2017-02-16" "2017-03-02" "2017-03-16" "2017-03-30" "2017-04-13"
"2017-04-27" "2017-05-11" "2017-05-25" "2017-06-08" "2017-06-22" "2017-07-06" "2017-07-20"
"2017-08-03" "2017-08-17" "2017-08-31" "2017-09-14" "2017-09-28" "2017-10-12" "2017-10-26"
"2017-11-09" "2017-11-23" "2017-12-07" "2017-12-21" "2018-01-04" "2018-01-18"
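The expected dates are simply every 14th day from 2017-02-02, so they don't have to be typed out. A small base-R sketch that generates them:

```r
# Every 14th day from 2017-02-02 up to the last day of the data
expected <- seq(as.Date("2017-02-02"), as.Date("2018-01-18"), by = "14 days")
length(expected)   # 26 labels
head(expected, 3)  # "2017-02-02" "2017-02-16" "2017-03-02"
```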

TL;DR: I have a year-long daily rain data.frame and want to sum the accumulated rain for the dates above.

Please help.

4 answers:

Answer 0 (score: 1)

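Using round_date the way you show won't give you 14-day periods as you expect. I took a different approach in this solution: I generated a sequence of all dates between your first and last date, grouped those dates into blocks of 14 days, and then joined the dates to your observations.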

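library(tidyverse)

startdate = min(df.meteoro$Dates)
enddate = max(df.meteoro$Dates)
# Every date in the range, numbered into consecutive 14-day blocks
dateseq = 
  data.frame(Dates = seq.Date(startdate, enddate, by = 1)) %>%
  mutate(group = as.numeric(Dates - startdate) %/% 14) %>%
  group_by(group) %>%
  mutate(starts = min(Dates)) %>%   # first day of each block is its label
  ungroup()

# The right join keeps every block, even one with no observations
df.rain <- df.meteoro %>% 
  right_join(dateseq, by = "Dates") %>%
  group_by(starts) %>%
  summarise(sum_rain = sum(rain))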

head(df.rain)

> head(df.rain)
# A tibble: 6 x 2
  starts     sum_rain
  <date>        <dbl>
1 2017-01-19    6.09 
2 2017-02-02    5.55 
3 2017-02-16   -3.40 
4 2017-03-02    2.55 
5 2017-03-16   -0.12
6 2017-03-30    8.95

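The right join to the date sequence ensures that if you are missing the observation days for an entire period, that period is still listed in the result (although in your case you have a complete year anyway).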
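For comparison, here is a base-R sketch of the same grouping (block number = days since the first date, integer-divided by 14) that uses aggregate() instead of a join. It regenerates the question's example data so it runs on its own. Unlike the right join, aggregate() drops any block that has no observations, which makes no difference for a complete year.

```r
# Example data from the question (rain values are random)
dates <- seq(as.Date("2017/1/19"), as.Date("2018/1/18"), "days")
df.meteoro <- data.frame(Dates = dates, rain = rnorm(length(dates)))

startdate <- min(df.meteoro$Dates)
# 0 for the first 14 days, 1 for the next 14, and so on
grp <- as.numeric(df.meteoro$Dates - startdate) %/% 14
# Label each block with its first day, like the starts column above
df.meteoro$starts <- startdate + 14 * grp

# Sum rain per block; blocks without rows would simply not appear
df.rain <- aggregate(rain ~ starts, data = df.meteoro, FUN = sum)
head(df.rain)
```

The year has 365 days, so the last block (starting 2018-01-18) contains a single day.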

Answer 1 (score: 0)

round_date rounds to the nearest multiple of unit (here 14 days) counted from some epoch (probably the Unix epoch, 1970-01-01 00:00:00), so it doesn't fit your purpose.

To get what you want, you can do the following:

df.rain = df.meteoro %>% 
  mutate(days_since_start = as.numeric(Dates - as.Date("2017/1/18")),
         TwoWeeks = as.Date("2017/1/18") + 14*ceiling(days_since_start/14)) %>%
  group_by(TwoWeeks) %>%
  summarise(sum_rain = sum(rain))

This computes days_since_start as the number of days since 2017/1/18, then manually rounds up to the next multiple of two weeks.
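The rounding-up step can be checked on a few dates. two_week_label() below is a hypothetical helper that applies the same arithmetic as the TwoWeeks column. It also shows that the labels depend on the origin: with 2017/1/18 they fall on 2017-02-01, 2017-02-15, and so on, while an origin of 2017-01-19 reproduces the dates listed in the question, at the cost of 2017-01-19 forming a one-day group of its own.

```r
# Hypothetical helper: round each date up to origin + a multiple of 14 days
two_week_label <- function(d, origin) {
  origin + 14 * ceiling(as.numeric(d - origin) / 14)
}

d <- as.Date(c("2017-01-19", "2017-02-01", "2017-02-02"))
two_week_label(d, as.Date("2017-01-18"))  # "2017-02-01" "2017-02-01" "2017-02-15"
two_week_label(d, as.Date("2017-01-19"))  # "2017-01-19" "2017-02-02" "2017-02-02"
```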

Answer 2 (score: 0)

Assuming you want to round each date to the nearest of the dates you specified, I guess the following will work:

library(lubridate)
library(plyr)

targetDates<-seq(ymd("2017-02-02"),ymd("2018-01-18"),by='14 days')
# Replace each date with the nearest target date
df.meteoro$Dates=targetDates[sapply(df.meteoro$Dates,function(x) which.min(abs(interval(targetDates,x))))]
sum_rain=ddply(df.meteoro,.(Dates),summarize,sum_rain=sum(rain,na.rm=TRUE))

As you can see, not all dates get the same number of observations. The date "2017-02-02", for example, gets all records from "2017-01-19" through "2017-02-09", which is 22 records. From "2017-02-10" on, dates are rounded to "2017-02-16", and so on.
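The sapply()/which.min() lookup compares every date with every target. A vectorized alternative, swapped in here rather than taken from the answer, is to cut at the midpoints between consecutive targets with base R's findInterval(). This sketch assumes the question's date range and reproduces the 22 records assigned to 2017-02-02:

```r
dates <- seq(as.Date("2017-01-19"), as.Date("2018-01-18"), by = "days")
targetDates <- seq(as.Date("2017-02-02"), as.Date("2018-01-18"), by = "14 days")

# Midpoints between consecutive targets are the decision boundaries
n <- length(targetDates)
mids <- (as.numeric(targetDates[-1]) + as.numeric(targetDates[-n])) / 2
# left.open = TRUE sends a date lying exactly on a midpoint to the earlier target
idx <- findInterval(as.numeric(dates), mids, left.open = TRUE) + 1
nearest <- targetDates[idx]

table(nearest)[1:2]
```

With left.open = TRUE a date such as 2017-02-09, which is equally far from 2017-02-02 and 2017-02-16, goes to the earlier target; this matches which.min() returning the first of tied minima.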

Answer 3 (score: 0)

This may be cheating, but assuming each row/observation is a separate day, why not just group every 14 rows and sum them?

# Assign interval groups, each 14 rows
df.meteoro$my_group <-rep(1:100, each=14, length.out=nrow(df.meteoro))

# Grab Interval Names
my_interval_names <- df.meteoro %>%
  select(-rain) %>% 
  group_by(my_group) %>% 
  slice(1)

# Summarise
df.meteoro %>% 
  group_by(my_group) %>% 
  summarise(rain = sum(rain)) %>% 
  left_join(., my_interval_names)

#> Joining, by = "my_group"
#> # A tibble: 27 x 3
#>    my_group   rain Dates     
#>       <int>  <dbl> <date>    
#>  1        1  3.86  2017-01-19
#>  2        2 -0.581 2017-02-02
#>  3        3 -0.876 2017-02-16
#>  4        4  1.80  2017-03-02
#>  5        5  3.79  2017-03-16
#>  6        6 -3.50  2017-03-30
#>  7        7  5.31  2017-04-13
#>  8        8  2.57  2017-04-27
#>  9        9 -1.33  2017-05-11
#> 10       10  5.41  2017-05-25
#> # ... with 17 more rows

Created on 2018-03-01 by the reprex package (v0.2.0).