Merging two daily time series after summarising over shifted time

Time: 2017-09-11 21:25:15

Tags: r date timestamp time-series aggregate

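I have a measurement (e.g. solar radiation) indexed by a datetime variable, with hourly timestamps. What I want to do is sum the measurements for each day of the year and match the result with another data source at daily scale (say, mean outdoor temperature).

However, the second data source is already aggregated from 8:00 am to 8:00 am of the next day. I know how to aggregate my first variable by standard calendar day, but I need to sum from 8 to 8 in order to match the two measurements.

A sample of my data: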

set.seed(1L) # to create reproducible data
hourly = data.frame(datetime = seq(from = lubridate::ymd_hm("2017-01-01 01:00"), 
                                   length.out = 168, by = "hour"),
                    value = rpois(168, 10))
daily = data.frame(datetime = seq(from=as.Date("2017-01-01"), length.out = 31, by="day"),
                   value=rnorm(31))

3 Answers:

Answer 0: (score: 1)

You can use cut to do this, for example:

library(lubridate)
library(dplyr)
brk = seq(ymd_hm(paste(as.Date(min(hourly$datetime) - days(1)), "08:00"), tz = "UTC"),
          ymd_hm(paste(as.Date(max(hourly$datetime) + days(1)), "08:00"), tz = "UTC"),
          by = "24 hours")
hourly$cut <- ymd_hms(cut.POSIXt(hourly$datetime, breaks = brk))
hourly2 <- hourly %>% group_by(cut) %>% summarize(value = sum(value)) 
hourly2$cut <- as.Date(hourly2$cut)
names(hourly2) <- names(daily)
comb <- rbind(hourly2, daily) %>% group_by(datetime) %>% summarize(value = sum(value))

     datetime       value
       <date>       <dbl>
 1 2016-12-31  52.0000000
 2 2017-01-01 241.5612137
 3 2017-01-02 244.3689032
 4 2017-01-03 271.3156334
 5 2017-01-04 253.8221333
 6 2017-01-05 238.5790170
 7 2017-01-06 220.7118064
 8 2017-01-07 167.5018586
 9 2017-01-08  -0.2962494
10 2017-01-09   0.4126310
# ... with 22 more rows
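To make the binning step easier to follow, here is a minimal base R sketch of how cut assigns timestamps to intervals that begin at 08:00. The three timestamps and the UTC time zone are assumptions chosen for illustration; they are not part of the answer above.

```r
# Three illustrative timestamps around the 08:00 boundary (UTC is an assumption)
x <- as.POSIXct(c("2017-01-01 07:00", "2017-01-01 08:00", "2017-01-02 07:00"),
                tz = "UTC")

# Break points at 08:00 on three consecutive days
brk <- seq(as.POSIXct("2016-12-31 08:00", tz = "UTC"), by = "day", length.out = 3)

# cut() on date-times uses left-closed intervals by default (right = FALSE),
# so a reading stamped exactly 08:00 opens a new interval
bins <- cut(x, breaks = brk)

# Date on which each reading's 8-to-8 interval starts
period_start_date <- as.Date(brk[as.integer(bins)], tz = "UTC")
period_start_date
```

A reading at 07:00 on 1 January therefore lands in the interval that started on 31 December, which is why the result above contains a row for 2016-12-31.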

Answer 1: (score: 1)

Using dplyr and shifting the dates by subtracting 8 hours:

library(dplyr)
library(lubridate)
hourly %>% mutate(datetime = as_date(datetime - 8 * hours())) %>%
  rbind(daily) %>%
  group_by(datetime) %>%
  summarize_all(sum) %>%
  ungroup() %>%
  arrange(datetime)

Result

# A tibble: 32 x 2
     datetime       value
       <date>       <dbl>
 1 2016-12-31  70.0000000
 2 2017-01-01 218.6726454
 3 2017-01-02 244.3821258
 4 2017-01-03 257.7136326
 5 2017-01-04 220.4788443
 6 2017-01-05 230.3729744
 7 2017-01-06 248.5082639
 8 2017-01-07 176.5511818
 9 2017-01-08  -0.8307824
10 2017-01-09  -0.6343781
# ... with 22 more rows

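Answer 2: (score: 1)

Expanding my comment into an answer: it is worth noting that the OP emphasised the words summing up from 8:00 am to 8:00 am of the next day.

Mapping 24-hour periods that are not aligned with midnight to dates

If a 24-hour period is not aligned with midnight, i.e. it does not extend from 00:00 to 24:00 but starts and ends at some time during the day, it is ambiguous which date should be associated with that period.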

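We can take

  1. the date on which the period starts,
  2. the date on which the period ends, or
  3. the date which contains the majority of the period.

Just to illustrate the differences: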

    # timestamps: 9 am, 10pm, 7 am next day 
    x <- lubridate::ymd_hm(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"))
    x
    
    [1] "2017-09-12 09:00:00 UTC" "2017-09-12 22:00:00 UTC" "2017-09-13 07:00:00 UTC"
    
    # map timestamps to date on which period starts by shifting back by 8 hours
    x + lubridate::hours(-8L)
    
    [1] "2017-09-12 01:00:00 UTC" "2017-09-12 14:00:00 UTC" "2017-09-12 23:00:00 UTC"
    
    # map timestamps to date on which period ends by advancing by 16 hours
    x + lubridate::hours(16L)
    
    [1] "2017-09-13 01:00:00 UTC" "2017-09-13 14:00:00 UTC" "2017-09-13 23:00:00 UTC"
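The third option is not shown above. As a rough sketch in base R, one way to express it is to find the date on which the period starts and move to the next day only when the period starts after noon. The helper name majority_date, its start_hour argument, the UTC time zone, and the tie-breaking choice for a period starting exactly at noon are all assumptions made for illustration.

```r
# Hypothetical helper: map timestamps to the date that contains most of the
# 24-hour period they fall into, for periods starting at `start_hour` (UTC).
majority_date <- function(x, start_hour = 8) {
  # date on which the enclosing period starts
  start_date <- as.Date(x - start_hour * 3600, tz = "UTC")
  # a period starting after noon spends most of its hours on the next day;
  # a start at exactly noon is a tie and is assigned to the start date here
  if (start_hour > 12) start_date + 1 else start_date
}

# same three timestamps as above: 9 am, 10 pm, 7 am next day
x <- as.POSIXct(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"),
                tz = "UTC")

majority_date(x, start_hour = 8)   # periods running 08:00 to 08:00
majority_date(x, start_hour = 20)  # periods running 20:00 to 20:00
```

For an 8-to-8 period, sixteen of the twenty-four hours fall on the start date, so under these assumptions the third option gives the same dates as the first.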
    

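In the absence of any other information, we assume that the daily data have been mapped to the day on which the period starts.

Aggregating and joining

Grouping, aggregating, and joining can be done using data.table: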

    library(data.table)
    # aggregate data by shifted timestamp
    setDT(hourly)[, .(sum.value = sum(value)), 
                  by = .(date = as.Date(datetime + lubridate::hours(-8L)))]
    
             date sum.value
    1: 2016-12-31        68
    2: 2017-01-01       232
    3: 2017-01-02       222
    4: 2017-01-03       227
    5: 2017-01-04       228
    6: 2017-01-05       231
    7: 2017-01-06       260
    8: 2017-01-07       144
    

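Note that the new date column used for grouping and aggregating is created on the fly in the by parameter (one of the reasons why I prefer data.table).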

Now, the daily data need to be joined. By chaining, this can be combined into one statement:

    setDT(hourly)[, .(sum.value = sum(value)), 
                  by = .(date = as.Date(datetime + lubridate::hours(-8L)))][
                    setDT(daily), on = .(date = datetime), nomatch = 0L]
    
             date sum.value      value
    1: 2017-01-01       232 -0.5080862
    2: 2017-01-02       222  0.5236206
    3: 2017-01-03       227  1.0177542
    4: 2017-01-04       228 -0.2511646
    5: 2017-01-05       231 -1.4299934
    6: 2017-01-06       260  1.7091210
    7: 2017-01-07       144  1.4350696

The parameter nomatch = 0L indicates that we want an inner join.
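For readers without data.table, the same shift, aggregate, and inner join steps can be sketched in base R. This is only an illustration under stated assumptions: the sample data are rebuilt with as.POSIXct in UTC instead of lubridate, and the intermediate name hourly_sum is introduced here.

```r
# Sample data in the same shape as the question (UTC timestamps are an assumption)
set.seed(1L)
hourly <- data.frame(
  datetime = seq(as.POSIXct("2017-01-01 01:00", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)
daily <- data.frame(
  datetime = seq(as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31)
)

# Map each hourly reading to the date on which its 8-to-8 period starts
hourly$date <- as.Date(hourly$datetime - 8 * 3600, tz = "UTC")

# Sum the hourly readings per shifted date
hourly_sum <- aggregate(list(sum.value = hourly$value),
                        by = list(date = hourly$date), FUN = sum)

# merge() keeps only keys present in both inputs by default, i.e. an inner join
matched <- merge(hourly_sum, daily, by.x = "date", by.y = "datetime")
matched
```

The result has one row per date that occurs in both sources, with the hourly sum and the daily value in separate columns, as in the data.table join above.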