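Arithmetic operations by timestamp in R

Time: 2017-05-30 15:09:32

Tags: r dataframe timestamp data.table time-series

I have two data frames (df1, df2) containing measurements that cover roughly the same period but have different timestamps. df1 has hourly data; df2 has 2-3 measurements per hour. I would like to:

  1. Compare the hourly averages of df2 with the hourly values in df1, i.e. one value per hour for each data frame.

  2. Create a new column in df2 (df2$hrly) that holds, for each timestamp in df2, the hourly value from df1, i.e. 2-3 values per hour (depending on the number of timestamps df2 has in that hour).

subset and filter do not work for this, and I do not want to use loops. I am considering strftime with aggregate; is there a better way? I am learning the data.table package; perhaps it offers a faster or more convenient way?

This is what df1 and df2 look like: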

    > glimpse(df1)
    Observations: 7,770
    Variables: 7
    $ lat      <dbl> 30.46198, 30.46198, 30.46198, 30.46198, 30.46198, 30....
    $ lon      <dbl> -91.17922, -91.17922, -91.17922, -91.17922, -91.17922...
    $ date_gmt <chr> "2016-01-01", "2016-01-01", "2016-01-01", "2016-01-01...
    $ time_gmt <chr> "06:00", "07:00", "08:00", "09:00", "10:00", "11:00",...
    $ dust     <dbl> 10.7, 8.0, 8.3, 11.1, 9.1, 10.5, 9.7, 13.5, 10.5, 10....
    $ state    <chr> "Louisiana", "Louisiana", "Louisiana", "Louisiana", "...
    $ tme      <dttm> 2016-01-01 06:00:00, 2016-01-01 07:00:00, 2016-01-01...
    

df1$tme is a POSIXct object (tz = "GMT").

    > glimpse(df2)
    Observations: 5,000
    Variables: 9
    $ dp1        <dbl> 0.96, 0.97, 0.98, 0.99, 0.99, 0.99, 0.99, 0.99, 0.9...
    $ dp2        <dbl> 1.51, 1.53, 1.55, 1.56, 1.56, 1.56, 1.56, 1.56, 1.5...
    $ hz         <dbl> 54.13, 54.55, 54.91, 55.03, 54.98, 55.00, 55.13, 55...
    $ rh         <dbl> 68.15, 68.56, 69.84, 68.32, 69.62, 71.14, 70.42, 70...
    $ degc       <dbl> 82.88, 82.33, 82.26, 82.62, 82.20, 81.60, 82.05, 81...
    $ cfm        <dbl> 3993, 3990, 3989, 3928, 3967, 4045, 4002, 3979, 403...
    $ dust       <dbl> 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.0...
    $ time_stamp <dttm> 2016-06-01 17:48:10, 2016-06-01 18:08:12, 2016-06-...
    $ dur        <dbl> 0.0000000, 0.3338889, 0.6677778, 1.0013889, 1.33555...
    

df2$time_stamp is a POSIXct object (tz = "EST").

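1 answer:

Answer 0 (score: 1)

Since I don't have test data, this is the best I can do. Hopefully it works.

I assume you want to compare the dust variable (the only variable common to both data frames). I also assume that "compare" means you just want to look at the delta.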

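Steps:

  1. Make sure the time zones are the same
  2. Convert the timestamps to hourly data
  3. Compute the mean of the variable(s) by hour
  4. Merge on the timestamp
  5. Compute the delta for the comparison

Test data: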

    library(data.table)
    df1 <- data.table(tme = seq.POSIXt(as.POSIXct("2016-01-01 00:00", tz = "GMT"), by = 3600, length.out = 100), dust = rnorm(100))
    df2 <- data.table(matrix(rnorm(1000 * 8), 1000, 8))
    setnames(df2, c("dp1", "dp2", "hz", "rh", "degc", "cfm", "dust", "dur"))
    df2[, time_stamp := seq.POSIXt(as.POSIXct("2016-01-01 00:00", tz = "EST"), by = 360, length.out = 1000)]
    
    dplyr::glimpse(df1)
    dplyr::glimpse(df2)
    

Code:

    # first snippet
    attr(df2$time_stamp, "tzone") <- "GMT"  # display both tables in the same time zone
    df2[, tme := lubridate::round_date(time_stamp, unit = "hours")]  # round each timestamp to the nearest hour
    df3 <- df2[, mean(dust), by = c("tme")]  # hourly mean of dust, assumed to be the only common variable
    setnames(df3, c("tme", "dustmean"))
    df_compare <- merge(df1, df3, by = "tme", all = TRUE)  # keep all observations from both data.tables
    df_compare[, delta_dust := dust - dustmean]  # is that what you want as comparison?
    plot(df_compare$delta_dust)
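
Two details of the snippet above may be worth checking on a single value. The sketch below is only an illustration: the timestamp is made up, and it assumes lubridate is installed. It shows that changing the "tzone" attribute alters how a POSIXct value is printed but not the instant it represents, and that round_date and floor_date can put the same measurement into different hours. Which of the two matches df1 depends on whether a df1 value stamped 22:00 describes the hour starting at 22:00 or the hour centered on it, which the question does not say.

```r
library(lubridate)

# One instant stored as POSIXct (seconds since the epoch); value is illustrative.
x <- as.POSIXct("2016-06-01 17:48:10", tz = "EST")
x_num <- as.numeric(x)

# Changing "tzone" only changes the printed representation;
# the underlying number of seconds stays the same.
attr(x, "tzone") <- "GMT"
same_instant <- as.numeric(x) == x_num
shown <- format(x, "%Y-%m-%d %H:%M:%S")   # "2016-06-01 22:48:10"

# Two ways to assign the measurement to an hour.
rounded <- round_date(x, unit = "hour")   # nearest hour: 23:00 GMT
floored <- floor_date(x, unit = "hour")   # start of the hour: 22:00 GMT
```

If df1's hourly values summarize the hour that begins at the stamp, floor_date would be the closer match; round_date, as used in the answer, fits values centered on the stamp.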
    

Code 2: for all variables (columns), using EST time and round_date.

    # second snippet
    attr(df1$tme, "tzone") <- "EST"  # display both tables in the same time zone
    df2[, tme := lubridate::round_date(time_stamp, unit = "hours")]  # round each timestamp to the nearest hour
    cols2mean <- colnames(df2)
    cols2mean <- cols2mean[!(cols2mean %in% c("tme", "time_stamp"))]
    df3 <- df2[, lapply(.SD, mean), by = c("tme"), .SDcols = cols2mean]  # hourly means of all variables except tme and time_stamp
    df_compare <- merge(df1, df3, by = "tme", all = TRUE)  # keep all observations from both data.tables
    df_compare[, delta_dust := dust.x - dust.y]  # one example
    plot(df_compare$delta_dust)
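
The snippets above address item 1 of the question. For item 2 (a df2$hrly column holding df1's hourly value for every df2 timestamp), one option is a data.table update join, which avoids an explicit loop. This is a minimal sketch on hypothetical toy data, not the asker's real tables; the values are made up, and floor_date is an assumption about how df2 timestamps map to df1 hours.

```r
library(data.table)

# Hypothetical toy data standing in for df1 (hourly) and df2 (sub-hourly).
df1 <- data.table(
  tme  = as.POSIXct(c("2016-06-01 22:00:00", "2016-06-01 23:00:00"), tz = "GMT"),
  dust = c(10.7, 8.0)
)
df2 <- data.table(
  time_stamp = as.POSIXct(c("2016-06-01 22:08:12", "2016-06-01 22:48:10",
                            "2016-06-01 23:28:14"), tz = "GMT"),
  dust = c(0.02, 0.03, 0.04)
)

# Hour bucket for each df2 row (assumption: the hour the measurement falls in).
df2[, tme := lubridate::floor_date(time_stamp, unit = "hour")]

# Update join: for every df1 row, find the df2 rows with the same tme
# and write df1's dust value into a new df2 column called hrly.
df2[df1, on = "tme", hrly := i.dust]

# Item 1 on the same data: one mean per hour, ready to merge with df1.
hourly <- df2[, .(dust_mean = mean(dust)), by = tme]
```

Rows of df2 whose hour has no counterpart in df1 are left as NA in hrly. The `i.` prefix refers to columns of the table inside the brackets (df1 here), which keeps df1's dust distinct from df2's own dust column.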