Converting change-of-value data to a standard 30-second format in R

Time: 2018-09-11 16:52:15

Tags: r dplyr data.table lubridate

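I want to convert data stored in a non-standard change-of-value format (a reading is recorded only when Value changes) into a standard format with fixed 30-second intervals.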

What I have: df

Timestamp   Value
6/26/2018 0:00:06   10
6/26/2018 0:01:06   15
6/26/2018 0:02:15   20

dput

structure(list(Timestamp = c("6/26/2018 0:00:06", "6/26/2018 0:01:06", 
"6/26/2018 0:02:15"), Value = c(10L, 15L, 20L)), .Names = c("Timestamp", 
"Value"), class = "data.frame", row.names = c(NA, -3L))

What I want: formatted_df

Timestamp   Value
6/26/2018 0:00:30   10
6/26/2018 0:01:00   10
6/26/2018 0:01:30   15
6/26/2018 0:02:00   15
6/26/2018 0:02:30   20

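My attempt

Using functions from lubridate and dplyr, I can round each timestamp up to a multiple of 30 seconds, but the result is not a regular 30-second series: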

library(dplyr)
library(lubridate)
formatted <- df %>% mutate(Timestamp_Date = as.POSIXct(Timestamp, tz = "US/Eastern", format = "%m/%d/%Y %H:%M:%S"),
                           rounded_timestamp = ceiling_date(Timestamp_Date, unit = "30 seconds"))

formatted

Timestamp   Value   Timestamp_Date  rounded_timestamp
6/26/2018 0:00:06   10  6/26/2018 0:00:06   6/26/2018 0:00:30
6/26/2018 0:01:06   15  6/26/2018 0:01:06   6/26/2018 0:01:30
6/26/2018 0:02:15   20  6/26/2018 0:02:15   6/26/2018 0:02:30

I think lubridate and dplyr are useful here, but I bet data.table can do it too.

1 Answer:

Answer 0 (score: 1)

You can use a data.table rolling join.

library(data.table)

#convert df into data.table and Timestamp into POSIX format
setDT(df)[, Timestamp := as.POSIXct(Timestamp, format="%m/%d/%Y %H:%M:%S")]

#create the intervals of 30seconds according to needs
tstmp <- seq(as.POSIXct("2018-06-26 00:00:30", tz=""), 
    as.POSIXct("2018-06-26 00:02:30", tz=""), 
    by="30 sec")

#rolling join between intervals and df
df[.(Timestamp=tstmp), on=.(Timestamp), roll=Inf]

Output:

             Timestamp Value
1: 2018-06-26 00:00:30    10
2: 2018-06-26 00:01:00    10
3: 2018-06-26 00:01:30    15
4: 2018-06-26 00:02:00    15
5: 2018-06-26 00:02:30    20
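
The start and end of the 30-second grid are hard-coded above. As a minimal sketch, the grid could instead be derived from the data by rounding the first and last timestamps up to the next multiple of 30 seconds. The names obs, step, grid and ceil_to_step are hypothetical helpers introduced for illustration, and tz = "UTC" is an assumption made only to keep the example reproducible.

```r
library(data.table)

# Sample data from the question; tz = "UTC" is an assumption for reproducibility
obs <- data.table(
  Timestamp = as.POSIXct(c("6/26/2018 0:00:06", "6/26/2018 0:01:06", "6/26/2018 0:02:15"),
                         format = "%m/%d/%Y %H:%M:%S", tz = "UTC"),
  Value = c(10L, 15L, 20L)
)

step <- 30  # grid spacing in seconds

# Hypothetical helper: round a POSIXct vector up to the next multiple of `step` seconds
ceil_to_step <- function(x, step) {
  as.POSIXct(ceiling(as.numeric(x) / step) * step, origin = "1970-01-01", tz = "UTC")
}

# Regular grid spanning the rounded-up first and last observations
grid <- seq(ceil_to_step(min(obs$Timestamp), step),
            ceil_to_step(max(obs$Timestamp), step),
            by = step)

# Same rolling join as above: each grid point takes the last observed Value
result <- obs[.(Timestamp = grid), on = .(Timestamp), roll = Inf]
print(result)
```

With the sample data this should reproduce the five rows shown above. lubridate::ceiling_date(x, unit = "30 seconds") could replace the helper if lubridate is already loaded.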

For more information, read about the roll argument in ?data.table.
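
As a rough illustration of how the roll argument changes the result, the sketch below runs the same join with three settings: roll = Inf (carry the last observation forward without limit), roll = 30 (carry forward at most 30 seconds) and roll = "nearest" (take the closest observation in either direction). The names obs and grid are hypothetical, and tz = "UTC" is an assumption for reproducibility.

```r
library(data.table)

# Sample data from the question, parsed in UTC (an assumption for this sketch)
obs <- data.table(
  Timestamp = as.POSIXct(c("2018-06-26 00:00:06", "2018-06-26 00:01:06",
                           "2018-06-26 00:02:15"), tz = "UTC"),
  Value = c(10L, 15L, 20L)
)

# Regular 30-second grid to join against
grid <- data.table(
  Timestamp = seq(as.POSIXct("2018-06-26 00:00:30", tz = "UTC"),
                  as.POSIXct("2018-06-26 00:02:30", tz = "UTC"),
                  by = "30 sec")
)

# Last observation carried forward, no limit
locf <- obs[grid, on = .(Timestamp), roll = Inf]$Value

# Carried forward at most 30 seconds; grid points further away get NA
capped <- obs[grid, on = .(Timestamp), roll = 30]$Value

# Closest observation in either direction
nearest <- obs[grid, on = .(Timestamp), roll = "nearest"]$Value

print(data.table(grid, locf, capped, nearest))
```

Only roll = Inf matches the output requested in the question. The capped variant leaves NA at 00:01:00 and 00:02:00, because the preceding reading is 54 seconds old at both points.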