I want to convert irregularly timestamped data (a reading is recorded only when Value changes) into a regular 30-second interval format.
What I have, df:
Timestamp Value
6/26/2018 0:00:06 10
6/26/2018 0:01:06 15
6/26/2018 0:02:15 20
And its dput:
structure(list(Timestamp = c("6/26/2018 0:00:06", "6/26/2018 0:01:06",
"6/26/2018 0:02:15"), Value = c(10L, 15L, 20L)), .Names = c("Timestamp",
"Value"), class = "data.frame", row.names = c(NA, -3L))
What I want, formatted_df:
Timestamp Value
6/26/2018 0:00:30 10
6/26/2018 0:01:00 10
6/26/2018 0:01:30 15
6/26/2018 0:02:00 15
6/26/2018 0:02:30 20
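My attempt
Using functions from lubridate and dplyr, I can round each timestamp up to a multiple of 30 seconds, but the result is not a regular 30-second series (the rows in between are missing):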
formatted <- df %>% mutate(Timestamp_Date = as.POSIXct(Timestamp, tz = "US/Eastern", usetz = TRUE, format="%m/%d/%Y %H:%M:%S"),
rounded_timestamp = ceiling_date(Timestamp_Date, unit = "30 seconds"))
which gives formatted:
Timestamp Value Timestamp_Date rounded_timestamp
6/26/2018 0:00:06 10 6/26/2018 0:00:06 6/26/2018 0:00:30
6/26/2018 0:01:06 15 6/26/2018 0:01:06 6/26/2018 0:01:30
6/26/2018 0:02:15 20 6/26/2018 0:02:15 6/26/2018 0:02:30
I think lubridate and dplyr are useful here, but I bet data.table can do it.
Answer 0 (score: 1)
You can use a data.table rolling join.
library(data.table)
#convert df into data.table and Timestamp into POSIX format
setDT(df)[, Timestamp := as.POSIXct(Timestamp, format="%m/%d/%Y %H:%M:%S")]
#create the intervals of 30seconds according to needs
tstmp <- seq(as.POSIXct("2018-06-26 00:00:30", tz=""),
as.POSIXct("2018-06-26 00:02:30", tz=""),
by="30 sec")
#rolling join between intervals and df
df[.(Timestamp=tstmp), on=.(Timestamp), roll=Inf]
Output:
Timestamp Value
1: 2018-06-26 00:00:30 10
2: 2018-06-26 00:01:00 10
3: 2018-06-26 00:01:30 15
4: 2018-06-26 00:02:00 15
5: 2018-06-26 00:02:30 20
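To make it clearer what roll=Inf does in the join above, here is a minimal base R sketch of the same "last observation carried forward" lookup using findInterval. The names obs_time, obs_value, grid and locf are introduced only for this illustration, and the timestamps are parsed as UTC (an assumption, chosen to keep the example reproducible).

```r
# Observations: a reading is recorded only when the value changes
obs_time <- as.POSIXct(c("2018-06-26 00:00:06",
                         "2018-06-26 00:01:06",
                         "2018-06-26 00:02:15"), tz = "UTC")
obs_value <- c(10L, 15L, 20L)

# Regular 30-second grid covering the observations
grid <- seq(as.POSIXct("2018-06-26 00:00:30", tz = "UTC"),
            as.POSIXct("2018-06-26 00:02:30", tz = "UTC"),
            by = "30 sec")

# For each grid point, find the index of the latest observation at or before it
# (obs_time must be sorted in increasing order for findInterval)
idx <- findInterval(as.numeric(grid), as.numeric(obs_time))

# An index of 0 means the grid point precedes every observation: no value yet
idx[idx == 0L] <- NA

locf <- data.frame(Timestamp = grid, Value = obs_value[idx])
print(locf)
```

Each grid point picks up the value of the most recent reading at or before it, which is the behaviour the rolling join provides in a single call.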
For more information, read about the roll argument in ?data.table.
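The grid in the answer is hard-coded. As a sketch of how it could be derived from the data instead, the example below rounds the first and last timestamps up to the next multiple of 30 seconds and builds the sequence between them. The helper ceil_to and the variable grid are hypothetical names introduced here, not part of data.table, and parsing the timestamps as UTC is an assumption.

```r
library(data.table)

df <- data.frame(
  Timestamp = c("6/26/2018 0:00:06", "6/26/2018 0:01:06", "6/26/2018 0:02:15"),
  Value = c(10L, 15L, 20L)
)

# Convert to data.table and parse the timestamps (UTC is an assumption)
setDT(df)[, Timestamp := as.POSIXct(Timestamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")]

# Hypothetical helper: round a POSIXct up to the next multiple of `step` seconds
ceil_to <- function(x, step = 30) {
  as.POSIXct(ceiling(as.numeric(x) / step) * step, origin = "1970-01-01", tz = "UTC")
}

# Regular grid from the first to the last rounded-up timestamp
grid <- seq(ceil_to(min(df$Timestamp)), ceil_to(max(df$Timestamp)), by = "30 sec")

# Rolling join: each grid point takes the latest Value at or before it
formatted_df <- df[.(Timestamp = grid), on = .(Timestamp), roll = Inf]
print(formatted_df)
```

Rounding on the numeric epoch value works here because midnight UTC falls on a multiple of 30 seconds; lubridate's ceiling_date, as used in the question, would be an alternative for computing the endpoints.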