How do I deal with unstable seconds in a data frame datetime column to get clean data?

Time: 2019-06-19 08:28:43

Tags: r dataframe data.table lubridate

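I need to extract data precisely from raw data whose timestamps are unstable. I tried to fine-tune the unstable seconds with second() from the data.table package, rounding the first half of each minute down and the second half up, but I cannot get it to work. It gives poor results, and it is not a good way to process raw data that keeps growing.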

Here is my sample data frame:

library(data.table)
library(lubridate)  # provides the minute<- and second<- setters used below
df <- read.table(text="
             datetime   ,val
2019-06-19 08:25:55.470,1706506
2019-06-19 08:24:55.560,1706504
2019-06-19 08:24:07.087,1706502
2019-06-19 08:22:55.510,1706500
2019-06-19 08:22:00.080,1706497
2019-06-19 08:21:44.977,1706495
2019-06-19 08:19:55.533,1706493
2019-06-19 08:18:55.470,1706491
2019-06-19 08:18:17.610,1706488
2019-06-19 08:16:55.567,1706486
2019-06-19 08:15:55.440,1706484
2019-06-19 08:14:55.543,1706481
2019-06-19 08:13:55.427,1706479
2019-06-19 08:13:06.477,1706477
2019-06-19 08:12:21.043,1706475
2019-06-19 08:10:55.420,1706473
2019-06-19 08:09:55.447,1706471
2019-06-19 08:08:55.477,1706469
2019-06-19 08:07:55.443,1706467
2019-06-19 08:06:55.550,1706465",sep=",",header=TRUE,stringsAsFactors=FALSE)
df$datetime <- as.POSIXct(df$datetime)

Rounding every timestamp straight to the minute gives a bad result:

> minute(df$datetime[second(df$datetime) > 30]) = minute(df$datetime[second(df$datetime) > 30]) + 1
> second(df$datetime) <- 0
> df
              datetime     val
1  2019-06-19 08:26:00 1706506
2  2019-06-19 08:25:00 1706504
3  2019-06-19 08:24:00 1706502
4  2019-06-19 08:23:00 1706500
5  2019-06-19 08:22:00 1706497
6  2019-06-19 08:22:00 1706495
7  2019-06-19 08:20:00 1706493
8  2019-06-19 08:19:00 1706491
9  2019-06-19 08:18:00 1706488
10 2019-06-19 08:17:00 1706486
11 2019-06-19 08:16:00 1706484
12 2019-06-19 08:15:00 1706481
13 2019-06-19 08:14:00 1706479
14 2019-06-19 08:13:00 1706477
15 2019-06-19 08:12:00 1706475
16 2019-06-19 08:11:00 1706473
17 2019-06-19 08:10:00 1706471
18 2019-06-19 08:09:00 1706469
19 2019-06-19 08:08:00 1706467
20 2019-06-19 08:07:00 1706465

It fails between 08:20:00 and 08:22:00: 08:22:00 appears twice and 08:21:00 is missing.
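
A quick way to confirm this kind of failure is to look for minutes that occur twice and minutes that are absent from the expected range. The sketch below is only an illustration in base R: the `rounded` vector copies rows 4 to 7 of the output above, and the UTC time zone and the variable names are assumptions.

```r
# Rounded timestamps taken from rows 4 to 7 of the output above, newest first.
rounded <- as.POSIXct(c("2019-06-19 08:23:00", "2019-06-19 08:22:00",
                        "2019-06-19 08:22:00", "2019-06-19 08:20:00"),
                      tz = "UTC")  # the time zone is an assumption

# Minutes that occur more than once.
dups <- unique(rounded[duplicated(rounded)])

# Minutes expected between the first and last timestamp but not present.
expected <- seq(min(rounded), max(rounded), by = "1 min")
absent <- expected[!expected %in% rounded]

print(format(dups, "%H:%M"))    # "08:22"
print(format(absent, "%H:%M"))  # "08:21"
```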

Any help would be appreciated!

Edit: more raw data is available as a CSV file here

1 answer:

Answer 0: (score: 2)

I think this is what you are after.

Even so, I would suggest checking your source data.

library(data.table)

DT <- fread(text="
             datetime   ,val
2019-06-19 08:25:55.470,1706506
2019-06-19 08:24:55.560,1706504
2019-06-19 08:24:07.087,1706502
2019-06-19 08:22:55.510,1706500
2019-06-19 08:22:00.080,1706497
2019-06-19 08:21:44.977,1706495
2019-06-19 08:19:55.533,1706493
2019-06-19 08:18:55.470,1706491
2019-06-19 08:18:17.610,1706488
2019-06-19 08:16:55.567,1706486
2019-06-19 08:15:55.440,1706484
2019-06-19 08:14:55.543,1706481
2019-06-19 08:13:55.427,1706479
2019-06-19 08:13:06.477,1706477
2019-06-19 08:12:21.043,1706475
2019-06-19 08:10:55.420,1706473
2019-06-19 08:09:55.447,1706471
2019-06-19 08:08:55.477,1706469
2019-06-19 08:07:55.443,1706467
2019-06-19 08:06:55.550,1706465", sep=",", header=TRUE, stringsAsFactors = FALSE)

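# Round each timestamp to the nearest minute (round() returns POSIXlt, so convert back)
DT[, datetime := as.POSIXct(as.character(round(as.POSIXct(datetime), "mins")))]
# Gap to the previous row in seconds; rows are sorted newest first, so a normal step is -60
DT[, diff := c(-60, as.numeric(diff(datetime), units = "secs"))]
# A gap of 0 means two rows rounded to the same minute: move the second one back a minute
DT[diff == 0, datetime := datetime-60][, diff := NULL]
print(DT)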

Result:

               datetime     val
 1: 2019-06-19 08:26:00 1706506
 2: 2019-06-19 08:25:00 1706504
 3: 2019-06-19 08:24:00 1706502
 4: 2019-06-19 08:23:00 1706500
 5: 2019-06-19 08:22:00 1706497
 6: 2019-06-19 08:21:00 1706495
 7: 2019-06-19 08:20:00 1706493
 8: 2019-06-19 08:19:00 1706491
 9: 2019-06-19 08:18:00 1706488
10: 2019-06-19 08:17:00 1706486
11: 2019-06-19 08:16:00 1706484
12: 2019-06-19 08:15:00 1706481
13: 2019-06-19 08:14:00 1706479
14: 2019-06-19 08:13:00 1706477
15: 2019-06-19 08:12:00 1706475
16: 2019-06-19 08:11:00 1706473
17: 2019-06-19 08:10:00 1706471
18: 2019-06-19 08:09:00 1706469
19: 2019-06-19 08:08:00 1706467
20: 2019-06-19 08:07:00 1706465
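
For reference, the same idea can be sketched in base R without data.table. This is a minimal sketch, not the answer's original code: `round_to_minute` and `fix_collisions` are hypothetical helper names, the UTC time zone is an assumption, and it assumes rows sorted newest first with at most two rows landing on the same minute.

```r
# Hypothetical helper: round timestamps to the nearest minute.
# round() on POSIXct returns POSIXlt, so convert back to POSIXct.
round_to_minute <- function(x) {
  as.POSIXct(round(x, "mins"))
}

# Hypothetical helper: when a row has the same minute as the row before it,
# move it back by 60 seconds. Assumes newest-first order and at most two
# rows per minute; longer runs of collisions are not resolved.
fix_collisions <- function(x) {
  step <- c(-60, as.numeric(diff(x), units = "secs"))
  x[step == 0] <- x[step == 0] - 60
  x
}

# Four rows from the sample data around the problem area.
raw <- as.POSIXct(c("2019-06-19 08:22:55.510",
                    "2019-06-19 08:22:00.080",
                    "2019-06-19 08:21:44.977",
                    "2019-06-19 08:19:55.533"),
                  tz = "UTC")  # the time zone is an assumption

fixed <- fix_collisions(round_to_minute(raw))
print(format(fixed, "%H:%M:%S"))
# "08:23:00" "08:22:00" "08:21:00" "08:20:00"
```

Shifting by a fixed 60 seconds keeps the fix simple, but it changes a timestamp by more than half a minute, so it is worth checking why the source produced two readings in the same minute.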