I have some data:
head(stockAtt)
DATE TIME EX SYM_ROOT SIZE
1: 2018-12-03 34201.549405 X T 1
2: 2018-12-03 34201.549405 P T 28
3: 2018-12-03 34301.549405 P T 28
4: 2018-12-03 35401.549405 T T 36
5: 2018-12-03 35501.549405 T T 36
6: 2018-12-03 36601.549405 T T 36
7: 2018-12-03 36101.549405 Z T 3
8: 2018-12-03 36801.549405 Z T 23
9: 2018-12-03 37001.549405 Z T 16
10: 2018-12-03 39001.549405 X T 5
I have a sequence of times, in seconds, that I want to use as bins:
seq(from = 34200, to = 40000, by = 1000 )
[1] 34200 35200 36200 37200 38200 39200
I want to split the data.table into these TIME-based intervals, like this:
DATE TIME EX SYM_ROOT SIZE
1: 2018-12-03 34201.549405 X T 1
2: 2018-12-03 34201.549405 P T 28
3: 2018-12-03 34301.549405 P T 28
DATE TIME EX SYM_ROOT SIZE
1: 2018-12-03 35401.549405 T T 36
2: 2018-12-03 35501.549405 T T 36
DATE TIME EX SYM_ROOT SIZE
1: 2018-12-03 36601.549405 T T 36
2: 2018-12-03 36101.549405 Z T 3
3: 2018-12-03 36801.549405 Z T 23
DATE TIME EX SYM_ROOT SIZE
1: 2018-12-03 37001.549405 Z T 16
DATE TIME EX SYM_ROOT SIZE
1: 2018-12-03 39001.549405 X T 5
Answer:
Here are a few options:
1) Using split, which dispatches to data.table's split method:
split(DT, DT[, cut(TIME, seq(34200, 40000, 1000))])
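To see which rows end up together, here is a minimal base-R sketch (no data.table needed) that applies the same cut() + split() idea to a plain data.frame built from the TIME, EX and SIZE values in head(stockAtt). It relies on base R's split.data.frame, not on the data.table method, and drop = TRUE is an extra argument added here to omit empty intervals. Note that with these breaks 36101.549405 falls into (35200,36200], while 36601, 36801 and 37001 all share (36200,37200], so the groups differ slightly from the hand-written ones in the question.

```r
# Base R only: TIME, EX and SIZE copied from head(stockAtt) in the question
df <- data.frame(
  TIME = c(34201.549405, 34201.549405, 34301.549405, 35401.549405, 35501.549405,
           36601.549405, 36101.549405, 36801.549405, 37001.549405, 39001.549405),
  EX   = c("X", "P", "P", "T", "T", "T", "Z", "Z", "Z", "X"),
  SIZE = c(1, 28, 28, 36, 36, 36, 3, 23, 16, 5)
)
breaks <- seq(34200, 40000, 1000)

# cut() assigns each TIME to a right-closed interval (a, b];
# dig.lab = 5 gives labels like "(34200,35200]" instead of scientific notation
bins <- cut(df$TIME, breaks, dig.lab = 5)

# split.data.frame returns one data.frame per factor level;
# drop = TRUE omits levels with no rows, here (37200,38200]
parts <- split(df, bins, drop = TRUE)
sapply(parts, nrow)
```

Leaving out dig.lab = 5 also works, but the default dig.lab = 3 produces labels in scientific notation such as (3.42e+04,3.52e+04].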
2) Using cut inside by:
DT[, .(.(as.data.table(c(.(TIME=TIME), .SD)))), by=cut(TIME, seq(34200, 40000, 1000))]$V1
or
DT[, tm := TIME][, .(.(.SD)), by=cut(tm, seq(34200, 40000, 1000))]$V1
3) Another approach, suggested by jangorecki in the comments:
data.table:::split.data.table(DT[, cut_col := cut(TIME, seq(34200, 40000, 1000))], by="cut_col")
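The real workhorse here is cut. From the cut help page (?cut):
"cut divides the range of x into intervals and codes the values in x according to which interval they fall."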
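Two details of cut matter with these particular breaks: intervals are right-closed (a, b] by default, and seq(34200, 40000, 1000) stops at 39200, so a TIME of exactly 34200 or anything above 39200 is coded as NA. A small base-R sketch; the values in x are made-up test points, not data from the question:

```r
breaks <- seq(34200, 40000, 1000)
max(breaks)   # the sequence ends at 39200, not 40000

# hypothetical test values: on the lower edge, inside, on the upper edge, beyond
x <- c(34200, 34200.5, 39200, 39500)

# Default right-closed intervals: 34200 itself and 39500 both become NA
default_bins <- cut(x, breaks, dig.lab = 5)
default_bins

# include.lowest = TRUE closes the first interval as [34200,35200];
# appending Inf adds a final (39200,Inf] bin for anything past the last break
extended_bins <- cut(x, c(breaks, Inf), include.lowest = TRUE, dig.lab = 5)
extended_bins
```

Base R split() silently drops rows whose bin is NA, so such trades would disappear from the result. Extending the breaks and setting include.lowest are two ways around that; which one fits depends on how the boundaries should be treated.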
Some timings:
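library(data.table)
set.seed(0L)
nr <- 1e7
DT <- data.table(TIME=rnorm(nr, 37100))
# each approach gets its own copy, since split_by and by1 add columns (cut_col, tm) by reference
DT2 <- copy(DT)
DT3 <- copy(DT)
DT4 <- copy(DT)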
microbenchmark::microbenchmark(
split_f=data.table:::split.data.table(DT, f=DT[, cut(TIME, seq(34200, 40000, 1000))]),
split_by=data.table:::split.data.table(DT2[, cut_col := cut(TIME, seq(34200, 40000, 1000))], by="cut_col"),
by1=DT3[, tm := TIME][, .(.(.SD)), by=cut(tm, seq(34200, 40000, 1000))]$V1,
by2=DT4[, .(.(as.data.table(c(.(TIME=TIME), .SD)))), by=cut(TIME, seq(34200, 40000, 1000))]$V1,
times=3L
)
Results:
Unit: milliseconds
     expr      min       lq     mean   median       uq      max neval cld
  split_f 691.6382 716.6919 748.6798 741.7457 777.2006 812.6554     3   a
 split_by 840.0505 910.3817 938.2106 980.7129 987.2906 993.8683     3   a
      by1 738.8859 749.1444 797.0015 759.4029 826.0593 892.7157     3   a
      by2 623.7743 667.5200 720.1821 711.2658 768.3860 825.5063     3   a