I have the following data.table:
time id type price size api start.point end.point
1: 1399672906 37119594 ASK 440.002 1.4840000 TRUE 1399672606 1399672906
2: 1399672940 37119597 BID 441.000 0.1758830 TRUE 1399672640 1399672940
3: 1399672940 37119598 BID 441.000 0.0491166 TRUE 1399672640 1399672940
4: 1399673105 37119638 ASK 440.002 0.1313700 TRUE 1399672805 1399673105
5: 1399673198 37119668 BID 441.000 0.0233013 TRUE 1399672898 1399673198
6: 1399673198 37119669 BID 441.000 0.9744230 TRUE 1399672898 1399673198
7: 1399673208 37119675 BID 441.000 0.1587060 TRUE 1399672908 1399673208
8: 1399673208 37119676 BID 441.000 0.1238870 TRUE 1399672908 1399673208
9: 1399673208 37119677 BID 441.001 0.0100000 TRUE 1399672908 1399673208
10: 1399673208 37119678 BID 441.175 0.0129740 TRUE 1399672908 1399673208
11: 1399673208 37119679 BID 441.192 0.0100000 TRUE 1399672908 1399673208
12: 1399673208 37119680 BID 441.399 0.0129740 TRUE 1399672908 1399673208
13: 1399673208 37119681 BID 441.499 1.7500000 TRUE 1399672908 1399673208
14: 1399673208 37119682 BID 441.500 8.0214600 TRUE 1399672908 1399673208
15: 1399673241 37119691 BID 441.500 0.0453001 TRUE 1399672941 1399673241
16: 1399673274 37119696 ASK 440.030 0.9133460 TRUE 1399672974 1399673274
17: 1399673360 37119705 BID 440.030 0.0580000 TRUE 1399673060 1399673360
18: 1399673433 37119709 ASK 440.002 0.0319611 TRUE 1399673133 1399673433
19: 1399673506 37119711 ASK 440.002 0.2618460 TRUE 1399673206 1399673506
20: 1399673507 37119712 BID 440.002 1.0000000 TRUE 1399673207 1399673507
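where:
The series is not equally spaced. The variables start.point and end.point define a 5-minute moving window that ends at the variable "time" (start.point = time - 300 seconds, end.point = time). For each row, I want to count the trades that fall within that row's window.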
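I did it with a for loop:
status.bar <- txtProgressBar(min = 0, max = nrow(trades), style = 3)
for (i in seq_len(nrow(trades))) {
  # count the distinct trade ids whose time lies inside row i's window
  trades[i, freq := length(unique(trades[time >= start.point[i] & time <= end.point[i]]$id))]
  setTxtProgressBar(status.bar, i)
}
close(status.bar)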
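A vectorised alternative to the loop, sketched in base R rather than data.table (an illustration, not the asker's code). Because each window is the closed interval [start.point, end.point], the number of trades in it equals the number of timestamps <= end.point minus the number of timestamps < start.point, and findInterval() returns both counts for every row at once against the sorted timestamps. It counts rows, not distinct ids, so it assumes one row per id (true in the data shown). The left.open argument needs R >= 3.3.0, and the helper name window_count is hypothetical.

```r
# The six rows from the dput() sample in the update below (id and time only).
trades <- data.frame(
  id   = c(37119594L, 37119638L, 37119696L, 37119709L, 37119711L, 37119717L),
  time = c(1399672906, 1399673105, 1399673274, 1399673433, 1399673506, 1399673531)
)
trades$start.point <- trades$time - 300  # window starts 5 minutes earlier
trades$end.point   <- trades$time        # window ends at the trade itself

# Hypothetical helper: trades in the closed window [start, end]
#   = #(time <= end) - #(time < start), looked up in the sorted timestamps.
window_count <- function(time, start, end) {
  t <- sort(time)
  findInterval(end, t) - findInterval(start, t, left.open = TRUE)
}

trades$freq <- window_count(trades$time, trades$start.point, trades$end.point)
print(trades$freq)  # 1 2 2 2 3 4, the same freq values as the IRanges answer
```

Ties are handled the same way as in the loop: trades sharing a timestamp are all counted, since findInterval(end, t) returns the position of the last timestamp <= end. Each lookup is a binary search, so the whole computation is roughly O(n log n) instead of one full-table scan per row.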
However, I was wondering whether there is a more idiomatic ("fashionable") data.table way to do this. I tried something like:
trades[, freq := list(length(unique(trades[time >= start.point & time <= end.point,]$id))), by = list(id)]
but the result is wrong; it seems it is not evaluated row by row:
time id type price size api start.point end.point freq
1: 1399672906 37119594 ASK 440.002 1.4840000 TRUE 1399672606 1399672906 100
2: 1399672940 37119597 BID 441.000 0.1758830 TRUE 1399672640 1399672940 100
3: 1399672940 37119598 BID 441.000 0.0491166 TRUE 1399672640 1399672940 100
4: 1399673105 37119638 ASK 440.002 0.1313700 TRUE 1399672805 1399673105 100
5: 1399673198 37119668 BID 441.000 0.0233013 TRUE 1399672898 1399673198 100
6: 1399673198 37119669 BID 441.000 0.9744230 TRUE 1399672898 1399673198 100
7: 1399673208 37119675 BID 441.000 0.1587060 TRUE 1399672908 1399673208 100
8: 1399673208 37119676 BID 441.000 0.1238870 TRUE 1399672908 1399673208 100
9: 1399673208 37119677 BID 441.001 0.0100000 TRUE 1399672908 1399673208 100
10: 1399673208 37119678 BID 441.175 0.0129740 TRUE 1399672908 1399673208 100
11: 1399673208 37119679 BID 441.192 0.0100000 TRUE 1399672908 1399673208 100
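A likely explanation for the constant 100 (presumably nrow(trades) for the full table, which is only partly printed): in the inner call trades[time >= start.point & time <= end.point, ], the names time, start.point and end.point all refer to full columns of the inner trades, so the condition is evaluated element-wise, each trade against its own window. Since end.point equals time and start.point is 300 seconds earlier, every row passes, the inner subset is the whole table, and by = list(id) just repeats that same count for each group. A base-R check on the six sample rows from the update below:

```r
# The six rows from the dput() sample (only the columns the condition uses).
trades <- data.frame(
  time        = c(1399672906, 1399673105, 1399673274, 1399673433, 1399673506, 1399673531),
  start.point = c(1399672606, 1399672805, 1399672974, 1399673133, 1399673206, 1399673231),
  end.point   = c(1399672906, 1399673105, 1399673274, 1399673433, 1399673506, 1399673531)
)

# Same condition as the inner subset: row k's time vs. row k's own window.
keep <- with(trades, time >= start.point & time <= end.point)
print(keep)       # every row lies inside its own window
print(sum(keep))  # so the "subset" keeps all nrow(trades) rows
```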
UPDATE
See the structure (dput() output) of a sample below:
trades <- structure(list(time = c(1399672906L, 1399673105L, 1399673274L,
1399673433L, 1399673506L, 1399673531L), id = c(37119594L, 37119638L,
37119696L, 37119709L, 37119711L, 37119717L), type = c("ASK",
"ASK", "ASK", "ASK", "ASK", "ASK"), price = c(440.002, 440.002,
440.03, 440.002, 440.002, 440), size = c(1.484, 0.13137, 0.913346,
0.0319611, 0.261846, 3.168), api = c(TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE), start.point = c(1399672606, 1399672805, 1399672974,
1399673133, 1399673206, 1399673231), end.point = c(1399672906L,
1399673105L, 1399673274L, 1399673433L, 1399673506L, 1399673531L
), freq = c(1L, 4L, 13L, 14L, 13L, 11L)), .Names = c("time",
"id", "type", "price", "size", "api", "start.point", "end.point",
"freq"), sorted = c("type", "time"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L))
Answer (score: 4)
I think this is best done for now with the Bioconductor package IRanges, until interval joins/range joins are implemented in data.table.
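library(data.table)
library(IRanges)  # Bioconductor
# each trade as a width-1 range at its timestamp
ir1 = IRanges(trades$time, width=1L)
# each row's 5-minute window [start.point, end.point]
ir2 = IRanges(trades$start.point, trades$end.point)
# (trade, window) pairs where the trade's timestamp lies within the window
olaps = findOverlaps(ir1, ir2, type = "within")
# count matching trades per window; V2 is the window's row index
dt = data.table(queryHits(olaps), subjectHits(olaps))[, .N, by=V2]
trades[dt$V2, freq := dt$N]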
# time id type price size api start.point end.point freq
# 1: 1399672906 37119594 ASK 440.002 1.4840000 TRUE 1399672606 1399672906 1
# 2: 1399673105 37119638 ASK 440.002 0.1313700 TRUE 1399672805 1399673105 2
# 3: 1399673274 37119696 ASK 440.030 0.9133460 TRUE 1399672974 1399673274 2
# 4: 1399673433 37119709 ASK 440.002 0.0319611 TRUE 1399673133 1399673433 2
# 5: 1399673506 37119711 ASK 440.002 0.2618460 TRUE 1399673206 1399673506 3
# 6: 1399673531 37119717 ASK 440.000 3.1680000 TRUE 1399673231 1399673531 4
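Since this answer was written, data.table itself has gained range joins: foverlaps() and, from version 1.9.8, non-equi joins. A minimal sketch of the same count as a non-equi self-join, assuming data.table >= 1.9.8. The six rows are taken from the dput() sample, with start.point rebuilt as time - 300 and kept integer so both sides of the join have the same type.

```r
library(data.table)

# The six rows from the dput() sample (only the columns the join needs).
trades <- data.table(
  id   = c(37119594L, 37119638L, 37119696L, 37119709L, 37119711L, 37119717L),
  time = c(1399672906L, 1399673105L, 1399673274L, 1399673433L, 1399673506L, 1399673531L)
)
# 5-minute window ending at each trade (integer, to match the type of time)
trades[, `:=`(start.point = time - 300L, end.point = time)]

# Non-equi self-join: for each row of i (the windows), match the rows of x
# (the trades) with start.point <= time <= end.point, and count them per window.
counts <- trades[trades, on = .(time >= start.point, time <= end.point),
                 .N, by = .EACHI]
trades[, freq := counts$N]
print(trades)
```

With by = .EACHI the counts come back in the row order of i, so they can be assigned straight back as a column. Like the IRanges version, this counts rows rather than distinct ids.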
HTH