I have two data.tables in R, as shown below:
Ticks:
ask bid createTime
1: 106.788 106.487 2018-03-01 00:00:01
2: 106.788 106.487 2018-03-01 00:00:01
3: 106.788 106.487 2018-03-01 00:00:02
4: 106.788 106.487 2018-03-01 00:00:02
5: 106.788 106.487 2018-03-01 00:00:03
. .
. .
992698: 105.730 105.431 2018-03-06 23:59:56
992699: 105.730 105.431 2018-03-06 23:59:56
992700: 105.732 105.431 2018-03-06 23:59:57
992701: 105.732 105.431 2018-03-06 23:59:57
992702: 105.732 105.431 2018-03-06 23:59:59
and Bars:
volume from to
1.196550000 2018-03-01 00:00:00 2018-03-01 00:01:00
2.233350000 2018-03-01 00:01:00 2018-03-01 00:02:00
3.201950000 2018-03-01 00:02:00 2018-03-01 00:03:00
4.97700000 2018-03-01 00:03:00 2018-03-01 00:04:00
5.34200000 2018-03-01 00:04:00 2018-03-01 00:05:00
. .
. .
8068:53800000 2018-03-06 23:55:00 2018-03-06 23:56:00
So, for each row of the Bars table, I want to count the Ticks where createTime >= from and createTime < to. Like this:
volume from to TicksCount
1.196550000 2018-03-01 00:00:00 2018-03-01 00:01:00 187
2.233350000 2018-03-01 00:01:00 2018-03-01 00:02:00 72
3.201950000 2018-03-01 00:02:00 2018-03-01 00:03:00 56
4.97700000 2018-03-01 00:03:00 2018-03-01 00:04:00 58
5.34200000 2018-03-01 00:04:00 2018-03-01 00:05:00 52
I found a way to do it, but it is very slow. I tried this:
Bars <- Bars[, TicksCount:= sapply(1:nrow(Bars), function(i) {
nrow(Tick[Bars$from[i] <= createTime & createTime < Bars$to[i]])
})]
Maybe someone knows how to make it faster? Help!)
Answer 0: (score: 1)
data.table::foverlaps() will do the job quickly:
Your two tables (simulated here with random data):
library(data.table)

ticks <-
data.table(
ask = runif(1e5, 0, 1e5),
bid = runif(1e5, 0, 1e5),
createTime = runif(1e5, 0, 1e3)
)
bars <-
data.table(
volume = runif(1e3, 0, 1e3),
from = seq(0, 1e3 - 1, 1),
to = seq(1, 1e3)
)
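To use foverlaps(), both tables need a range (a start column and an end column), not just one of them. So add a helper column to ticks to create a temporary zero-width range from createTime to createTime: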
ticks[, helper := createTime]
Then create an ID for each bar (assuming there are no duplicates and no overlapping ranges in bars):
bars[, bar.id := .I]
Set a data.table key on each table, with the range start as the first key column and the range end as the second:
setkey(ticks, createTime, helper)
setkey(bars, from, to)
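Then run a 'within' foverlaps on the two data sets, where x is ticks and y is bars. This creates a new table by joining x and y on overlapping ranges (where the x range falls within the y range). The second step below aggregates the new table, counting ticks by bar.id, and the third step joins the aggregated table back to bars, so the result is bars with an added ticksCount column.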
foverlaps(ticks, bars, type = 'within')[,
.(ticksCount = .N), .(bar.id)
][bars, on = 'bar.id']
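One caveat: as far as I know, foverlaps() treats both ends of a range as inclusive, so a tick whose createTime sits exactly on a bar boundary can match two adjacent bars, while the question asks for from <= createTime < to. A minimal sketch of an alternative, assuming data.table 1.9.8 or later (the version that added non-equi joins): join on the two inequalities directly and count matches per bar with by = .EACHI. The toy tables and the name counts are made up for illustration.

```r
library(data.table)

# Hypothetical toy data: three one-unit bars and six ticks.
bars  <- data.table(volume = c(10, 20, 30), from = c(0, 1, 2), to = c(1, 2, 3))
ticks <- data.table(createTime = c(0, 0.5, 1, 1.2, 1.9, 2.5))

# Non-equi join: for each row of bars, count ticks with from <= createTime < to.
counts <- ticks[bars, on = .(createTime >= from, createTime < to), .N, by = .EACHI]

# by = .EACHI returns one row per row of bars, in the same order as bars.
bars[, ticksCount := counts$N]
print(bars)
```

This avoids the helper column, the bar.id column and the keys, and the tick at createTime = 1 is counted only in the bar that starts at 1.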
Answer 1: (score: 0)
Another way, using sapply:
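# Count the ticks whose createTime falls in [from, to) for bar i.
f<-function(i,Bars,Ticks)
{
return(sum(Bars$from[i] <= Ticks$createTime & Ticks$createTime < Bars$to[i]))
}
# Iterate over bars (not ticks) so the result has one count per row of Bars.
Bars$TickCount<-sapply(seq_len(nrow(Bars)),f,Bars=Bars,Ticks=Ticks)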
Your output:
Bars
volume from to TickCount
1 1.19655 2018-03-01 00:00:00 2018-03-01 00:01:00 2
2 2.23335 2018-03-01 00:00:00 2018-03-01 00:02:00 2
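The sapply approach compares every bar against every tick, which gets slow for a million ticks. If the bars are sorted, contiguous and non-overlapping (an assumption, though the sample data in the question looks that way), a vectorised sketch using base R's findInterval() and tabulate() may be worth trying instead. The toy vectors and names below are hypothetical.

```r
# Hypothetical toy data; bars are assumed sorted, contiguous and non-overlapping.
bars_from  <- c(0, 1, 2)
bars_to    <- c(1, 2, 3)
createTime <- c(0, 0.5, 1, 1.2, 1.9, 2.5, 3)

# Break points: every bar start plus the end of the last bar.
breaks <- c(bars_from, bars_to[length(bars_to)])

# findInterval() uses half-open intervals [b_i, b_{i+1}) by default, so index i
# means "the tick belongs to bar i"; 0 or length(breaks) means outside all bars.
idx <- findInterval(createTime, breaks)

# tabulate() counts the indices 1..nbins and ignores values outside that range.
tickCount <- tabulate(idx, nbins = length(bars_from))
print(tickCount)
```

With POSIXct columns, passing as.numeric(createTime) and as.numeric(from) should behave the same way, since findInterval() works on plain numeric vectors.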