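Suppose I have two tables: one with appointments and one with receptions. Each table has a filial (branch) ID, a medic ID, start and end times (the planned interval for appointments, the actual interval for receptions) and some other data. I want to count, for each appointment, how many receptions took place within the appointment's time interval. A reception can start before the appointment's start time, start after it, lie entirely inside the appointment interval, and so on.
Below I have made two tables, one for appointments and one for receptions. I wrote a nested loop, but it runs very slowly. My tables contain about 50 million rows. I need a fast solution to this problem. How can I do this without loops? Thanks in advance!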
Answer 0 (score: 2)
Use foverlaps():
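# foverlaps() needs the second table keyed; the last two key columns define the interval
setkey(re, med.id, filial.id, start.time, end.time)
# which=TRUE returns the matching row indices (xid, yid); count the matches per app row
olaps = foverlaps(app, re, which=TRUE, nomatch=0L)[, .N, by=xid]
# Start every count at 0, then fill in the app rows that have at least one overlap
app[, count := 0L][olaps$xid, count := olaps$N]
app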
#     med.id filial.id          start.time            end.time           A count
#  1:      1       100 2015-01-01 14:30:00 2015-01-01 15:29:59  0.60878560     1
#  2:      2       100 2015-01-01 15:30:00 2015-01-01 16:29:59 -0.11545284     0
#  3:      3       100 2015-01-01 16:30:00 2015-01-01 17:29:59  0.68992084     1
#  4:      4       100 2015-01-01 17:30:00 2015-01-01 18:29:59  0.04703938     1
#  5:      5       100 2015-01-01 18:30:00 2015-01-01 19:29:59 -0.95315419     0
#  6:      6       200 2015-01-01 14:30:00 2015-01-01 15:29:59  0.26193554     0
#  7:      7       200 2015-01-01 15:30:00 2015-01-01 16:29:59  1.55206077     1
#  8:      8       200 2015-01-01 16:30:00 2015-01-01 17:29:59  0.44517362     0
#  9:      9       200 2015-01-01 17:30:00 2015-01-01 18:29:59  0.11475881     0
# 10:     10       200 2015-01-01 18:30:00 2015-01-01 19:29:59 -0.66139828     0
PS: Please go through the vignettes and learn to use data.table effectively.
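For anyone who wants to try the foverlaps() approach without the original tables, here is a minimal, self-contained sketch. The toy rows below are hypothetical (they are not the asker's data); only the column names med.id, filial.id, start.time and end.time come from the thread. The sketch assumes both tables store their times as POSIXct in the same time zone.

```r
library(data.table)

# Hypothetical toy data: two appointments and three receptions, same doctor and branch
app <- data.table(
  med.id     = c(1L, 1L),
  filial.id  = c(100L, 100L),
  start.time = as.POSIXct(c("2015-01-01 14:30:00", "2015-01-01 15:30:00"), tz = "UTC"),
  end.time   = as.POSIXct(c("2015-01-01 15:29:59", "2015-01-01 16:29:59"), tz = "UTC")
)
re <- data.table(
  med.id     = c(1L, 1L, 1L),
  filial.id  = c(100L, 100L, 100L),
  start.time = as.POSIXct(c("2015-01-01 14:25:00", "2015-01-01 15:00:00",
                            "2015-01-01 17:00:00"), tz = "UTC"),
  end.time   = as.POSIXct(c("2015-01-01 14:40:00", "2015-01-01 15:10:00",
                            "2015-01-01 17:30:00"), tz = "UTC")
)

# Key the lookup table; the last two key columns are the interval start and end
setkey(re, med.id, filial.id, start.time, end.time)

# One (xid, yid) pair per overlapping appointment/reception; count the pairs per appointment
olaps <- foverlaps(app, re, which = TRUE, nomatch = 0L)[, .N, by = xid]

# Appointments with no overlapping reception keep the default of 0
app[, count := 0L][olaps$xid, count := olaps$N]
print(app)
```

In this toy case the first appointment overlaps two receptions and the second overlaps none. The default type = "any" counts every kind of overlap (starting before, starting after, or fully inside), which is what the question describes.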
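Answer 1 (score: 1)
I actually don't think you need a merge on overlapping times at all: your code really just merges on med.id and filial.id and then does a simple comparison.
First, for clarity, let's rename the start.time and end.time fields: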
setnames(app, c("start.time", "end.time"), c("app.start.time", "app.end.time"))
setnames(re, c("start.time", "end.time"), c("re.start.time", "re.end.time"))
Then you should merge the two data.tables on the keys med.id and filial.id, like this:
app_re <- re[app, on=c("med.id", "filial.id")]
#     med.id filial.id       re.start.time         re.end.time          B
#  1:      1       100 2015-01-01 14:25:00 2015-01-01 15:25:00  0.4307760
#  2:      2       100                <NA>                <NA>         NA
#  3:      3       100 2015-01-01 16:32:00 2015-01-01 17:36:00 -1.2933755
#  4:      4       100 2015-01-01 17:25:00 2015-01-01 18:40:00 -1.2374469
#  5:      5       100                <NA>                <NA>         NA
#  6:      6       200 2015-01-01 15:35:00 2015-01-01 15:49:00 -0.8054822
#  7:      7       200 2015-01-01 15:50:00 2015-01-01 16:12:00  2.5742241
#  8:      8       200                <NA>                <NA>         NA
#  9:      9       200                <NA>                <NA>         NA
# 10:     10       200                <NA>                <NA>         NA
#          app.start.time        app.end.time           A
#  1: 2015-01-01 14:30:00 2015-01-01 15:29:59 -0.26828337
#  2: 2015-01-01 15:30:00 2015-01-01 16:29:59  0.24246341
#  3: 2015-01-01 16:30:00 2015-01-01 17:29:59  1.55824948
#  4: 2015-01-01 17:30:00 2015-01-01 18:29:59  1.25829302
#  5: 2015-01-01 18:30:00 2015-01-01 19:29:59  1.14244558
#  6: 2015-01-01 14:30:00 2015-01-01 15:29:59 -0.41234563
#  7: 2015-01-01 15:30:00 2015-01-01 16:29:59  0.07710022
#  8: 2015-01-01 16:30:00 2015-01-01 17:29:59 -1.46421985
#  9: 2015-01-01 17:30:00 2015-01-01 18:29:59  1.21682394
# 10: 2015-01-01 18:30:00 2015-01-01 19:29:59  1.11197318
Then you can create the count variable using the same conditions as before:
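# as.numeric() wraps the whole OR expression so count is stored as 0/1 rather than TRUE/FALSE
app_re[, count := as.numeric(
  (re.start.time < app.start.time & re.end.time > app.start.time) |
  (re.start.time < app.end.time & re.start.time > app.start.time))]
# Convert the NAs (appointments with no matching reception) to 0
app_re[, count := ifelse(is.na(count), 0, count)]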
This should be much faster than a for loop.
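A possible end-to-end sketch of this merge-then-compare idea follows. The toy rows are hypothetical, and the final aggregation step is my own addition rather than part of the answer above: it assumes a doctor/branch pair can have several receptions, so the per-row flag is summed back to one count per appointment. The helper column name hit is also just an illustrative choice.

```r
library(data.table)

# Hypothetical toy data, using the renamed columns from this answer
app <- data.table(
  med.id = c(1L, 2L), filial.id = c(100L, 100L),
  app.start.time = as.POSIXct(c("2015-01-01 14:30:00", "2015-01-01 15:30:00"), tz = "UTC"),
  app.end.time   = as.POSIXct(c("2015-01-01 15:29:59", "2015-01-01 16:29:59"), tz = "UTC")
)
re <- data.table(
  med.id = c(1L, 1L), filial.id = c(100L, 100L),
  re.start.time = as.POSIXct(c("2015-01-01 14:25:00", "2015-01-01 15:00:00"), tz = "UTC"),
  re.end.time   = as.POSIXct(c("2015-01-01 14:40:00", "2015-01-01 15:10:00"), tz = "UTC")
)

# Every app row is kept; appointments without a reception get NA reception times
app_re <- re[app, on = c("med.id", "filial.id")]

# Vectorised comparison, same two conditions as in the answer
app_re[, hit := (re.start.time < app.start.time & re.end.time > app.start.time) |
                (re.start.time < app.end.time & re.start.time > app.start.time)]

# NA means there was no reception to compare against, so treat it as no overlap
app_re[is.na(hit), hit := FALSE]

# Collapse the joined rows back to one row per appointment
counts <- app_re[, .(count = sum(hit)), by = .(med.id, filial.id, app.start.time)]
print(counts)
```

On real data with many receptions per doctor and branch, the joined table can grow much larger than either input, and data.table may then ask for allow.cartesian = TRUE in the join. That growth is worth keeping in mind with around 50 million rows.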