I would like to use foverlaps in a setting where a value falls within overlapping ranges, and select the highest-numbered range ID from a separate column. Although I am very familiar with the basic usage of the package, I could not find a way to do this.
Here is a small example:
>df1
AthleteID Distance
Athlete1 5
Athlete2 10
Athlete3 25
>df2
CheckpointID Start End Score
Checkpoint1 1 8 2
Checkpoint2 7 12 4
Checkpoint3 9 15 6
Checkpoint4 16 26 8
Checkpoint5 20 30 10
Based on the above, the final data.frame should look like this:
>df1
AthleteID Distance Score CheckpointID
Athlete1 5 2 Checkpoint1
Athlete2 10 6 Checkpoint3
Athlete3 25 10 Checkpoint5
=========================
EDIT
One last question: I am also interested in how to use different checkpoint scores depending on the AthleteID (with the same intervals). Here is a modified score table:
>df2
CheckpointID AthleteID Start End Score
Checkpoint1 Athlete1 1 8 2
Checkpoint2 Athlete1 7 12 4
Checkpoint3 Athlete1 9 15 6
Checkpoint4 Athlete1 16 26 8
Checkpoint5 Athlete1 20 30 10
Checkpoint1 Athlete2 1 8 3
Checkpoint2 Athlete2 7 12 5
Checkpoint3 Athlete2 9 15 7
Checkpoint4 Athlete2 16 26 9
Checkpoint5 Athlete2 20 30 11
Checkpoint1 Athlete3 1 8 1
Checkpoint2 Athlete3 7 12 3
Checkpoint3 Athlete3 9 15 5
Checkpoint4 Athlete3 16 26 7
Checkpoint5 Athlete3 20 30 11
So the final result would look like this:
>df1
AthleteID Distance Score CheckpointID
Athlete1 5 2 Checkpoint1
Athlete2 10 7 Checkpoint3
Athlete3 25 11 Checkpoint5
Answer 0 (score: 6)
You can also do this with the newly implemented non-equi joins, which should be more straightforward:
y[x, on = .(Start <= Distance, End >= Distance), mult = "last",
.(AthleteID, Distance, Score, CheckpointID)]
where,
x=fread("AthleteID Distance
Athlete1 5
Athlete2 10
Athlete3 25
")
y=fread("CheckpointID Start End Score
Checkpoint1 1 8 2
Checkpoint2 7 12 4
Checkpoint3 9 15 6
Checkpoint4 16 26 8
Checkpoint5 20 30 10
")
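The per-athlete scores asked about in the question's edit could probably be handled the same way, by adding AthleteID as an equality condition in front of the two interval conditions. The following is only a sketch under that assumption: the name y2 is hypothetical, the table is the modified score table from the question, and it relies on mult = "last" picking the last matching row in the table's order, as in the join above.

```r
library(data.table)

# Athletes and the distance each one covered (same data as above)
x <- fread("AthleteID Distance
Athlete1 5
Athlete2 10
Athlete3 25
")

# y2 is a hypothetical name for the per-athlete score table from the question's edit
y2 <- fread("CheckpointID AthleteID Start End Score
Checkpoint1 Athlete1 1 8 2
Checkpoint2 Athlete1 7 12 4
Checkpoint3 Athlete1 9 15 6
Checkpoint4 Athlete1 16 26 8
Checkpoint5 Athlete1 20 30 10
Checkpoint1 Athlete2 1 8 3
Checkpoint2 Athlete2 7 12 5
Checkpoint3 Athlete2 9 15 7
Checkpoint4 Athlete2 16 26 9
Checkpoint5 Athlete2 20 30 11
Checkpoint1 Athlete3 1 8 1
Checkpoint2 Athlete3 7 12 3
Checkpoint3 Athlete3 9 15 5
Checkpoint4 Athlete3 16 26 7
Checkpoint5 Athlete3 20 30 11
")

# Equi-join on AthleteID, non-equi join on the interval bounds;
# mult = "last" keeps only the last matching checkpoint row for each athlete
res <- y2[x, on = .(AthleteID, Start <= Distance, End >= Distance), mult = "last",
          .(AthleteID, Distance, Score, CheckpointID)]
print(res)
```

Because the rows of y2 are listed in checkpoint order within each athlete, the last match should correspond to the highest-numbered checkpoint; if the table were ordered differently, it would need sorting first.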
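Answer 1 (score: 3)
You can use foverlaps like this. The key is to duplicate the Distance column in df1 to create an artificial interval whose start equals its end. Then use foverlaps to join df1 and df2, finding the rows where [Distance, Distance2 (= Distance)] falls within [Start, End] of df2, and keep the last match.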
library(data.table)
df1 <- fread("
AthleteID Distance
Athlete1 5
Athlete2 10
Athlete3 25
")
df2 <- fread("
CheckpointID Start End Score
Checkpoint1 1 8 2
Checkpoint2 7 12 4
Checkpoint3 9 15 6
Checkpoint4 16 26 8
Checkpoint5 20 30 10
")
# Need a duplicated temp column as end of interval
df1[, Distance2 := Distance]
#> AthleteID Distance Distance2
#> 1: Athlete1 5 5
#> 2: Athlete2 10 10
#> 3: Athlete3 25 25
# y must be keyed in foverlaps
setkey(df2, Start, End)
# use type within and mult last, then select column
foverlaps(df1, df2, by.x = c("Distance", "Distance2"), mult = "last", type = "within")[, .(AthleteID, Distance, Score, CheckpointID)]
#> AthleteID Distance Score CheckpointID
#> 1: Athlete1 5 2 Checkpoint1
#> 2: Athlete2 10 6 Checkpoint3
#> 3: Athlete3 25 10 Checkpoint5
# Delete temp column in df1
df1[, Distance2 := NULL]