Question

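This is my first time posting to Stack Exchange, so I apologize in advance; I'm sure I'll make some mistakes. I am trying to assess false detections in a dataset.

I have one data frame with the "true" detections: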

truth=
ID   Start   Stop    SNR
1   213466  213468  10.08
2   32238   32240   10.28
3   218934  218936  12.02
4   222774  222776  11.4
5   68137   68139   10.99

and another data frame with a list of times that represent possible 'true' detections:

possible=
ID   Times
1   32239.76
2   32241.14
3   68138.72
4   111233.93
5   128395.28
6   146180.31
7   188433.35
8   198714.7

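I am trying to see whether the values in my 'possible' data frame lie between the Start and Stop values. If so, I'd like to create a third column in "possible" called "between", and a column in the "truth" data frame called "match". For every value in "possible" that falls between a Start and a Stop I'd like a 1, otherwise a 0. For every row in "truth" that finds a match I'd like a 1, otherwise a 0.

Neither ID nor SNR is important. I'm not looking to match on ID. Instead, I want to run through the whole data frame. The output should look something like: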

ID   Times   between
1   32239.76   0
2   32241.14   1
3   68138.72   0
4   111233.93   0
5   128395.28   0
6   146180.31   1
7   188433.35   0
8   198714.7   0

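Alternatively, knowing whether any of my 'possible' time values fall within 2 seconds of a start or end time would also do the trick (again with 1/0 output).

(Thanks for the feedback on the original post.)

Thank you in advance for your patience with me as I navigate this system.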

Answer 1

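I'll post a solution that I'm pretty sure works the way you want, just to get you started. Maybe someone else can post a more efficient answer.

Anyway, first I needed to generate some sample data. Next time, please provide this from your own dataset in your post, using dput(head(truth, n = 25)) and dput(head(possible, n = 25)). I used: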

# generate random test data; columns 2 and 3 of 'truth' are the start and stop times
set.seed(7)
truth <- data.frame(ID = 1:100,
                    Start = sample(5:20, size = 100, replace = TRUE),
                    Stop = sample(21:50, size = 100, replace = TRUE))
possible <- data.frame(Times = sample(1:15, size = 15, replace = FALSE))

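With the sample data in place, the following solution gives what I think you are asking for. It should extend directly to your own dataset, given how it appears to be laid out. Reply below if the comments are unclear.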

#need the %between% operator
library(data.table)

#initialize vectors - 0 or false by default
truth.match <- c(rep(0, times = nrow(truth)))
possible.between <- c(rep(0, times = nrow(possible)))

#iterate through 'possible' dataframe
for (i in seq_len(nrow(possible))){
    #get boolean vector to show if any of the 'truth' rows are a 'match'
    match.vec <- apply(truth[, 2:3],
                       MARGIN = 1,
                       FUN = function(x) {possible$Times[i] %between% x})
    #if any are true then update the match and between vectors
    if(any(match.vec)){
        truth.match[match.vec] <- 1 
        possible.between[i] <- 1
    }
}

#i think this should be called anyMatch for clarity
truth$anyMatch <- truth.match

#similarly; betweenAny
possible$betweenAny <- possible.between
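
As a loop-free variant of the same idea, here is a minimal sketch using base R's outer(). It is not part of the solution above; it assumes the column names Start, Stop and Times from the question, reuses the question's example values (SNR omitted), and introduces a hypothetical tol parameter to cover the "within 2 seconds" alternative.

```r
# Example data copied from the question (SNR left out, it is not needed)
truth <- data.frame(ID = 1:5,
                    Start = c(213466, 32238, 218934, 222774, 68137),
                    Stop  = c(213468, 32240, 218936, 222776, 68139))
possible <- data.frame(ID = 1:8,
                       Times = c(32239.76, 32241.14, 68138.72, 111233.93,
                                 128395.28, 146180.31, 188433.35, 198714.7))

# tol is an assumed tolerance in seconds; 0 means strictly inside [Start, Stop],
# 2 would also accept times up to 2 seconds before Start or after Stop
tol <- 0

# hit[i, j] is TRUE when possible$Times[i] lies inside truth interval j
hit <- outer(possible$Times, truth$Start - tol, ">=") &
       outer(possible$Times, truth$Stop + tol, "<=")

# a 'possible' row is flagged if it hits any interval;
# a 'truth' row is flagged if any time hits it
possible$between <- as.integer(rowSums(hit) > 0)
truth$match <- as.integer(colSums(hit) > 0)
```

The hit matrix has one cell per (time, interval) pair, so this is convenient for small to medium inputs but may use a lot of memory when both data frames are large.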

Answer 2

I think this can be conceptualised as a rolling join in data.table. Take this simplified example:

truth
#   id start stop
#1:  1     1    5
#2:  2     7   10
#3:  3    12   15
#4:  4    17   20
#5:  5    22   26

possible
#   id times
#1:  1     3
#2:  2    11
#3:  3    13
#4:  4    28

setDT(truth)
setDT(possible)
melt(truth, measure.vars=c("start","stop"), value.name="times")[
    possible, on="times", roll=TRUE
    ][, .(id=i.id, truthid=id, times, status=factor(variable, labels=c("in","out")))]

#   id truthid times status
#1:  1       1     3     in
#2:  2       2    11    out
#3:  3       3    13     in
#4:  4       5    28    out

The source datasets are:

truth <- read.table(text="id start stop
1 1 5
2 7 10
3 12 15
4 17 20
5 22 26", header=TRUE)

possible <- read.table(text="id times
1 3
2 11
3 13
4 28", header=TRUE)
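
To turn the rolling-join result into the 1/0 columns the question asks for, one possible sketch follows. It assumes the intervals in truth do not overlap, and it treats a time that rolls back to a "start" row as being inside that interval; the column names between and match are taken from the question, and res is just an illustrative name.

```r
library(data.table)

# Same simplified data as above, built directly as data.tables
truth <- data.table(id = 1:5,
                    start = c(1, 7, 12, 17, 22),
                    stop  = c(5, 10, 15, 20, 26))
possible <- data.table(id = 1:4, times = c(3, 11, 13, 28))

# Rolling join: each time is matched to the last start/stop boundary at or before it
res <- melt(truth, measure.vars = c("start", "stop"), value.name = "times")[
    possible, on = "times", roll = TRUE]

# %in% returns FALSE (not NA) for times that precede every interval
possible[, between := as.integer(res$variable %in% "start")]

# a truth row is matched if some time rolled back to its start boundary
truth[, match := as.integer(id %in% res[variable == "start", id])]
```

Note that a time exactly equal to a stop value rolls to the "stop" row and is counted as outside, so the upper bound is exclusive in this sketch.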
