Intersect with a tolerance for vectors of unequal length, with IDs

Date: 2017-01-21 01:14:42

Tags: r vector subset intersection assign

I have a question about matching values between two vectors. Suppose I have a vector and a data frame:

  data.frame
  value  name                       vector 2
154.0031  A                         154.0084
154.0768  B                         159.0344
154.2145  C                         154.0755
154.4954  D                         156.7758
156.7731  E
156.8399  F
159.0299  G
159.6555  H
159.9384  I

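Now I want to compare vector 2 with the values in the data frame using an adjustable, globally defined tolerance (e.g. ±0.005), and add the corresponding names to vector 2, so that I get a result like this: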

  data.frame
  value  name                       vector 2 name
154.0031  A                         154.0084  A
154.0768  B                         159.0344  G
154.2145  C                         154.0755  B
154.4954  D                         156.7758  E
156.7731  E
156.8399  F
159.0299  G
159.6555  H
159.9384  I

I tried intersect(), but it doesn't seem to have an option for a tolerance. Is there another way?

Thanks a lot!

1 Answer:

Answer 0 (score: 2)

You can get this result with outer, which, and subsetting.

# calculate distances between elements of each object
# rows are df and columns are vec 2
myDists <- outer(df$value, vec2, FUN=function(x, y) abs(x - y))


# keep the (row, column) pairs whose distance is below the tolerance (here 0.05)
# using arr.ind =TRUE returns a matrix with the row and column positions
matches <- which(myDists < 0.05, arr.ind=TRUE)

data.frame(name = df$name[matches[, 1]], value=vec2[matches[, 2]])
name    value
1    A 154.0084
2    G 159.0344
3    B 154.0755
4    E 156.7758

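Note that this only returns the elements of vec2 that have a match, and it returns every element of df that meets the threshold, so one element of vec2 can appear several times if more than one row of df is close enough.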
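To see the duplicate rows this note refers to, here is a small sketch that reuses the same data and distance matrix but loosens the tolerance to 0.1 (an arbitrary value chosen only for illustration). Because which(..., arr.ind = TRUE) yields one row per (df row, vec2 column) pair below the threshold, elements of vec2 that are near two values of df show up twice.

```r
# same data as in the question, with names kept as character
df <- data.frame(
  value = c(154.0031, 154.0768, 154.2145, 154.4954, 156.7731,
            156.8399, 159.0299, 159.6555, 159.9384),
  name = LETTERS[1:9],
  stringsAsFactors = FALSE
)
vec2 <- c(154.0084, 159.0344, 154.0755, 156.7758)

# rows are df, columns are vec2
myDists <- outer(df$value, vec2, FUN = function(x, y) abs(x - y))

# with a looser (illustrative) tolerance of 0.1, several rows of df fall
# within range of the same element of vec2; each pair becomes its own row
matches <- which(myDists < 0.1, arr.ind = TRUE)
pairs <- data.frame(name = df$name[matches[, 1]], value = vec2[matches[, 2]])
pairs
```

Here 154.0084, 154.0755 and 156.7758 each match two names (A/B, A/B and E/F), giving 7 rows for 4 elements of vec2.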

To make the result more robust, with exactly one row per element of vec2, use

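# for each element of vec2 that has at least one match, take the first
# matching row of df (the lowest row index; since df is sorted by value,
# this is the smallest matching value, not necessarily the nearest one)
firstMatch <- tapply(matches[, 1], list(matches[, 2]), min)

# fill in the names, one row per element of vec2.
# NA will appear where there are no obs that meet the threshold.
data.frame(name = df$name[firstMatch][match(seq_along(vec2), as.integer(names(firstMatch)))],
           value = vec2)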

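With the current data this returns the same result as above, but it returns NA for any element of vec2 that has no sufficiently close observation in df.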
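If you want the nearest observation within the tolerance rather than just the first one that qualifies, one option is to take which.min over each column of the distance matrix. The sketch below wraps this in a helper called match_tol, a hypothetical name (not a base R function) chosen because it behaves like match() with a tolerance: it returns, for each element of x, the row index in table of the nearest value, or NA if even the nearest value is farther away than tol.

```r
# Hypothetical helper: like match(), but matches each element of x to the
# nearest element of table, provided it lies within tol; otherwise NA.
match_tol <- function(x, table, tol) {
  # distance matrix: rows are elements of table, columns are elements of x
  d <- abs(outer(table, x, "-"))
  # row index of the nearest element of table for each element of x
  idx <- apply(d, 2, which.min)
  # drop nearest neighbours that are farther away than tol (tol itself is kept)
  idx[apply(d, 2, min) > tol] <- NA_integer_
  idx
}

df <- data.frame(
  value = c(154.0031, 154.0768, 154.2145, 154.4954, 156.7731,
            156.8399, 159.0299, 159.6555, 159.9384),
  name = LETTERS[1:9],
  stringsAsFactors = FALSE
)
vec2 <- c(154.0084, 159.0344, 154.0755, 156.7758)

data.frame(name = df$name[match_tol(vec2, df$value, 0.05)], value = vec2)
```

With the question's tolerance of 0.005 instead, the first element of vec2 gets NA rather than A, because |154.0084 - 154.0031| = 0.0053 is just outside that tolerance; this may be why 0.05 is used above.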

Data

When you ask questions in the future, please provide reproducible data, as below.

df <- read.table(header=TRUE, text="value  name
154.0031  A
154.0768  B
154.2145  C
154.4954  D
156.7731  E
156.8399  F
159.0299  G
159.6555  H
159.9384  I")

vec2 <- c(154.0084, 159.0344, 154.0755, 156.7758)
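Note that outer builds a length(df$value) × length(vec2) matrix, which can get large for long vectors. A sketch that avoids the matrix sorts the reference values once and uses base R findInterval to locate, for each element of vec2, the neighbours just below and just above it, then keeps the closer one if it lies within the tolerance. The function name nearest_sorted is hypothetical, and this is a swapped-in technique, not part of the answer above.

```r
# Hypothetical helper: nearest element of table for each element of x,
# found via binary search on the sorted values; NA if farther than tol.
nearest_sorted <- function(x, table, tol) {
  ord <- order(table)   # positions of table in ascending order
  s <- table[ord]       # sorted reference values
  # index of the largest sorted value <= x (0 if x is below all of them)
  lo <- findInterval(x, s)
  # candidate neighbours: at or below x, and just above x (clamped to range)
  lo_c <- pmax(lo, 1L)
  hi_c <- pmin(lo + 1L, length(s))
  # keep whichever candidate is closer (ties go to the lower one)
  pick <- ifelse(abs(s[lo_c] - x) <= abs(s[hi_c] - x), lo_c, hi_c)
  pick[abs(s[pick] - x) > tol] <- NA
  ord[pick]             # map back to row indices of the original table
}

values <- c(154.0031, 154.0768, 154.2145, 154.4954, 156.7731,
            156.8399, 159.0299, 159.6555, 159.9384)
vec2 <- c(154.0084, 159.0344, 154.0755, 156.7758)

LETTERS[nearest_sorted(vec2, values, 0.05)]
```

Sorting costs O(n log n) and each lookup is a binary search, so no n × m matrix is ever allocated; the trade-off is slightly more code than the outer version.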