Question

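I have two character vectors, a and b, of different lengths. I need to compare each element of a against every element of b and note whether there is a close match. I use the agrepl function for the matching.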

Here is the sample data:

a <- c("US","Canada","United States","United States of America")
b <- c("United States","U.S","United States","Canada", "America", "Spain")

Below is the code I use for matching. How can I avoid the for loops? My real data has more than 900 and 5000 records, respectively.

for (i in seq_along(a)) {
  for (j in seq_along(b)) {
    # TRUE if a[i] approximately matches b[j]
    bFlag <- agrepl(a[i], b[j], max.distance = 0.1, ignore.case = TRUE)

    if (bFlag) {
      # Custom logic
    } else {
      # Custom logic
    }
  }
}

Answer 1

You don't need the double loop, because the second argument of agrepl accepts a vector of length >= 1. So you can do:

lapply(a, function(x) agrepl(x, b, max.distance = 0.1, ignore.case = TRUE))
# [[1]]
# [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE
# 
# [[2]]
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE
# 
# [[3]]
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE
# 
# [[4]]
# [1] FALSE FALSE FALSE FALSE FALSE FALSE

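If needed, you can add custom logic inside the lapply call, but since none was specified in the question, I have left the output as a list of logical vectors.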
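As a rough sketch of where such custom logic could go, the same call can be collected into a logical matrix and the matching pairs extracted in one step. The names m, hits and pairs are only illustrative, and the question does not say what the custom logic actually is, so treat this as one possible starting point rather than the intended solution:

```r
a <- c("US", "Canada", "United States", "United States of America")
b <- c("United States", "U.S", "United States", "Canada", "America", "Spain")

# One column per element of a, one row per element of b
m <- sapply(a, function(x) agrepl(x, b, max.distance = 0.1, ignore.case = TRUE))

# Row/column positions of every TRUE cell
hits <- which(m, arr.ind = TRUE)

# Matching (a, b) pairs; the "TRUE branch" logic could be applied to these rows
pairs <- data.frame(a = a[hits[, "col"]], b = b[hits[, "row"]])

# Elements of a that have no close match in b (the "FALSE branch")
unmatched <- a[colSums(m) == 0]
```

With about 900 and 5000 records the matrix has roughly 4.5 million logical cells, which should fit comfortably in memory, although the agrepl calls themselves still dominate the running time.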

If you want the indices of the TRUE values rather than logicals, you can use agrep instead of agrepl:

lapply(a, function(x) agrep(x, b, max.distance = 0.1, ignore.case = TRUE))

# [[1]]
# [1] 1 2 3 6
# 
# [[2]]
# [1] 4
# 
# [[3]]
# [1] 1 3
# 
# [[4]]
# integer(0)
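If the matched strings themselves are more useful than their positions, agrep also takes a value argument. The following is a small sketch along the same lines; naming the list with setNames and the variable name matches are my additions, not part of the question:

```r
a <- c("US", "Canada", "United States", "United States of America")
b <- c("United States", "U.S", "United States", "Canada", "America", "Spain")

# value = TRUE returns the matching elements of b instead of their indices
matches <- lapply(a, function(x) {
  agrep(x, b, max.distance = 0.1, ignore.case = TRUE, value = TRUE)
})

# Name each list entry after the pattern it belongs to
matches <- setNames(matches, a)

matches[["Canada"]]
```

Entries with no close match come back as character(0), so lengths(matches) == 0 can be used to pick them out.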

If you only want the index of the first TRUE, you can use:

sapply(a, function(x) agrep(x, b, max.distance = 0.1, ignore.case = TRUE)[1])
#  US                   Canada            United States United States of America 
#   1                        4                        1                       NA
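A slightly stricter variant of the same idea uses vapply, which guarantees an integer vector even if a is empty, and then looks up the matched value in b. The names first_idx and result are hypothetical and used only for illustration:

```r
a <- c("US", "Canada", "United States", "United States of America")
b <- c("United States", "U.S", "United States", "Canada", "America", "Spain")

# Index of the first close match in b, or NA when there is none
first_idx <- vapply(
  a,
  function(x) agrep(x, b, max.distance = 0.1, ignore.case = TRUE)[1],
  integer(1)
)

# Side-by-side table of each pattern and its first match (NA if unmatched)
result <- data.frame(
  a = a,
  first_match = b[first_idx],
  stringsAsFactors = FALSE
)
```

Indexing b with NA yields NA, so unmatched patterns stay visible in result instead of being silently dropped.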
