Question

Suppose I have a vector of strings that may contain misspellings, for example:

x <- c("Starr Wars","Lorde of the Ring", "The Habit")

I also have a dictionary vector, like this:

y <- c("Star Wars", "The Lord of the Rings", "The Hobbit")

amatch (from the stringdist package) does almost exactly what I want:

amatch(x,y,maxDist=6)
[1] 1 2 3

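It gives me indices: the first string in x is closest to the first string in y, and so on. What I'm looking for is a function that returns a vector of the actual best-matching strings rather than their indices. In other words, a function that does this: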

function(x,y,maxDist=n)
[1] "Star Wars" "The Lord of the Rings" "The Hobbit"

Answer 1

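This feels a bit too easy, but here it is. All you need to do is use your amatch call to "filter" (i.e., subset) the vector that holds the correct names.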

require(stringdist)
x <- c("Starr Wars","Lorde of the Ring", "The Habit")
y <- c("Star Wars", "The Lord of the Rings", "The Hobbit")
y[amatch(x, y, maxDist = 6)]
# [1] "Star Wars"             "The Lord of the Rings" "The Hobbit"
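
If you want this as a reusable function, as the question asks, the subsetting can be wrapped up. The following is only a minimal sketch: the name best_match and the fallback argument are made up for illustration and are not part of stringdist. It relies on amatch returning NA for strings with no dictionary entry within maxDist, so those positions come back as NA (or as whatever fallback you supply).

```r
library(stringdist)

# Hypothetical helper (not part of stringdist): returns the closest
# dictionary string for each element of x instead of its index.
best_match <- function(x, y, maxDist = 6, fallback = NA_character_) {
  idx <- amatch(x, y, maxDist = maxDist)  # NA where nothing is within maxDist
  out <- y[idx]                           # subsetting by NA yields NA
  out[is.na(idx)] <- fallback             # optionally substitute a default
  out
}

x <- c("Starr Wars", "Lorde of the Ring", "The Habit")
y <- c("Star Wars", "The Lord of the Rings", "The Hobbit")

best_match(x, y)
# [1] "Star Wars"             "The Lord of the Rings" "The Hobbit"

# A string with no entry within maxDist is returned as NA
best_match("Dune", y, maxDist = 2)
# [1] NA
```

A large maxDist will match almost anything to something, so it is worth picking the smallest value that still catches the misspellings you expect.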
