Question

I have a sentence and want to replace part of the string with a number. When there is an exact match, gsub works fine.

gsub('great thing', 5555 ,c('hey this is a great thing'))
gsub('good rabbit', 5555 ,c('hey this is a good rabbit in the field'))

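But now I have the following problem: if that part of the string contains a misspelling, how can I apply a fuzzy-matching function to the string?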

gsub('great thing', 5555 ,c('hey this is a graet thing'))
gsub('good rabbit', 5555 ,c('hey this is a goood rabit in the field'))

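The algorithm should figure out that "great thing" and "graet thing", or "good rabbit" and "goood rabit", are very similar, and should replace the misspelled part with the number 5555. Ideally we would use the Jaro-Winkler distance to find the approximate match within the string and then replace that approximate substring. I need a very general algorithm that can do this.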

Any ideas?

Answer 1

Some agrep examples:

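# agrep() returns the indices of the elements that approximately match
agrep("lasy", "1 lazy 2")
# allow no substitutions
agrep("lasy", "1 lazy 2", max.distance = list(sub = 0))
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2)
# value = TRUE returns the matching strings instead of their indices
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, value = TRUE)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, ignore.case = TRUE)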
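agrep only tells you which elements match, not where the match sits inside the string. For the replacement itself, one option in base R is aregexec, which also reports the position and length of the approximate match. Below is a minimal sketch, not a definitive implementation: fuzzy_sub is a hypothetical helper name, and the default max.distance of 0.25 is an assumption you would tune for your data. The exact span that gets matched depends on the underlying matcher, so check the results on your own strings.

```r
# Hypothetical helper: replace the first approximate match of `pattern`
# in each element of `x` with `replacement`.
# max.distance = 0.25 is an assumed default, not a recommended value.
fuzzy_sub <- function(pattern, replacement, x, max.distance = 0.25) {
  # aregexec() returns, per element, the start position of the match
  # (or -1) with a "match.length" attribute
  m <- aregexec(pattern, x, max.distance = max.distance, fixed = TRUE)
  for (i in seq_along(x)) {
    start <- m[[i]][1]
    if (start == -1) next  # no approximate match: leave the string as is
    len <- attr(m[[i]], "match.length")[1]
    x[i] <- paste0(
      substr(x[i], 1, start - 1),
      replacement,
      substr(x[i], start + len, nchar(x[i]))
    )
  }
  x
}

fuzzy_sub("great thing", 5555, "hey this is a graet thing")
fuzzy_sub("good rabbit", 5555, "hey this is a goood rabit in the field")
```

Note that agrep and aregexec use a generalized Levenshtein edit distance, not Jaro-Winkler.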

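agrep is in base R. If you load stringdist, you can compute string distances with Jaro-Winkler using (you guessed it) stringdist, or, if you're lazy, you can just use ain or amatch. For my purposes I tend to use Damerau-Levenshtein more, but your mileage may vary.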
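Since stringdist compares whole strings, replacing a substring needs some way to generate candidate substrings first. One possible approach, sketched below under assumptions, is to slide a window of as many words as the pattern has over the sentence, score each window with Jaro-Winkler, and replace the closest one if it is close enough. fuzzy_replace_jw is a hypothetical helper, and the max_dist = 0.15 and p = 0.1 defaults are assumptions, not recommended values. It also assumes words are separated by single spaces and that the misspelling does not change the number of words.

```r
library(stringdist)

# Hypothetical helper: replace the word window closest to `pattern`
# (by Jaro-Winkler distance) with `replacement`.
# max_dist and p are assumed defaults to be tuned for real data.
fuzzy_replace_jw <- function(pattern, replacement, text,
                             max_dist = 0.15, p = 0.1) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  n <- length(strsplit(pattern, " ", fixed = TRUE)[[1]])
  if (length(words) < n) return(text)

  # Build every run of n consecutive words
  starts <- seq_len(length(words) - n + 1)
  windows <- vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )

  # Jaro-Winkler distance from the pattern to each window
  d <- stringdist(pattern, windows, method = "jw", p = p)
  best <- which.min(d)
  if (d[best] > max_dist) return(text)  # nothing similar enough

  # Keep the words before and after the best window
  paste(c(words[seq_len(best - 1)],
          replacement,
          words[-seq_len(best + n - 1)]),
        collapse = " ")
}

fuzzy_replace_jw("great thing", 5555, "hey this is a graet thing")
fuzzy_replace_jw("good rabbit", 5555, "hey this is a goood rabit in the field")
```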

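Make sure you understand exactly how the algorithm's parameters work before using it (i.e., set p, q, and maxDist to values that make sense for what you are doing).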
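To make that concrete, here is a small sketch of how these parameters change the outcome on the strings from the question. The specific values (p = 0.1, q = 2, maxDist = 1) are only illustrative assumptions.

```r
library(stringdist)

a <- "great thing"
b <- "graet thing"

# p is the Jaro-Winkler prefix factor; with the default p = 0 the
# plain Jaro distance is returned
jaro <- stringdist(a, b, method = "jw")
jw   <- stringdist(a, b, method = "jw", p = 0.1)

# q is the q-gram size used by the q-gram based methods
qg <- stringdist(a, b, method = "qgram", q = 2)

# maxDist is the largest distance amatch() still accepts as a match.
# Swapping "ea" to "ae" is one transposition under Damerau-Levenshtein
# ("dl") but two edits under plain Levenshtein ("lv"), so the same
# maxDist gives different answers depending on the method.
candidates <- c("great thing", "good rabbit")
hit_dl <- amatch(b, candidates, method = "dl", maxDist = 1)
hit_lv <- amatch(b, candidates, method = "lv", maxDist = 1)
```

Here hit_dl points at the first candidate, while hit_lv is NA because the Levenshtein distance exceeds maxDist.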

Approximate matching and replacement in text in R
