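I want to compute the Levenshtein distance between sentences in a document. I found code that computes the distance at the character level, but I want it at the word level. For example, for the two sentences below the character-level result is 5 (deleting "this "), but I want it to be 1: turning a into b only requires deleting one word, and turning b into a only requires inserting one.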
a = "The patient tolerated this ."
b = "The patient tolerated ."
def levenshtein_distance(a, b):
    if a == b:
        return 0
    # Make a the longer sequence so each row has len(b) + 1 cells.
    if len(a) < len(b):
        a, b = b, a
    if not a:
        return len(b)
    previous_row = range(len(b) + 1)
    for i, column1 in enumerate(a):
        current_row = [i + 1]
        for j, column2 in enumerate(b):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (column1 != column2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

result = levenshtein_distance(a, b)
print(result)
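Note that the function above never treats characters specially: it only uses len(), enumerate() and !=, so it works on any sequence. Passing lists of words instead of strings should therefore give a word-level distance. A minimal sketch of that idea (same algorithm, with the range turned into a list and the debug print dropped), assuming that splitting on whitespace is good enough for your sentences:

```python
def levenshtein_distance(a, b):
    """Edit distance between two sequences (strings, lists of words, ...)."""
    if a == b:
        return 0
    # Make `a` the longer sequence so each DP row has len(b) + 1 cells.
    if len(a) < len(b):
        a, b = b, a
    previous_row = list(range(len(b) + 1))
    for i, item_a in enumerate(a):
        current_row = [i + 1]
        for j, item_b in enumerate(b):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (item_a != item_b)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


a = "The patient tolerated this ."
b = "The patient tolerated ."

# Character level: a string is a sequence of characters.
print(levenshtein_distance(a, b))  # 5
# Word level: compare lists of words instead.
print(levenshtein_distance(a.split(), b.split()))  # 1
```

Words are compared with ==, so "The" and "the" still count as different tokens.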
Answer (score: 0)
I'd suggest not reinventing the wheel and using pylev: https://pypi.org/project/pylev/
You can install it by running pip install pylev
in a console.
Then compute the distance over words instead of letters:
import pylev
a = "The patient tolerated this ."
b = "The patient tolerated ."
a = a.split(" ")
b = b.split(" ")
print(pylev.levenshtein(a,b))
Keep in mind that this solution is case-sensitive and assumes that words are separated by single spaces.
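If differences in case, extra spaces, or punctuation attached to a word ("tolerated.") should not count, you can normalize the sentences before comparing them. The sketch below uses only the standard library and is not part of pylev. The helper names tokenize and word_levenshtein are hypothetical, and the tokenization rule is an assumption: lower-case everything, treat runs of word characters as words, and make every other non-space character its own token. It then computes the distance for every pair of sentences in a small made-up "document":

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lower-case and split into words and single punctuation marks (assumed rule)."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def word_levenshtein(a, b):
    """Word-level Levenshtein distance between two sentences."""
    x, y = tokenize(a), tokenize(b)
    previous_row = list(range(len(y) + 1))
    for i, word_x in enumerate(x, start=1):
        current_row = [i]
        for j, word_y in enumerate(y, start=1):
            current_row.append(min(
                previous_row[j] + 1,                       # delete word_x
                current_row[j - 1] + 1,                    # insert word_y
                previous_row[j - 1] + (word_x != word_y),  # substitute
            ))
        previous_row = current_row
    return previous_row[-1]


# Hypothetical example sentences standing in for a real document.
sentences = [
    "The patient tolerated this .",
    "the patient  tolerated.",
    "The patient refused this treatment.",
]

# Distance for every pair of sentences (i < j).
for (i, s), (j, t) in combinations(enumerate(sentences), 2):
    print(i, j, word_levenshtein(s, t))
# prints:
# 0 1 1
# 0 2 2
# 1 2 3
```

If you would rather keep pylev, the same tokenize() output can be passed to pylev.levenshtein, since the answer above already calls it with lists of words.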