Redundancy in the Levenshtein distance algorithm

Time: 2014-07-14 19:01:21

Tags: algorithm levenshtein-distance edit-distance

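In the typical dynamic-programming Levenshtein distance algorithm, to compute the value of cell d[i][j], where i and j are the row and column numbers respectively, we take the minimum of d[i-1][j-1]+0/1, d[i-1][j]+1 and d[i][j-1]+1. However, it seems to me that the minimum of d[i-1][j-1]+0/1 and d[i-1][j]+1 is always d[i-1][j-1]+0/1, in which case including d[i-1][j]+1 in the computation seems redundant. Is it ever the case that d[i-1][j-1]+0/1 > d[i-1][j]+1 in the Levenshtein distance algorithm, and if not, wouldn't it be more efficient to omit this comparison?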

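Edit: Apologies for the under-researched question; any standard run of the algorithm shows instances where d[i-1][j-1]+0/1 > d[i-1][j]+1 (consider the second row of the resulting table).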

1 answer:

Answer 0: (score: 1)

Referring to the Wikipedia article, there are situations in which the minimum in the last case has to be taken in the "deletion" term.
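For reference, the two-case definition that the labels "1st case" and "2nd case" refer to can be written roughly as follows. Here a and b denote the two strings, and [a_i ≠ b_j] is 1 when the characters differ and 0 otherwise. The order of the three terms inside the minimum (insertion, deletion, then match/substitution) is an assumption inferred from the values listed in this answer and may differ from the order used in the article.

```latex
\operatorname{lev}(i,j) =
\begin{cases}
  \max(i,j)
    & \text{if } \min(i,j) = 0 \quad \text{(1st case)} \\[4pt]
  \min\bigl\{\,
    \operatorname{lev}(i,j-1) + 1,\;
    \operatorname{lev}(i-1,j) + 1,\;
    \operatorname{lev}(i-1,j-1) + [a_i \neq b_j]
  \,\bigr\}
    & \text{otherwise} \quad \text{(2nd case)}
\end{cases}
```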

Suppose we want to compute the Levenshtein distance between abc and ab (fixed from now on and omitted from the notation).

Iterative evaluation yields the following intermediate results.

lev(0,0) = 0 (1st case applies)
lev(0,1) = 1 (1st case applies)
lev(0,2) = 2 (1st case applies)

lev(1,0) = 1 (1st case applies)
lev(1,1) = min(2,2,0) (2nd case, minimum taken in last term) = 0
lev(1,2) = min(1,3,2) (2nd case, minimum taken in first term) = 1 (*)

lev(2,0) = 2 (1st case applies)
lev(2,1) = min(3,1,2) (2nd case, minimum taken in second term) = 1 (*)
lev(2,2) = min(2,2,0) (2nd case, minimum taken in the last term) = 0

lev(3,0) = 3 (1st case applies)
lev(3,1) = min(4,2,3) (2nd case, minimum taken in the second term) = 2 (*)
lev(3,2) = min(3,1,2) (2nd case, minimum taken in the second term) = 1 (*)

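The lines marked (*) are those where the second case applies but the minimum is not taken in the last term. An online calculator that shows the dynamic programming table can also be found here.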
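The same check can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `lev_terms`, the `drop_deletion` flag, and the term order (insertion, deletion, diagonal) are assumptions made for illustration, not part of any library. It fills the table for the two strings and, optionally, leaves the deletion term out of the minimum to see what happens to the result.

```python
def lev_terms(a, b, drop_deletion=False):
    """Fill the Levenshtein table for a and b.

    Returns (d, terms), where d[i][j] is the table and terms[(i, j)]
    holds the three candidate values (insertion, deletion, diagonal)
    for every cell that falls under the second case.
    If drop_deletion is True, the deletion term d[i-1][j] + 1 is left
    out of the minimum (hypothetical variant, for comparison only).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    terms = {}
    for i in range(rows):
        for j in range(cols):
            if min(i, j) == 0:
                # First case: one of the prefixes is empty.
                d[i][j] = max(i, j)
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            insertion = d[i][j - 1] + 1
            deletion = d[i - 1][j] + 1
            diagonal = d[i - 1][j - 1] + cost
            terms[(i, j)] = (insertion, deletion, diagonal)
            if drop_deletion:
                d[i][j] = min(insertion, diagonal)
            else:
                d[i][j] = min(insertion, deletion, diagonal)
    return d, terms


if __name__ == "__main__":
    d, terms = lev_terms("abc", "ab")
    print(terms[(2, 1)])   # (3, 1, 2): the deletion term is the smallest
    print(d[3][2])         # 1

    d_without, _ = lev_terms("abc", "ab", drop_deletion=True)
    print(d_without[3][2])  # 3, no longer the correct distance
```

With the deletion term removed, the table for abc and ab ends at 3 instead of 1, which suggests the comparison is not redundant.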