Question

Regarding the section "Efficiency of the search algorithm" in the KMP article, I really don't understand why the loop is said to execute at most 2n times.

Here is the pseudocode from Wikipedia:

algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)

    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)

    while m + i < length(S) do
        if W[i] = S[m + i] then
            if i = length(W) - 1 then
                return m
            let i ← i + 1
        else
            let m ← m + i - T[i]
            if T[i] > -1 then
                let i ← T[i]
            else
                let i ← 0

    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S
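To experiment with the bound, here is a minimal Python transcription of the pseudocode above. It adds one thing the original does not have: a counter of loop iterations. It assumes W is non-empty and that T is passed in already built in the pseudocode's convention (T[0] = -1). The (position, iterations) return shape is an illustrative choice, not part of the Wikipedia algorithm.

```python
def kmp_search(S, W, T):
    """Transcription of the pseudocode that also counts loop iterations.

    Returns (position, iterations): position is the zero-based index of the
    first occurrence of W in S, or len(S) if there is none.
    Assumes W is non-empty and T[0] == -1.
    """
    m = 0  # beginning of the current match in S
    i = 0  # position of the current character in W
    iterations = 0
    while m + i < len(S):
        iterations += 1
        if W[i] == S[m + i]:
            if i == len(W) - 1:
                return m, iterations
            i += 1
        else:
            m = m + i - T[i]  # slide W to the right; uses the old value of i
            if T[i] > -1:
                i = T[i]
            else:
                i = 0
    # we have searched all of S unsuccessfully
    return len(S), iterations


# n = 8: the loop body runs 2n - 1 = 15 times.
print(kmp_search("aaaaaaaa", "ab", [-1, 0]))  # (8, 15)
```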

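I think the while loop executes at most n times, not 2n times. There are two branches in the loop. The first branch increases i but does not change m. The second branch adds i - T[i] to m, and since i > T[i], m increases. Therefore m + i always increases in every iteration of the while loop. So I think the loop runs at most n times in total; why is it 2n?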

Answer 1

The following paragraph is wrong:

"The second branch adds i - T[i] to m, and since i > T[i], m increases. Therefore m + i always increases in the while loop."

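Note that at the same time, i is decreased to T[i]. The new value of m + i is therefore (m + i - T[i]) + T[i] = m + i, so in these cases m + i stays the same. (Only when T[i] = -1 does m + i grow, and then by exactly one.)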

Here is an example:

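S: aaaab
W: aaaaa
T: -1 0 1 2 3

The values of m and m+i at the start of each iteration (and after the loop exits) are:

iteration:  1  2  3  4  5  6  7  8  9  end
m:          0  0  0  0  0  1  2  3  4  5
m+i:        0  1  2  3  4  4  4  4  4  5

So for n = length(S) = 5 the loop body runs 9 = 2n - 1 times: m+i advances during the four matches, then stays at 4 while m catches up, and only the final mismatch at i = 0 (where T[0] = -1) moves both.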
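The pseudocode leaves T "computed elsewhere". It expects T[0] = -1 and, for i >= 1, T[i] = the length of the longest proper prefix of W[0..i) that is also a suffix of W[0..i). As a sketch under that assumption, the table can be derived from the standard prefix function; kmp_table is a hypothetical name, and this is one common construction rather than necessarily the exact procedure in the Wikipedia article.

```python
def kmp_table(W):
    """Build T in the pseudocode's convention (W must be non-empty).

    T[0] = -1, and for i >= 1, T[i] is the length of the longest proper
    prefix of W[:i] that is also a suffix of W[:i] (its longest border).
    """
    # pi[j] = length of the longest proper border of W[:j + 1]
    pi = [0] * len(W)
    for j in range(1, len(W)):
        k = pi[j - 1]
        while k > 0 and W[j] != W[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if W[j] == W[k]:
            k += 1
        pi[j] = k
    # Shift by one position so that T[i] describes W[:i].
    return [-1] + pi[:-1]


print(kmp_table("aaaaa"))  # [-1, 0, 1, 2, 3]
```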

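Basically, the following observation may help you understand the algorithm better. The value m is the beginning of the substring we are looking at, and m+i is its end. More precisely, this substring S[m..m+i) is always a prefix of W. At each step, we move either the beginning or the end of this substring to the right by at least one.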
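A sketch that turns this observation into runtime assertions (kmp_search_checked is a hypothetical name; T is passed in with T[0] = -1). It also shows where the 2n comes from: with Φ = m + (m + i), every iteration raises Φ by at least one, while the loop condition m + i < n together with m <= m + i keeps Φ <= 2n - 2 at the start of each iteration, so the loop body runs at most 2n - 1 times.

```python
def kmp_search_checked(S, W, T):
    """The pseudocode's loop with the window invariants asserted at runtime.

    Returns (position, iterations); assumes W is non-empty and T[0] == -1.
    """
    m, i = 0, 0
    iterations = 0
    while m + i < len(S):
        # The window S[m .. m+i) always spells out W[0 .. i).
        assert S[m:m + i] == W[:i]
        start, end = m, m + i
        iterations += 1
        if W[i] == S[m + i]:
            if i == len(W) - 1:
                return m, iterations
            i += 1                      # the end moves right
        else:
            m = m + i - T[i]            # the beginning moves right
            i = T[i] if T[i] > -1 else 0
        # Neither end moves left, and at least one of them moves right,
        # so phi = m + (m + i) grows by at least one per iteration.
        assert m >= start and m + i >= end
        assert m + (m + i) > start + end
    return len(S), iterations


print(kmp_search_checked("aaaab", "aaaaa", [-1, 0, 1, 2, 3]))  # (5, 9)
```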

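In fact, if we only care about complete matches (as opposed to, say, finding the longest prefix of W that occurs in S), we can indeed reduce the number of iterations, from at most 2n - 1 to at most 2n - length(W), where n = length(S), by using the following modified loop condition: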

while m + length(W) <= length(S) do
    ... (same loop body as above)
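A sketch comparing the two loop conditions on inputs from this thread (kmp_iterations and its full_matches_only flag are hypothetical names; T is passed in with T[0] = -1). The loop body is identical in both modes; only the exit test changes.

```python
def kmp_iterations(S, W, T, full_matches_only=False):
    """Run the pseudocode's loop and return (position, iterations).

    With full_matches_only=True the loop condition is replaced by
    m + len(W) <= len(S); the loop body is unchanged.
    Assumes W is non-empty and T[0] == -1.
    """
    m, i = 0, 0
    iterations = 0
    while True:
        if full_matches_only:
            if m + len(W) > len(S):  # W no longer fits in the rest of S
                break
        elif m + i >= len(S):        # original condition: window ran off S
            break
        iterations += 1
        if W[i] == S[m + i]:
            if i == len(W) - 1:
                return m, iterations
            i += 1
        else:
            m = m + i - T[i]
            i = T[i] if T[i] > -1 else 0
    return len(S), iterations


# n = 8, length(W) = 2: 2n - 1 = 15 iterations vs 2n - length(W) = 14.
print(kmp_iterations("aaaaaaaa", "ab", [-1, 0]))
print(kmp_iterations("aaaaaaaa", "ab", [-1, 0], full_matches_only=True))
```

For the earlier example (S = aaaab, W = aaaaa) the saving is larger: 9 iterations drop to 5, because W stops fitting as soon as m reaches 1.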

Answer 2

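The algorithm is easy to visualize as follows. You have a fixed string S, a movable string W directly below S, and a small rectangular sliding window that covers two characters: one from W and one from S. Initially, the start of W is aligned with the start of S, and the window covers the first character of each. Each step of the algorithm is as follows: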

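1. If the two characters covered by the sliding window match, move the window one position to the right. (If this was the last character of W, the search is over and has succeeded.)
2. Otherwise, if the window covers the first character of W, move both W and the window one position to the right.
3. Otherwise, move W to the right by the amount computed in the table, keeping the window in place. (The exact amount does not matter for this analysis, but it is a positive number.)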

I suggest acting this out with two strips of paper and a paper window (or a small piece of glass)!

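It is easy to see that, in total, step 3 can happen at most as many times as step 1: the number of characters of W to the left of the window grows by one with each step 1, shrinks by at least one with each step 3, and never goes negative. Steps 1 and 2 each move the window one position to the right, so together they happen at most n times, which gives at most 2n steps overall. The worst case is when you perform exactly one step 3 for each step 1 and never perform step 2. That happens if the first character of W always matches S and the second one never does, e.g. W = "ab", S = "aaaaaaaaaaaaa...".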
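The three steps map onto the pseudocode's branches: step 1 is the match branch, step 2 is a mismatch with i = 0 (where T[0] = -1), and step 3 is any other mismatch. A sketch that counts each kind (count_steps is a hypothetical name; it assumes T[0] = -1 and T[i] >= 0 for i >= 1, as in the unoptimized table):

```python
def count_steps(S, W, T):
    """Count how often each step of the sliding-window picture happens.

    W starts under S[m]; the window covers S[m + i] and W[i].
    Returns (position, counts), with position == len(S) when W is absent.
    Assumes T[0] == -1 and T[i] >= 0 for i >= 1.
    """
    m, i = 0, 0
    counts = {1: 0, 2: 0, 3: 0}
    while m + i < len(S):              # the window is still over S
        if W[i] == S[m + i]:
            counts[1] += 1             # step 1: the two characters match
            if i == len(W) - 1:
                return m, counts       # that was the last character of W
            i += 1                     # window moves right, W stays
        elif i == 0:
            counts[2] += 1             # step 2: mismatch on W's first character
            m += 1                     # W and window both move right by one
        else:
            counts[3] += 1             # step 3: W moves right, window stays
            m += i - T[i]
            i = T[i]
    return len(S), counts


pos, counts = count_steps("aaaaaaaa", "ab", [-1, 0])
print(pos, counts)  # 8 {1: 8, 2: 0, 3: 7}
```

For S = "a" * n and W = "ab", step 1 happens n times and step 3 n - 1 times, for 2n - 1 steps in total.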
