"Partial match" table (a.k.a. "failure function") in KMP (on Wikipedia)

Time: 2013-09-20 08:03:57

Tags: algorithm wikipedia string-matching knuth-morris-pratt

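I am reading about the KMP algorithm on Wikipedia. In the section "Description of pseudocode for the table-building algorithm" there is one line of code that confuses me: let cnd ← T[cnd]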

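It has a comment: (second case: it doesn't, but we can fall back). I understand that we can fall back, but why to T[cnd] specifically? Is there a reason for it? That is the part that confuses me.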

Here is the complete pseudocode for the table-building algorithm:

algorithm kmp_table:
    input:
        an array of characters, W (the word to be analyzed)
        an array of integers, T (the table to be filled)
    output:
        nothing (but during operation, it populates the table)

    define variables:
        an integer, pos ← 2 (the current position we are computing in T)
        an integer, cnd ← 0 (the zero-based index in W of the next character of the current candidate substring)

    (the first few values are fixed but different from what the algorithm might suggest)
    let T[0] ← -1, T[1] ← 0

    while pos < length(W) do
        (first case: the substring continues)
        if W[pos - 1] = W[cnd] then
            let cnd ← cnd + 1, T[pos] ← cnd, pos ← pos + 1

        (second case: it doesn't, but we can fall back)
        else if cnd > 0 then
            let cnd ← T[cnd]

        (third case: we have run out of candidates.  Note cnd = 0)
        else
            let T[pos] ← 0, pos ← pos + 1
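
For experimenting with the line in question, the pseudocode above can be transcribed into Python roughly as follows. This is only a sketch of the same three cases: returning T as a list (instead of filling a caller-supplied array) and padding the list for words shorter than two characters are assumptions made here, not part of the Wikipedia text.

```python
def kmp_table(W):
    """Build the partial-match table for W, following the three cases above."""
    n = len(W)
    # Assumption: allocate at least two slots so that the two fixed
    # entries below can be written even for very short words.
    T = [0] * max(n, 2)
    T[0] = -1
    T[1] = 0
    pos = 2  # the current position we are computing in T
    cnd = 0  # index in W of the next character of the current candidate
    while pos < n:
        if W[pos - 1] == W[cnd]:
            # first case: the substring continues
            cnd += 1
            T[pos] = cnd
            pos += 1
        elif cnd > 0:
            # second case: it doesn't, but we can fall back
            cnd = T[cnd]
        else:
            # third case: we have run out of candidates (cnd == 0)
            T[pos] = 0
            pos += 1
    return T[:n]  # trim the padding added for short words
```

For example, kmp_table("ABCDABD") returns [-1, 0, 0, 0, 0, 1, 2], and for "AABAAAC" the second case is taken with cnd = 2 while computing T[6], falling back to T[2] = 1.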

1 Answer:

Answer 0 (score: 1)

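You can fall back to T[cnd] because it contains the length of the previous longest proper prefix of the pattern W that is also a proper suffix of W[0...cnd] (that is, of the first cnd characters of W). So if the current character at W[pos-1] matches the character at W[T[cnd]], you can extend the length of the longest proper prefix of W[0...pos-1] that is also a suffix (this is the first case).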

I guess it is a bit like dynamic programming, in that you rely on previously computed values.
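
The fall-back argument can be checked by brute force on a small case. The sketch below is only an illustration: the helper borders and the example word "AABAAAC" are made up here and do not come from the Wikipedia article. It shows that every shorter prefix-suffix candidate of the processed text is itself a prefix-suffix of the longest one, which is why jumping to T[cnd] skips nothing.

```python
def borders(s):
    """Lengths of all proper prefixes of s that are also suffixes of s, longest first."""
    return [k for k in range(len(s) - 1, -1, -1) if s[:k] == s[len(s) - k:]]

W = "AABAAAC"      # hypothetical example word
prefix = W[:5]     # "AABAA": the part of W processed so far

# Longest proper prefix of "AABAA" that is also a suffix: "AA", so cnd = 2.
cnd = borders(prefix)[0]

# The next character W[5] = "A" does not match W[cnd] = "B", so the
# candidate cannot be extended and a shorter one is needed.
mismatch = W[5] != W[cnd]

# All shorter candidates are exactly the borders of W[:cnd] ("AA"),
# and the longest of those is what the table stores as T[cnd].
chain = [cnd] + borders(W[:cnd])
fallback = borders(W[:cnd])[0]

# After falling back, W[5] matches W[fallback], so the candidate grows
# to fallback + 1, which is the longest border of "AABAAA".
extended = fallback + 1 if W[5] == W[fallback] else None
```

Here chain equals borders(prefix), so walking cnd, T[cnd], T[T[cnd]], ... visits every candidate in decreasing order of length.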

This explanation might help you.