I looked at the KMP table-building algorithm on Wikipedia, but I don't understand the logic behind the second case of the loop:

    (second case: it doesn't, but we can fall back)
    else if cnd > 0 then
        let cnd ← T[cnd]
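The quoted lines are the middle branch of a three-way loop. Below is a minimal Python sketch of the whole table builder for context. The first and third branches, the starting values pos = 2 and cnd = 0, and the T[0] = -1 convention are assumptions, chosen to be consistent with the trace in the answer below. `kmp_table` is a hypothetical name, not Wikipedia's.

```python
def kmp_table(w: str) -> list[int]:
    """Partial-match table with the T[0] = -1 convention.

    T[pos] is the length of the longest proper prefix of w[:pos]
    that is also a suffix of w[:pos] (for pos >= 1).
    """
    t = [0] * len(w)
    if not w:
        return t
    t[0] = -1  # T[1], if present, stays 0
    pos, cnd = 2, 0  # pos: entry being filled; cnd: length of the current candidate
    while pos < len(w):
        if w[pos - 1] == w[cnd]:
            # first case: the substring continues
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            # second case: it doesn't, but we can fall back to the
            # next-shorter candidate, which is T[cnd]
            cnd = t[cnd]
        else:
            # third case: no candidates left (cnd == 0)
            t[pos] = 0
            pos += 1
    return t


print(kmp_table("HeHeHello"))  # [-1, 0, 0, 1, 2, 3, 4, 0, 0]
```

The fallback branch never advances pos. It only shrinks cnd, so the same character w[pos - 1] is retried against a shorter candidate until it matches or cnd reaches 0.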
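I tried building a table with this algorithm, and it works fine. I understand that cnd ← T[cnd] helps find the proper suffix length. What I don't understand is "how" it does that.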
An explanation with an example would be great.
Thanks!
Edit: I just found that my question is a duplicate of this one: "Partial match" table (aka "failure function") in KMP (on wikipedia)
I think I get it now. Still, another explanation would help. Thanks!
Answer:
Suppose you have the string Hello World!!! and you want to search for Head Up.

    Hello World!!!
    Head Up
      ^
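For the first and second characters, the first condition applies (first case: the substring continues). At the marked position the characters don't match, but you are already inside a partial match (2 characters have matched up to that point). This corresponds to the second condition (second case: it doesn't, but we can fall back). The third case is when you fail to match the first character of the pattern.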
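The three cases in the search can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The table is computed by brute force in the hypothetical helper `partial_match_table` to keep the example short, using the same T[0] = -1 convention as the table trace further down. The search function `kmp_search` (also a hypothetical name) returns only the first match, or -1.

```python
def partial_match_table(w: str) -> list[int]:
    """T[0] = -1; T[i] = longest proper border of w[:i] (brute force)."""
    def longest_border(p: str) -> int:
        return next((k for k in range(len(p) - 1, 0, -1) if p[:k] == p[-k:]), 0)
    return [-1] + [longest_border(w[:i]) for i in range(1, len(w))]


def kmp_search(s: str, w: str) -> int:
    """Index of the first occurrence of w in s, or -1."""
    if not w:
        return 0
    t = partial_match_table(w)
    m = i = 0  # m: where the current alignment starts in s; i: position in w
    while m + i < len(s):
        if w[i] == s[m + i]:
            # characters match: the partial match continues
            if i == len(w) - 1:
                return m
            i += 1
        elif i > 0:
            # mismatch after i matched characters: keep the t[i] characters
            # already known to match and slide w forward by i - t[i]
            m += i - t[i]
            i = t[i]
        else:
            # mismatch on the first character of w: just move on by one
            m += 1
    return -1


print(kmp_search("Hello World!!!", "Head Up"))  # -1
print(kmp_search("HeHello World!!!", "Hello"))  # 2
```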
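The second condition is necessary because you can use what you learned from the characters matched before the mismatch to avoid comparisons whose results you already know. In other words, you can skip the starting positions in the string where you already know the pattern cannot match.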
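Why falling back loses nothing: suppose the first cnd characters of the pattern are a border (a prefix that is also a suffix) of the characters matched so far. Then any shorter border of those characters is also a border of the first cnd characters, and the longest such border is T[cnd]. Repeating cnd ← T[cnd] therefore visits every border, longest first, and skips none. The sketch below checks this claim against brute force. The helper names are hypothetical, and this table uses T[0] = 0 rather than -1, since only entries with index ≥ 1 are ever followed.

```python
def longest_border(p: str) -> int:
    """Length of the longest proper prefix of p that is also a suffix (brute force)."""
    return next((k for k in range(len(p) - 1, 0, -1) if p[:k] == p[-k:]), 0)


def all_borders(p: str) -> list[int]:
    """Every proper border length of p, longest first, ending with 0."""
    return [k for k in range(len(p) - 1, 0, -1) if p[:k] == p[-k:]] + [0]


def fallback_chain(w: str, n: int) -> list[int]:
    """Lengths visited by repeating cnd <- T[cnd], starting at the longest border of w[:n]."""
    t = [longest_border(w[:i]) for i in range(len(w) + 1)]
    chain = [t[n]]
    while chain[-1] > 0:
        chain.append(t[chain[-1]])
    return chain


# "HeHeHe" has borders "HeHe" (4) and "He" (2); following T finds both.
print(fallback_chain("HeHeHello", 6), all_borders("HeHeHe"))  # [4, 2, 0] [4, 2, 0]
```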
Example: with the string HeHello World!!! and the pattern Hello:

    HeHello World!!!
    Hello
      ^ when this character mismatches, the KMP table tells you that
        you can move the pattern forward 2 characters, because

    HeHello World!!!
     Hello
     ^ this would mismatch
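The "forward 2 characters" arithmetic can be checked directly. After matching j characters of the pattern and failing on the next one, KMP slides the pattern forward by j - T[j]. Any smaller slide d would require the last j - d matched characters to equal the first j - d characters of the pattern. That would be a border longer than T[j], which cannot exist. Below is a brute-force sketch with hypothetical helper names, assuming T[j] is the longest proper border of the first j pattern characters.

```python
def longest_border(p: str) -> int:
    """Length of the longest proper prefix of p that is also a suffix (brute force)."""
    return next((k for k in range(len(p) - 1, 0, -1) if p[:k] == p[-k:]), 0)


def shift_after_mismatch(w: str, j: int) -> int:
    """How far KMP slides w after matching w[:j] (j >= 1) and failing on w[j]."""
    return j - longest_border(w[:j])


def smaller_shifts_fail(w: str, j: int) -> bool:
    """True if every slide d below the KMP shift already contradicts the j matched characters."""
    return all(w[d:j] != w[:j - d] for d in range(1, shift_after_mismatch(w, j)))


# Pattern "Hello" in "HeHello World!!!": "He" matched, then 'l' fails against 'H'.
print(shift_after_mismatch("Hello", 2))  # 2
print(smaller_shifts_fail("Hello", 2))   # True: sliding by 1 puts 'H' under 'e'
```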
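Now consider building the table for the pattern HeHeHello. Let ^ mark cnd and * mark pos. The starting point is pos = 2 and cnd = 0. The character actually compared, however, is the one at pos - 1 = 1, so * is drawn under index pos - 1. In each step below, cnd and pos are the values at the start of the step, and T is the table after the step.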

    HeHeHello   T = [-1, 0, 0, 0, 0, 0, 0, 0, 0]
    ^*          comparing 0 with 1: go to condition 3 (cnd = 0, pos = 2)

    HeHeHello   T = [-1, 0, 0, 1, 0, 0, 0, 0, 0]
    ^ *         comparing 0 with 2: go to condition 1 (cnd = 0, pos = 3)

    HeHeHello   T = [-1, 0, 0, 1, 2, 0, 0, 0, 0]
     ^ *        comparing 1 with 3: go to condition 1 (cnd = 1, pos = 4)

    HeHeHello   T = [-1, 0, 0, 1, 2, 3, 0, 0, 0]
      ^ *       comparing 2 with 4: go to condition 1 (cnd = 2, pos = 5)

    HeHeHello   T = [-1, 0, 0, 1, 2, 3, 4, 0, 0]
       ^ *      comparing 3 with 5: go to condition 1 (cnd = 3, pos = 6)

    HeHeHello   T = [-1, 0, 0, 1, 2, 3, 4, 0, 0]
        ^ *     comparing 4 with 6: go to condition 2 (cnd ← T[cnd] = T[4] = 2)

    HeHeHello   T = [-1, 0, 0, 1, 2, 3, 4, 0, 0]
      ^   *     comparing 2 with 6: go to condition 2 (cnd ← T[cnd] = T[2] = 0)

    ...
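To see every step of the trace, including the ones behind the "...", the loop can be instrumented. This is a minimal sketch, assuming the same three-case pseudocode as above; `kmp_table_trace` is a hypothetical name. Each recorded step is (cnd, pos - 1, condition), meaning the ^ position, the * position, and which branch ran.

```python
def kmp_table_trace(w: str) -> tuple[list[int], list[tuple[int, int, int]]]:
    """Build the T[0] = -1 table and record each loop step as (cnd, pos - 1, condition)."""
    t = [0] * len(w)
    steps: list[tuple[int, int, int]] = []
    if not w:
        return t, steps
    t[0] = -1
    pos, cnd = 2, 0
    while pos < len(w):
        compared = (cnd, pos - 1)  # the ^ and * positions in the trace
        if w[pos - 1] == w[cnd]:
            case = 1  # the substring continues
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            case = 2  # fall back; pos does not move
            cnd = t[cnd]
        else:
            case = 3  # no candidates left
            t[pos] = 0
            pos += 1
        steps.append((*compared, case))
    return t, steps


table, steps = kmp_table_trace("HeHeHello")
for cnd, p, case in steps:
    print(f"comparing {cnd} with {p}: condition {case}")
print(table)
```

Running it prints one line per loop iteration. The first seven lines reproduce the steps shown above, and the final table matches the last T in the trace.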