regex - Difference Between KMP and Regex/DFA-based Searching

Time: 2015-06-25 18:41:59

Tags: regex search pattern-matching dfa knuth-morris-pratt

I am confused about the relationship between KMP (Knuth–Morris–Pratt) and regex (DFA-based) searching. My understanding is that KMP cannot use regex notation (e.g., (A|B){2}C), so it can only search for a "single" string (e.g., AC or BC, but not AC|BC). Is this true? Another question: if the pattern is a single string (ABABAC), are the two essentially doing the same thing?

3 answers:

Answer 0 (score: 0)

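Actually, there is a generalized form of KMP that is a finite automaton: the Aho-Corasick algorithm. It is also easy to use with wildcards. IMO you could use regex with KMP, but it is not easy.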
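To make the Aho-Corasick idea concrete, here is a minimal Python sketch of a multi-pattern search built on KMP-style failure links. It is only an illustration under my own assumptions, not anything from the answer itself: the names build_automaton and search are hypothetical, patterns are assumed to be non-empty plain strings, and wildcard support is not shown.

```python
from collections import deque

def build_automaton(patterns):
    """Build a trie with failure links (hypothetical helper, for illustration)."""
    # State 0 is the root. goto[s] maps a character to a child state,
    # fail[s] is the fallback state, out[s] lists patterns ending at s.
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0; deeper states
    # follow their parent's failure links, as KMP does for one pattern.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit patterns that end at the fallback state.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out

def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits

print(search("ABACBC", ["AC", "BC"]))  # [(2, 'AC'), (4, 'BC')]
```

With a single pattern the trie degenerates into a chain and the failure links coincide with the KMP failure function, which is the sense in which this generalizes KMP to alternatives such as AC|BC.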

Answer 1 (score: 0)

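It seems (95% sure) that both algorithms should be exactly the same. The step in KMP of falling back from position i in the string to the end of a prefix of length p corresponds to a nondeterministic automaton being in two states at once: the state right after prefix p, and the state at position i in the string. Once converted to a DFA, this automaton has a single state that simulates that set of NFA states, and it finishes in linear time. So a regex with a Kleene star is equivalent to KMP.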
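The claimed equivalence for a single literal pattern can be checked with a small Python sketch. This is my own illustration, not the answerer's code: kmp_search uses the usual failure function, dfa_search precomputes a full transition table in the style found in textbooks, and both function names are hypothetical. The pattern is assumed to be non-empty.

```python
def kmp_search(pattern, text):
    """KMP with a failure function; returns start indices of all matches."""
    # fail[j] = length of the longest proper prefix of pattern[:j+1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits

def dfa_search(pattern, text):
    """Same search with an explicit DFA: one table lookup per text character."""
    m = len(pattern)
    alphabet = set(pattern)
    # dfa[j][ch] = next state; being in state j means the last j
    # characters read equal pattern[:j]. State m is the accepting state.
    dfa = [dict.fromkeys(alphabet, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    x = 0  # state the DFA would be in after reading pattern[1:j]
    for j in range(1, m + 1):
        for ch in alphabet:
            dfa[j][ch] = dfa[x][ch]      # mismatch: behave like state x
        if j < m:
            dfa[j][pattern[j]] = j + 1   # match: advance
            x = dfa[x][pattern[j]]
    hits, state = [], 0
    for i, ch in enumerate(text):
        # Characters that do not occur in the pattern reset to state 0.
        state = dfa[state].get(ch, 0)
        if state == m:
            hits.append(i - m + 1)
    return hits

text = "ABABABACABABAC"
print(kmp_search("ABABAC", text))  # [2, 8]
print(dfa_search("ABABAC", text))  # [2, 8]
```

Both functions report the same positions. The difference in this sketch is where the work is done: the DFA resolves every fallback while building the table, whereas KMP follows failure links during the scan, which keeps its table at one entry per pattern character.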

Answer 2 (score: -1)

KMP cannot use regex notation, so it can only search for a "single" string. Is this true?

Yes. KMP is a string search algorithm, not a pattern-matching algorithm.

Another question: if the pattern is a single string (ABABAC), are they essentially doing the same thing?

No, DFA-based matching is not equivalent to the KMP algorithm. However, advanced regex matching implementations may use KMP as an optimization.