How can I design a regular expression that searches for a pattern, rather than validating one?

Time: 2015-06-25 17:38:47

Tags: regex search pattern-matching automata dfa

As we know, a regular expression pattern (for example, A B A B A C) can be converted into a DFA. For this pattern the DFA looks like a chain (you can test it here).

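This "chain-like" DFA can decide whether a given string matches the pattern (that is, accept or reject it), but it cannot tell whether the pattern occurs anywhere inside a longer string, nor identify all such occurrences.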

Example: suppose this is the string to search: A B C A B A B A B A C A B C

Although the pattern occurs starting at the 6th character, the "chain-like" DFA cannot report this. All it can do is reject the string.

Question: is it possible to design a regular expression that supports this kind of search?

(Note: I realize this question is a bit confusing; I am happy to clarify anything that is unclear.)

1 Answer:

Answer 0: (score: 0)

The language of strings containing the substring ABABAC is matched by the regular expression:

.*ABABAC.*

Where the symbol . denotes a subexpression that matches any single input symbol (e.g. (A|B|C), if the input language only has the symbols A, B and C). To tell if a string has the substring ABABAC, you can build an NFA or a DFA from this regular expression, and check if it accepts your string.
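As a rough illustration of the idea above, here is a minimal Python sketch that builds a DFA for .*ABABAC.* directly, instead of going through a general regex-to-NFA-to-DFA conversion. The names build_search_dfa and contains are hypothetical, the alphabet {A, B, C} is taken from the example, and the input is assumed to be written without spaces and to contain only alphabet symbols.

```python
def build_search_dfa(pattern, alphabet):
    """Build a transition table for the language .*pattern.*

    State q (0 <= q < len(pattern)) means: the longest prefix of `pattern`
    that is a suffix of the input read so far has length q.
    State len(pattern) is the accepting state and is absorbing, which
    plays the role of the trailing .* in the regular expression.
    """
    m = len(pattern)
    dfa = []
    for q in range(m):
        row = {}
        for c in alphabet:
            seen = pattern[:q] + c
            # Find the longest prefix of the pattern that ends `seen`.
            k = min(m, q + 1)
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[c] = k
        dfa.append(row)
    # Once the substring has been seen, stay in the accepting state.
    dfa.append({c: m for c in alphabet})
    return dfa


def contains(dfa, text):
    """Run the DFA over `text`; accept iff it ends in the last state."""
    state = 0
    for c in text:
        state = dfa[state][c]
    return state == len(dfa) - 1


dfa = build_search_dfa("ABABAC", "ABC")
print(contains(dfa, "ABCABABABACABC"))  # True
print(contains(dfa, "ABABAB"))          # False
```

Unlike the "chain" DFA from the question, a mismatch here does not lead to a dead state: the automaton falls back to the longest partial match that is still possible, which is why it accepts the example string.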

Determining the location of the substring in the input string is not possible with a (single) standard N/DFA, simply because an N/DFA is defined to only return one bit of information (accept/reject). However, it is possible to implement an "augmented N/DFA" that, in addition to matching the input, also keeps track of where in the string each state transition last occurred; this information is enough to efficiently reconstruct the location of the substring.
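For a fixed literal pattern such as ABABAC, a simpler special case of this "augmented" automaton is enough: a match of length m that ends at index i must start at index i - m + 1, so it suffices to record the input position each time the accepting state is entered. The sketch below is one possible way to do that, not the general transition-tracking technique; find_all and build_matcher are hypothetical names, and the input is again assumed to contain only alphabet symbols with no spaces. Here the accepting state is not absorbing, so that later (possibly overlapping) occurrences are also reported.

```python
def build_matcher(pattern, alphabet):
    """Transition table in which state q is the length of the longest
    prefix of `pattern` that is a suffix of the input read so far.
    The accepting state len(pattern) falls back like any other state."""
    m = len(pattern)
    table = []
    for q in range(m + 1):
        row = {}
        for c in alphabet:
            seen = pattern[:q] + c
            k = min(m, q + 1)
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[c] = k
        table.append(row)
    return table


def find_all(pattern, text, alphabet):
    """Return the 0-based start index of every occurrence of `pattern`."""
    table = build_matcher(pattern, alphabet)
    m = len(pattern)
    state = 0
    starts = []
    for i, c in enumerate(text):
        state = table[state][c]
        if state == m:
            # The match ends at index i, so it starts at i - m + 1.
            starts.append(i - m + 1)
    return starts


print(find_all("ABABAC", "ABCABABABACABC", "ABC"))  # [5]
print(find_all("ABA", "ABABA", "AB"))               # [0, 2]
```

The result [5] is a 0-based index, which corresponds to the 6th character mentioned in the question.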