Question

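As we know, given a regular expression pattern (e.g. A B A B A C), we can convert it into a DFA. In this example the DFA looks like a chain (you can test it here).

This "chain-like" DFA can tell whether a given string matches the pattern (i.e. accept or reject it), but it cannot tell whether the pattern occurs anywhere inside a longer string, nor identify all of the occurrences.

Example: suppose this is the string to search: A B C A B A B A B A C A B C

Although an occurrence starts at the 6th character, the "chain-like" DFA cannot report this. All it can do is reject the string.

Question: is it possible to design a regular expression that supports this kind of functionality?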

(Note: I understand this question is somewhat confusing; I am happy to clarify if it confuses you.)

Answer 1

The language of strings containing the substring ABABAC is matched by the regular expression:

.*ABABAC.*

Here the symbol . denotes a subexpression that matches any single input symbol (e.g. (A|B|C), if the input alphabet consists only of the symbols A, B and C). To tell whether a string contains the substring ABABAC, you can build an NFA or a DFA from this regular expression and check whether it accepts your string.
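As a quick illustration of the accept/reject check, here is a minimal sketch using Python's re module. Note the assumptions: Python's engine is a backtracking matcher rather than a DFA, the helper name contains_pattern is made up for this example, and the input is written without the spaces used in the question.

```python
import re

# Whole-string match against .*ABABAC.* : succeeds iff ABABAC occurs somewhere.
# DOTALL makes "." match every character, including newlines.
CONTAINS = re.compile(r".*ABABAC.*", re.DOTALL)

def contains_pattern(text):
    """Return True if text is in the language of .*ABABAC.* (hypothetical helper)."""
    return CONTAINS.fullmatch(text) is not None

print(contains_pattern("ABCABABABACABC"))  # True
print(contains_pattern("ABABAB"))          # False
```

This only yields the one bit of information (accept or reject); it says nothing about where the substring is.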

Determining the location of the substring in the input string is not possible with a (single) standard N/DFA, simply because an N/DFA is defined to only return one bit of information (accept/reject). However, it is possible to implement an "augmented N/DFA" that, in addition to matching the input, also keeps track of where in the string each state transition last occurred; this information is enough to efficiently reconstruct the location of the substring.
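One possible way to sketch such an augmented automaton is shown below. It is an assumption on my part, not necessarily the construction meant above: it builds a DFA for .*ABABAC (the same idea as the Knuth-Morris-Pratt automaton) and records the input position every time the accepting state is entered. The names build_search_dfa and find_all, and the default alphabet "ABC", are hypothetical.

```python
def build_search_dfa(pattern, alphabet):
    """Build a DFA for .*pattern as a list of transition dicts.

    State k means: the longest prefix of pattern that is a suffix of the
    input read so far has length k. State len(pattern) is accepting.
    """
    m = len(pattern)
    dfa = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for symbol in alphabet:
            seen = pattern[:state] + symbol
            # Longest prefix of pattern that is a suffix of what was just seen.
            k = min(m, len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            dfa[state][symbol] = k
    return dfa

def find_all(pattern, text, alphabet="ABC"):
    """Return the 0-based start index of every occurrence of pattern in text."""
    dfa = build_search_dfa(pattern, alphabet)
    state, starts = 0, []
    for i, symbol in enumerate(text):
        state = dfa[state][symbol]  # raises KeyError for symbols outside alphabet
        if state == len(pattern):
            # The match ends at index i, so it starts len(pattern) - 1 earlier.
            starts.append(i - len(pattern) + 1)
    return starts

print(find_all("ABABAC", "ABCABABABACABC"))  # [5]
```

The result [5] is a 0-based index, i.e. the 6th character mentioned in the question. Because the automaton keeps running after reaching the accepting state, overlapping occurrences are reported as well.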

How do I design a regular expression that searches for a pattern rather than validating it?
