I'm trying to build a regular expression like this:
[match-word] ... [exclude-specific-word] ... [match-word]
This seems to call for a negative lookahead, but I run into a problem in a situation like this:
[match-word] ... [exclude-specific-word] ... [match-word] ... [excluded word appears again]
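I want the sentence above to match, but the negative lookahead between the first and second match words "spills over", so the second word never matches.
Let's look at a practical example.
I want to match every sentence that contains the words "i" and "pie", but not the word "hate" between those two words. I have these three sentences: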
i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
I have this regex:
^i(?!.*hate).*pie - have removed the word boundaries for clarity, original is: ^i\b(?!.*\bhate\b).*\bpie\b
This matches the first sentence but not the second, because the negative lookahead scans the whole string.
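To make the problem concrete, here is a minimal sketch in Python's `re` module rather than JRegex (an assumption made only for illustration; this pattern uses only basic lookahead syntax). The names `pattern`, `sentences` and `results` are placeholders.

```python
import re

# Simplified pattern from the question (word boundaries omitted).
pattern = re.compile(r"^i(?!.*hate).*pie")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
]

# The lookahead runs once, right after the leading "i", and its .* can reach
# the end of the string, so a "hate" anywhere later blocks the match.
results = [pattern.search(s) is not None for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

Only the first sentence matches here; the second is rejected because of the "hate" that appears after "pie".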
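Is there a way to limit the negative lookahead so that it is satisfied if it encounters "pie" before it encounters "hate"?
Note: in my implementation, other terms may follow this regex (it is built dynamically from a grammar search engine), for example: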
^i(?!.*hate).*pie.*donuts
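I'm currently using JRegex, but could switch to JDK Regex if needed.
Update: I forgot to mention something in my original question:
The "negative construct" may appear again further along in the sentence, and if possible I want the sentence to match even then.
To clarify, look at these sentences: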
i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
i sure like eating pie, but i like donuts and i hate making pie <- Do want to match this
rob's answer works perfectly with this additional constraint, so I accepted that one.
Answer 0 (score: 2)
To match with no C in between:
...A...B...
Tested in Python:
$ python
>>> import re
>>> re.match(r'.*A(?!.*C.*B).*B', 'C A x B C')
<_sre.SRE_Match object at 0x94ab7c8>
So I came up with this regex:
.*\bi\b(?!.*hate.*pie).*pie
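As a quick check, the sketch below runs this pattern against the four sentences from the question's update, using Python's `re` module. As I read it, the lookahead `(?!.*hate.*pie)` still looks all the way to the end of the string, so a later "hate ... pie" pair rejects a sentence even when the first "i ... pie" stretch is clean. The variable names are illustrative only.

```python
import re

pattern = re.compile(r".*\bi\b(?!.*hate.*pie).*pie")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

# re.match anchors at the start of the string; the leading .* in the
# pattern lets the engine try every standalone "i" as the start word.
results = [pattern.match(s) is not None for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

The first three sentences behave as requested, but the fourth does not match: every standalone "i" in it is followed somewhere by "hate" and then "pie".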
Answer 1 (score: 2)
This regex should work for you:
^(?!i.*hate.*pie)i.*pie.*donuts
Explanation
"^" + // Assert position at the beginning of a line (at beginning of the string or after a line break character)
"(?!" + // Assert that it is impossible to match the regex below starting at this position (negative lookahead)
"i" + // Match the character “i” literally
"." + // Match any single character that is not a line break character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"hate" + // Match the characters “hate” literally
"." + // Match any single character that is not a line break character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"pie" + // Match the characters “pie” literally
")" +
"i" + // Match the character “i” literally
"." + // Match any single character that is not a line break character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"pie" + // Match the characters “pie” literally
"." + // Match any single character that is not a line break character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"donuts" // Match the characters “donuts” literally
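Answer 2 (score: 2)
At every character between your start and stop words, you have to make sure that it does not begin your negative word or your stop word. Like this (I've included a little whitespace for readability):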
^i ( (?!hate|pie) . )* pie
Here is a Python program to test things.
import re

# (sentence, expected match) pairs from the question
test = [('i sure like eating pie, but i love donuts', True),
        ('i sure like eating pie, but i hate donuts', True),
        ('i sure hate eating pie, but i like donuts', False)]
# re.X makes the engine ignore the whitespace inside the pattern
rx = re.compile(r"^i ((?!hate|pie).)* pie", re.X)
for t, v in test:
    m = rx.match(t)
    print(t, "pass" if bool(m) == v else "fail")
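Since the question builds its regex dynamically and uses word boundaries, the same idea can be sketched as a small builder. `build_pattern` is a hypothetical helper, not part of any library, and the fourth test sentence comes from the question's update.

```python
import re

def build_pattern(start, excluded, stop):
    """Hypothetical helper: start word, then stop word, with no excluded word between."""
    s, x, e = (re.escape(w) for w in (start, excluded, stop))
    # Each consumed character must not begin the excluded word or the stop word,
    # so the scan ends at the first stop word and never inspects text after it.
    return re.compile(rf"^{s}\b(?:(?!\b{x}\b|\b{e}\b).)*\b{e}\b")

pattern = build_pattern("i", "hate", "pie")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

results = [pattern.match(s) is not None for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

Because the check happens per character and stops at the first "pie", a later "hate" no longer affects the result, so the fourth sentence matches as well. Further terms such as `.*donuts` could be appended to the returned pattern in the same way.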
Answer 3 (score: 0)
Using the code from the answer above:
import re

test = [('i sure like eating pie, but i love donuts', True),
        ('i sure like eating pie, but i hate donuts', True),
        ('i sure hate eating pie, but i like donuts', False)]
rx = re.compile(r"^i ((?!hate|pie).)* pie", re.X)
for t, v in test:
    m = rx.match(t)
    print(t, "pass" if bool(m) == v else "fail")
I get:
i sure like eating pie, but i love donuts pass
i sure like eating pie, but i hate donuts pass
i sure hate eating pie, but i like donuts pass