Python regex: match only as long as a certain character does not appear

Posted: 2012-11-06 01:56:47

Tags: python regex negative-lookbehind

I'm having some trouble with yet another regular expression. For this one, my code should look for the pattern:

re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")

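However, it matches too aggressively. It correctly picks up a sentence containing the word "killed", but the pattern keeps collecting text past the end of that sentence, much further down in the text. In particular, it matches: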

killed in an apparent u.s. drone attack on a car in yemen on sunday, tribal sources and local officials said.the men's car was driving through the south-eastern province of maareb, a mostly desert region where militants have taken refuge after being driven from southern strongholds.yemen, where al qaeda militants exploited a security vacuum during last year's uprising that ousted president ali abdullah saleh, has seen an in10

This is not the behavior I want. If the pattern cannot be found within a single sentence, I want it to fail.
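A minimal sketch of the over-matching, assuming a short made-up two-sentence string in place of the article text above. Because \D* matches any non-digit, including ".", nothing stops the match at the end of the first sentence; it simply runs on to the first digit.

```python
import re

# The pattern from the question.
pattern = re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")

# Hypothetical sample text (not the question's article): the only number
# appears in the sentence AFTER the one containing "killed".
text = "five people were killed in an attack on sunday. officials said 10 suspects fled."

m = pattern.search(text)
# \D* also consumes the ".", so the match crosses the sentence boundary
# and only stops once \d+ can match "10".
print(m.group(0))
print(m.group(1))
```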

In pseudocode, the solution I'm trying to implement is:

find instance of 'kill'
if what follows contains a period (\.) before a digit, do not match.
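The pseudocode can also be written directly in Python without putting everything into one regex. This is a sketch only; kill_count is a hypothetical helper name, and, like the pseudocode, it looks at the first occurrence of "kill" only.

```python
import re

def kill_count(text):
    """Hypothetical helper following the pseudocode above.

    Finds the first 'kill' and returns the first run of digits after it,
    unless a period appears between 'kill' and those digits.
    """
    start = text.find("kill")
    if start == -1:
        return None
    tail = text[start:]
    digit = re.search(r"\d+", tail)
    if digit is None:
        return None
    # Everything between 'kill' and the first digit.
    between = tail[:digit.start()]
    if "." in between:
        return None  # a period comes first: the number is in another sentence
    return digit.group(0)

print(kill_count("gunmen killed 12 people on sunday."))
print(kill_count("gunmen killed people on sunday. 10 fled."))
```

Note that this treats every period as a sentence boundary, so an abbreviation such as "u.s." in the quoted text above would also block the match.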

My failed implementation looks like this:

re.compile(r"kill(?:ed|ing|s)\D*(?!:\..*?)(\d+).*?(?:men|women|children|people)?")

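I tried a lookbehind, but it requires me to specify a fixed width. With the pattern above, I'm trying to match any ending of "kill", followed by any non-digits that are not a period, and then allow anything else to follow before the digits I'm after.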
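The width restriction comes from Python's re module, which only accepts fixed-width lookbehind. A short sketch of the compile-time error, using illustrative patterns that are not from the question (the third-party regex module does support variable-width lookbehind, but that is outside the setup here):

```python
import re

error_raised = False
try:
    # Variable-width lookbehind: "no period anywhere before these digits".
    re.compile(r"(?<!\..*)\d+")
except re.error as exc:
    error_raised = True
    print("re.error:", exc)

# A fixed-width lookbehind (exactly one character) is accepted: it only
# rejects digits immediately preceded by a period.
fixed = re.compile(r"(?<!\.)\d+")
print(fixed.search("killed 12 people").group(0))
```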

Unfortunately, this code behaves exactly the same in my tests. Any help would be greatly appreciated.
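A quick check, on a made-up two-sentence string, that the attempt really is equivalent to the original pattern here. The inserted (?!:\..*?) is a negative lookahead (not a lookbehind) for a literal ":" followed by ".", and it is evaluated only at the position right before the digits, after \D* has already consumed any period.

```python
import re

original = re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")
attempt = re.compile(r"kill(?:ed|ing|s)\D*(?!:\..*?)(\d+).*?(?:men|women|children|people)?")

# Hypothetical sample: the number sits in the sentence after "killed".
text = "five people were killed in an attack on sunday. officials said 10 suspects fled."

# At the lookahead's position the next character must be a digit for \d+
# to match, so it can never be ":" and the lookahead always succeeds.
# Even without the stray ":", a lookahead at this spot would only inspect
# text AFTER the period has already been consumed by \D*.
print(original.search(text).group(0) == attempt.search(text).group(0))
```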

1 answer:

Answer 0 (score: 3)

A small modification:

r"kill(?:ed|ing|s)[^\d.]*(\d+)[^.]*?(?:men|women|children|people)?"

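Basically, I exclude the full stop (.) from the character classes, so it cannot be matched anywhere between "kill" and the number, or between the number and men/women/etc.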
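A small sketch for checking the answer's pattern on two made-up strings (not the question's data): one where the number is in the next sentence, and one where it is in the same sentence.

```python
import re

# The answer's pattern: [^\d.]* and [^.]*? both exclude ".", so a match
# cannot run past a period.
pattern = re.compile(r"kill(?:ed|ing|s)[^\d.]*(\d+)[^.]*?(?:men|women|children|people)?")

cross_sentence = "five people were killed in an attack on sunday. officials said 10 suspects fled."
same_sentence = "gunmen killed 12 people on sunday."

print(pattern.search(cross_sentence))  # no digit before the first "."
m = pattern.search(same_sentence)
print(m.group(0), m.group(1))
```

Because the trailing word group is optional and the gap before it is lazy, group(0) here stops right after the number ("killed 12"); the count itself is in group(1).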