Question

I'm having some trouble with another regular expression. For this one, my code is supposed to look for the pattern:

re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")

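However, it matches too aggressively. It happens to match a sentence containing the word "killed", but the pattern keeps consuming text until it reaches a digit much further down. In particular, it matches: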

killed in an apparent u.s. drone attack on a car in yemen on sunday, tribal sources and local officials said.the men's car was driving through the south-eastern province of maareb, a mostly desert region where militants have taken refuge after being driven from southern strongholds.yemen, where al qaeda militants exploited a security vacuum during last year's uprising that ousted president ali abdullah saleh, has seen an in10

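This is not the behavior I'm after. I want the pattern to fail if it cannot find a match within a single sentence.

The solution I tried to implement, in pseudocode, is: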

find instance of 'kill'
if what follows contains a period (\.) before a digit, do not match.

My failed implementation looks like this:

re.compile(r"kill(?:ed|ing|s)\D*(?!:\..*?)(\d+).*?(?:men|women|children|people)?")

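I tried a lookbehind, but I have to specify a width. With the pattern above I'm trying to match any ending of 'kill', followed by any non-digits except a period, while still allowing anything else to appear before the digits I'm after.

Sadly, this code behaves exactly the same in my tests. Any help would be greatly appreciated.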

Answer 1

A small modification:

r"kill(?:ed|ing|s)[^\d.]*(\d+)[^.]*?(?:men|women|children|people)?"

Basically, I prevent a full stop (.) from matching anywhere between kill and the men/women/etc. that follows.
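To see the difference, here is a minimal sketch that runs the question's pattern and the modified one side by side. The helper `first_count` and the three sample sentences are made up for illustration and are not from the original post.

```python
import re

# Pattern from the question: \D* and .*? are both allowed to cross periods.
ORIGINAL = re.compile(r"kill(?:ed|ing|s)\D*(\d+).*?(?:men|women|children|people)?")
# Modified pattern: neither gap may contain a period.
FIXED = re.compile(r"kill(?:ed|ing|s)[^\d.]*(\d+)[^.]*?(?:men|women|children|people)?")


def first_count(pattern, text):
    """Return the first captured number as a string, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


# Hypothetical sample inputs, invented for this sketch.
same_sentence = "the strike killed at least 10 people on sunday."
next_sentence = "a man was killed in a drone attack on sunday. officials counted 10 cars nearby."
abbreviation = "killed in an apparent u.s. drone attack that left 10 dead"

print(first_count(ORIGINAL, next_sentence))  # 10 (runs into the next sentence)
print(first_count(FIXED, next_sentence))     # None (the period blocks the match)
print(first_count(FIXED, same_sentence))     # 10
print(first_count(FIXED, abbreviation))      # None (the periods in "u.s." also block it)
```

One caveat: the pattern treats every period as a sentence boundary, so abbreviations such as "u.s." between the verb and the number also prevent a match. If that matters for your data, you would probably need to split the text into sentences first and apply the pattern to each one.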
