Python regex findall not working as expected

Asked: 2018-07-05 10:13:23

Tags: python regex findall

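I am trying to use a regular expression to capture the output of the Stanford CoreNLP dependency parser. I want to capture the dependency parse, which spans several lines (everything between "dependencies):" and "Sentence"). Sample data: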

Dependency Parse (enhanced plus plus dependencies):
root(ROOT-0, imply-5)
dobj(imply-5, what-1)
aux(imply-5, does-2)
det(man-4, the-3)
nsubj(imply-5, man-4)
advmod(mentions-8, when-6)
nsubj(mentions-8, he-7)
advcl(imply-5, mentions-8)
det(papers-10, the-9)
dobj(mentions-8, papers-10)
nsubj(written-13, he-11)
aux(written-13, has-12)
acl:relcl(papers-10, written-13)

Sentence #1 (10 tokens):

The code I am using is:

import re

regex = re.compile(r'dependencies\):(.*)Sentence', re.DOTALL)
found = regex.findall(text)

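When I run it, the result contains the entire text document rather than only the capture group. When I try the same pattern on Regexr, it works fine.

Many thanks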

1 Answer:

Answer 0 (score: 0)

Use a lookbehind and a lookahead with a non-greedy match: re.findall(r"(?<=dependencies\):).*?(?=Sentence)", s, flags=re.DOTALL)
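One plausible reason the original pattern matched too much, assuming the real file contains more than one sentence, is that ".*" is greedy: with re.DOTALL it runs on to the last "Sentence" in the text instead of stopping at the next one. The sketch below uses a made-up two-sentence sample (not the asker's data) to compare the greedy and non-greedy forms.

```python
import re

# Hypothetical two-sentence sample, shaped like the CoreNLP output in the question.
text = (
    "Dependency Parse (enhanced plus plus dependencies):\n"
    "root(ROOT-0, imply-5)\n"
    "\n"
    "Sentence #2 (3 tokens):\n"
    "Dependency Parse (enhanced plus plus dependencies):\n"
    "root(ROOT-0, runs-2)\n"
    "\n"
    "Sentence #3 (4 tokens):\n"
)

# Greedy: .* runs to the LAST "Sentence", so both blocks land in a single match.
greedy = re.findall(r"dependencies\):(.*)Sentence", text, flags=re.DOTALL)

# Non-greedy: .*? stops at the NEXT "Sentence", giving one match per block.
lazy = re.findall(r"dependencies\):(.*?)Sentence", text, flags=re.DOTALL)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

With only one "Sentence" marker in the input, the two patterns return the same result, which may be why a short test string on Regexr looked correct.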

Demo:

import re

s = """ Dependency Parse (enhanced plus plus dependencies):
root(ROOT-0, imply-5)
dobj(imply-5, what-1)
aux(imply-5, does-2)
det(man-4, the-3)
nsubj(imply-5, man-4)
advmod(mentions-8, when-6)
nsubj(mentions-8, he-7)
advcl(imply-5, mentions-8)
det(papers-10, the-9)
dobj(mentions-8, papers-10)
nsubj(written-13, he-11)
aux(written-13, has-12)
acl:relcl(papers-10, written-13)

Sentence #1 (10 tokens):"""

m = re.findall(r"(?<=dependencies\):).*?(?=Sentence)", s, flags=re.DOTALL)
print(m)

Output:

['\nroot(ROOT-0, imply-5)\ndobj(imply-5, what-1)\naux(imply-5, does-2)\ndet(man-4, the-3)\nnsubj(imply-5, man-4)\nadvmod(mentions-8, when-6)\nnsubj(mentions-8, he-7)\nadvcl(imply-5, mentions-8)\ndet(papers-10, the-9)\ndobj(mentions-8, papers-10)\nnsubj(written-13, he-11)\naux(written-13, has-12)\nacl:relcl(papers-10, written-13)\n\n']
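The captured string keeps the newlines around the block. If a list of individual dependency lines is more convenient, a minimal sketch (using a shortened copy of the sample, and the names "blocks" and "deps" chosen here for illustration) is to strip each match and split it on line breaks:

```python
import re

# Shortened sample in the same shape as the demo string above.
s = """Dependency Parse (enhanced plus plus dependencies):
root(ROOT-0, imply-5)
dobj(imply-5, what-1)
acl:relcl(papers-10, written-13)

Sentence #1 (10 tokens):"""

blocks = re.findall(r"(?<=dependencies\):).*?(?=Sentence)", s, flags=re.DOTALL)

# Drop the surrounding newlines, then split each block into one entry per line.
deps = [block.strip().splitlines() for block in blocks]
print(deps)
# [['root(ROOT-0, imply-5)', 'dobj(imply-5, what-1)', 'acl:relcl(papers-10, written-13)']]
```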