How to join search patterns in CLiPS Pattern

Time: 2016-11-09 22:12:14

Tags: python nlp information-extraction

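I'm using CLiPS pattern.search (Python 2.7) for pattern matching in text. I need to extract the phrases that correspond to 'VBN NP' and 'NP TO NP'. I can do this with two separate searches and then join the results: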

from pattern.en import parsetree
from pattern.search import search

text = "Published case-control studies have a lot of information about susceptibility to asthma."
# Parse the text into a tree of sentences, chunks and words (with relations and lemmas).
sentenceTree = parsetree(text, relations=True, lemmata=True)
matches = []
# Run each pattern separately and collect the matched strings.
for match in search("VBN NP", sentenceTree):
    matches.append(match.string)
for match in search("NP TO NP", sentenceTree):
    matches.append(match.string)
print matches
# Output: [u'Published case-control studies', u'susceptibility to asthma']
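The two-loop workaround above generalises to any number of patterns. The helper below is a minimal sketch, not part of pattern.search: `search_any` is a hypothetical name, and the search function is passed in as a parameter (meant to be `pattern.search.search`), so the helper relies only on calling it as `search(pattern, tree)` and on each match's `.string`, exactly as the code above does.

```python
def search_any(search_fn, patterns, tree):
    """Emulate 'pattern1 OR pattern2 OR ...' by running each pattern separately.

    search_fn is assumed to behave like pattern.search.search(pattern, tree);
    only the .string attribute of each returned match is used.
    """
    results = []
    for p in patterns:
        for m in search_fn(p, tree):
            # Identical phrase strings are kept only once, even if they
            # occur at different positions or satisfy several patterns.
            if m.string not in results:
                results.append(m.string)
    return results
```

With the names from the question, `search_any(search, ["VBN NP", "NP TO NP"], sentenceTree)` would be expected to return the same two phrases as the two loops. Results come back grouped by pattern, not in sentence order.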

But I'd like to join them into a single search pattern. If I try this, I get no results at all:

matches = []
for match in search("VBN NP|NP TO NP",sentenceTree):
    matches.append(match.string)
print matches
#Output: []

The official documentation doesn't give any clues. I also tried '{VBN NP} | {NP TO NP}' and '[VBN NP] | [NP TO NP]', but with no luck.

So the question is: is it possible to join search patterns in CLiPS pattern.search? If yes, how?

1 Answer:

Answer 0 (score: 0)

This pattern works for me: {VBN NP} *+ {NP TO NP}, together with match() and the group() method:

>>> from pattern.search import match
>>> from pattern.en import parsetree

>>> t = parsetree('Published case-control studies have a lot of information about susceptibility to asthma.', relations=True)

>>> m = match('{VBN NP} *+ {NP TO NP}', t)
>>> m.group(0) #matches the complete pattern 
[Word(u'Published/VBN'), Word(u'case-control/NN'), Word(u'studies/NNS'), Word(u'have/VBP'), Word(u'a/DT'), Word(u'lot/NN'), Word(u'of/IN'), Word(u'information/NN'), Word(u'about/IN'), Word(u'susceptibility/NN'), Word(u'to/TO'), Word(u'asthma/NN')]
>>> m.group(1) # matches the first group
[Word(u'Published/VBN'), Word(u'case-control/NN')]
>>> m.group(2) # matches the second group
[Word(u'susceptibility/NN'), Word(u'to/TO'), Word(u'asthma/NN')]

Finally, you can collect the results like this:

>>> matches=[]
>>> for i in range(2):
...     matches.append(m.group(i+1).string)
... 
>>> matches
[u'Published case-control', u'susceptibility to asthma']
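The braces in '{VBN NP} *+ {NP TO NP}' behave much like capturing parentheses in a regular expression, and '*+' stands for one or more words of anything in between. As a rough analogy only (Python's standard `re` module run over a space-joined string of POS tags, not how pattern.search works internally), the sketch below reproduces the group(0)/group(1)/group(2) behaviour. Single noun tags such as NN or NNS stand in for NP chunks, and `TAGGED`, `PATTERN` and `match_groups` are hypothetical names. It also makes a caveat visible: the combined pattern is a sequence, not an either/or, so a sentence containing only one of the two phrases (or both in the other order) does not match at all.

```python
import re

# Example sentence with the POS tags shown in the group(0) output above.
TAGGED = [
    ("Published", "VBN"), ("case-control", "NN"), ("studies", "NNS"),
    ("have", "VBP"), ("a", "DT"), ("lot", "NN"), ("of", "IN"),
    ("information", "NN"), ("about", "IN"), ("susceptibility", "NN"),
    ("to", "TO"), ("asthma", "NN"),
]

# Regex analogy for '{VBN NP} *+ {NP TO NP}':
# group 1 ~ {VBN NP}, '(?: \S+)+?' ~ '*+' (one or more tags), group 2 ~ {NP TO NP}.
PATTERN = re.compile(r"(VBN NN\S*)(?: \S+)+? (NN\S* TO NN\S*)")


def match_groups(tagged):
    """Return the words covered by groups 0, 1 and 2, or None if there is no match."""
    tags = " ".join(tag for _, tag in tagged)
    m = PATTERN.search(tags)
    if m is None:
        return None

    def words(i):
        # Map character offsets in the tag string back to token indices
        # by counting the spaces that precede each offset.
        start = tags.count(" ", 0, m.start(i))
        stop = tags.count(" ", 0, m.end(i)) + 1
        return " ".join(word for word, _ in tagged[start:stop])

    return [words(i) for i in range(3)]


print(match_groups(TAGGED)[1:])  # ['Published case-control', 'susceptibility to asthma']
# Without the leading VBN phrase the whole sequence fails to match.
print(match_groups(TAGGED[3:]))  # None
```

When either phrase may appear on its own, running the patterns separately, as in the question, is the more robust choice.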