How to extract verb phrases from a Stanford NLTK parse tree

Asked: 2019-03-28 18:16:32

Tags: python-3.x stanford-nlp

I am using the Stanford parser to generate parse trees for sentences, like this one:

(ROOT
  (SBARQ
    (WHADVP (WRB how))
    (SQ
      (VBP do)
      (NP (FW i))
      (VP
        (VB add)
        (NP (DT a) (NN link))
        (PP (TO to) (NP (PRP$ my) (NNS links)))))))

I want to extract the final verb phrase, so in this case "add a link to my links".

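I started out assuming there was no existing method for finding all the verb phrases, so I tried using pyparsing to turn the tree into a nested list. I ended up with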

[['ROOT', ['SBARQ', ['WHADVP', ['WRB', 'how']], ['SQ', ['VBP', 'do'], ['NP', ['FW', 'i']], ['VP', ['VB', 'add'], ['NP', ['DT', 'a'], ['NN', 'link']], ['PP', ['TO', 'to'], ['NP', ['PRP$', 'my'], ['NNS', 'links']]]]]]]]

from which I could try to extract

['VP', ['VB', 'add'], ['NP', ['DT', 'a'], ['NN', 'link']], ['PP', ['TO', 'to'], ['NP', ['PRP$', 'my'], ['NNS', 'links']]]]

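However, I am still not sure how to search through the nested list to find the VP. I would like to know whether the Stanford parser code has a method that returns phrases of a given type.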
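For reference, searching the nested list could be done with a short recursive walk. The following is only a sketch: `find_phrases` and `leaves` are hypothetical helper names, not part of pyparsing or NLTK, and it assumes every node has the shape `[label, child, ...]` as in the output above. "Final" is taken here to mean the last VP met in a depth-first, left-to-right walk.

```python
def find_phrases(node, label):
    """Return every sub-list of `node` whose first element equals `label`."""
    found = []
    if isinstance(node, list):
        if node and node[0] == label:
            found.append(node)
        # Keep descending so that nested phrases of the same type are found too.
        for child in node:
            found.extend(find_phrases(child, label))
    return found


def leaves(node):
    """Collect the words (terminal strings) under a [label, child, ...] node."""
    words = []
    for child in node[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# The nested list produced by pyparsing for the example sentence.
nested = [['ROOT', ['SBARQ', ['WHADVP', ['WRB', 'how']], ['SQ', ['VBP', 'do'], ['NP', ['FW', 'i']], ['VP', ['VB', 'add'], ['NP', ['DT', 'a'], ['NN', 'link']], ['PP', ['TO', 'to'], ['NP', ['PRP$', 'my'], ['NNS', 'links']]]]]]]]

verb_phrases = find_phrases(nested, 'VP')
last_vp = verb_phrases[-1] if verb_phrases else None
if last_vp is not None:
    print(' '.join(leaves(last_vp)))
```

Passing a different label, such as 'NP', returns the noun phrases in the same way.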

I am getting the parse trees like this:

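import os
from nltk.parse import stanford

# Point NLTK at the local Stanford parser jars and models.
os.environ['STANFORD_PARSER'] = '/home/rwitn/stanford-parser'
os.environ['STANFORD_MODELS'] = '/home/rwitn/stanford-parser'

parser = stanford.StanfordParser(model_path="/home/rwitn/stanford-parser/englishPCFG.ser.gz")
# firstQuestionsDF is a pandas DataFrame defined elsewhere.
sentences = parser.raw_parse_sents(firstQuestionsDF['chat_firstQuestion'])
#print(sentences)

# Store each parse tree as a string in the DataFrame.
counter = 0
for line in sentences:
    for sentence in line:
        # A single .loc[row, column] call avoids chained assignment,
        # which may write to a temporary copy instead of the DataFrame.
        firstQuestionsDF.loc[counter, 'parseTree'] = str(sentence)
        counter += 1
        #sentence.draw()  # GUI
        print(sentence)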

Then I get the nested list like this:

from pyparsing import nestedExpr

text = firstQuestionsDF.loc[2, 'parseTree']
nestedExample = nestedExpr(opener='(', closer=')').parseString(text)
print(nestedExample.asList())

What do you think is the best way to get the final verb phrase from this sentence, or even just a list of all the verb phrases?
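One possibility, offered as a sketch rather than a confirmed answer, is to skip the nested list and filter NLTK's own tree structure. As far as I know, `raw_parse_sents` yields `nltk.tree.Tree` objects, and `Tree.subtrees()` accepts a `filter` function. The example below rebuilds the tree from its bracketed string with `Tree.fromstring` so that it runs without the Stanford parser installed. With a live parser, the same `subtrees` call should work directly on each `sentence` before it is converted with `str()`.

```python
from nltk.tree import Tree

# Bracketed parse of the example sentence, as printed by the parser.
parse_str = """(ROOT
  (SBARQ
    (WHADVP (WRB how))
    (SQ
      (VBP do)
      (NP (FW i))
      (VP
        (VB add)
        (NP (DT a) (NN link))
        (PP (TO to) (NP (PRP$ my) (NNS links)))))))"""

tree = Tree.fromstring(parse_str)

# subtrees() visits every node of the tree; the filter keeps only VP nodes.
vp_subtrees = list(tree.subtrees(filter=lambda t: t.label() == 'VP'))

# leaves() returns the words under a node, in sentence order.
vp_texts = [' '.join(vp.leaves()) for vp in vp_subtrees]

print(vp_texts)
if vp_texts:
    # Take the last VP found as the "final" verb phrase.
    print(vp_texts[-1])
```

When a sentence contains nested verb phrases, `vp_texts` holds all of them, so whether the last entry is the intended "final" phrase depends on how that is defined for the data.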

0 answers:

No answers