Identifying patterns with Stanford Tregex in Python

Time: 2018-06-19 18:28:43

Tags: nltk stanford-nlp

I want to identify a specific pattern when parsing text with Stanford NLP's StanfordParser and Tregex. With nltk, nltk.RegexpParser produces the desired output, as shown below:

nltk code:

import nltk
from nltk import word_tokenize, pos_tag

# Adjacent string literals are joined, so the sentence can span two lines
text = ("New developments in the science of motion picture photography are not "
        "abundant at this advanced stage of the game")
# CP chunk: two consecutive tokens, each tagged exactly NN or JJ
cp_pattern = r"""CP: {<NN|JJ><NN|JJ>}"""
parser = nltk.RegexpParser(cp_pattern)
tree = parser.parse(pos_tag(word_tokenize(text)))
for subtree in tree.subtrees():
    if subtree.label() == 'CP':
        print(str(subtree))

And the output:

(CP motion/NN picture/NN)
(CP advanced/JJ stage/NN)

The following code uses StanfordParser to tag and parse the same text:

from nltk.parse.stanford import StanfordParser

parse_jar = 'path to stanford parser jar'        # placeholder path
parse_model = 'path to stanford parser model'    # placeholder path
parser_st = StanfordParser(path_to_jar=parse_jar, path_to_models_jar=parse_model)
# raw_parse takes the untokenized sentence and returns an iterator of parse trees
parse_tree = list(parser_st.raw_parse(text))
print(parse_tree)

Printing parse_tree gives the following output:

[Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NP', [Tree('NNP', ['New']), Tree('NNS', ['developments'])]), Tree('PP', [Tree('IN', ['in']), Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NN', ['science'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('NN', ['motion']), Tree('NN', ['picture']), Tree('NN', ['photography'])])])])])]), Tree('VP', [Tree('VBP', ['are']), Tree('RB', ['not']), Tree('ADJP', [Tree('JJ', ['abundant']), Tree('PP', [Tree('IN', ['at']), Tree('NP', [Tree('NP', [Tree('DT', ['this']), Tree('VBN', ['advanced']), Tree('NN', ['stage'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('DT', ['the']), Tree('NN', ['game'])])])])])])])])])]
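A minimal sketch of one possible workaround, assuming the goal is simply to reuse cp_pattern on the Stanford output: Tree.pos() flattens a parse tree into (word, tag) pairs, the same shape that pos_tag returns, so nltk.RegexpParser can chunk it directly. So that it runs without the Stanford jar, the sketch rebuilds the tree printed above from bracketed notation with Tree.fromstring; with the real parser you would pass parse_tree[0] instead. The names stanford_tree, chunker and cp_chunks are illustrative only.

```python
import nltk
from nltk import Tree

# Bracketed form of the StanfordParser tree printed above (rebuilt by hand so
# the example runs without Java; with the real parser use parse_tree[0]).
stanford_tree = Tree.fromstring("""
(ROOT
  (S
    (NP
      (NP (NNP New) (NNS developments))
      (PP (IN in)
        (NP
          (NP (DT the) (NN science))
          (PP (IN of) (NP (NN motion) (NN picture) (NN photography))))))
    (VP (VBP are) (RB not)
      (ADJP (JJ abundant)
        (PP (IN at)
          (NP
            (NP (DT this) (VBN advanced) (NN stage))
            (PP (IN of) (NP (DT the) (NN game)))))))))
""")

cp_pattern = r"""CP: {<NN|JJ><NN|JJ>}"""
chunker = nltk.RegexpParser(cp_pattern)

# Tree.pos() returns [(word, tag), ...], the same input format as
# pos_tag(word_tokenize(text)) in the nltk-only version.
chunked = chunker.parse(stanford_tree.pos())

# Collect the words of every CP chunk as a single string
cp_chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in chunked.subtrees()
    if subtree.label() == "CP"
]
print(cp_chunks)
```

With these tags only "motion picture" is chunked: the Stanford parser tags "advanced" as VBN rather than JJ, so "advanced stage" no longer matches the pattern.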

Now, how can I define the pattern I want (as in cp_pattern) for the StanfordParser output and identify it the same way I did with nltk?
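Tregex matches patterns over the tree structure rather than over a flat tag sequence. The sketch below is a rough pure-nltk approximation of that idea, not Tregex itself: it reports every pair of adjacent sister pre-terminals (same parent) that are both tagged NN or JJ. The helper names is_target_preterminal and adjacent_sister_pairs are hypothetical, and the input is just the subject NP of the tree above, rebuilt with Tree.fromstring.

```python
from nltk import Tree

# Exact tag labels taken from cp_pattern; NNS/NNP are deliberately not matched.
TARGET_TAGS = {"NN", "JJ"}


def is_target_preterminal(node):
    """True for a pre-terminal such as (NN motion) whose tag is in TARGET_TAGS."""
    return (
        isinstance(node, Tree)
        and node.label() in TARGET_TAGS
        and len(node) == 1
        and isinstance(node[0], str)
    )


def adjacent_sister_pairs(tree):
    """Yield (word, word) for adjacent sister pre-terminals tagged NN or JJ."""
    for subtree in tree.subtrees():
        children = list(subtree)
        # Compare each child with its immediate right sister
        for left, right in zip(children, children[1:]):
            if is_target_preterminal(left) and is_target_preterminal(right):
                yield (left[0], right[0])


# Subject NP of the StanfordParser tree shown above
subject_np = Tree.fromstring(
    "(NP (NP (NNP New) (NNS developments))"
    " (PP (IN in) (NP (NP (DT the) (NN science))"
    " (PP (IN of) (NP (NN motion) (NN picture) (NN photography))))))"
)

print(list(adjacent_sister_pairs(subject_np)))
```

Because it walks the tree, this only pairs words inside the same constituent and, unlike RegexpParser, it also reports overlapping pairs such as ("picture", "photography").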

0 answers
