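This may be a silly question, but how do I iterate through a parse tree produced by an NLP parser (such as Stanford NLP)? The output is all nested brackets, which is neither an array nor a dictionary nor any other collection type I have used.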
(ROOT\n (S\n (PP (IN As)\n (NP (DT an) (NN accountant)))\n (NP (PRP I))\n (VP (VBP want)\n (S\n (VP (TO to)\n (VP (VB make)\n (NP (DT a) (NN payment))))))))
Answer 0 (score: 4)
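This particular output format of the Stanford parser is called a "bracketed parse (tree)". It should be read as a graph rooted at ROOT (in this case you can treat it as a directed acyclic graph (DAG), since it is unidirectional and non-cyclic). Each opening parenthesis starts a node, the first token after it is that node's label (for example S, NP, VP), and the words (As, an, accountant) are the leaves.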
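To make that reading concrete, here is a minimal sketch that turns a bracketed parse into nested `(label, children)` tuples using only the standard library. The helper names `parse_bracketed` and `leaves` are made up for this illustration and are not part of any parser's API; the sketch also assumes well-formed input with no parentheses inside words.

```python
import re

def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    # Tokens are "(", ")", or any run of characters that is neither
    # whitespace nor a parenthesis (labels and words).
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]          # first token after "(" is the label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                     # consume the closing ")"
        return (label, children)

    return read_node()

def leaves(node):
    """Collect the words (leaves) from left to right."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    return [word for child in children for word in leaves(child)]

output = ('(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) '
          '(VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))')
tree = parse_bracketed(output)
words = leaves(tree)
print(tree[0])   # label of the top node
print(words)     # the sentence, word by word
```

Once the string is in this nested form, iterating over it is ordinary recursion over tuples and lists. In practice an existing library is usually the better choice.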
There are libraries that can read bracketed parses, for example NLTK's nltk.tree.Tree (http://www.nltk.org/howto/tree.html):
>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
                        ROOT
                          |
                          S
    ______________________|________
    |               |             VP
    |               |     ________|____
    |               |     |           S
    |               |     |           |
    |               |     |           VP
    |               |     |   ________|___
    PP              |     |   |          VP
 ___|___            |     |   |  ________|___
 |     NP           NP    |   |  |          NP
 |  ___|______      |     |   |  |       ___|_____
 IN DT       NN    PRP   VBP  TO VB      DT      NN
 |  |        |      |     |   |  |       |       |
 As an   accountant I   want to make     a    payment
>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
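Since the question is about iterating, here is a short sketch of walking the same tree with NLTK. It assumes NLTK 3.x, where a Tree behaves like a list of its children and leaves are plain strings; the `walk` generator is a helper written for this example, while `label()`, `subtrees()`, `leaves()` and `pos()` are Tree methods.

```python
from nltk.tree import Tree

output = ('(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) '
          '(VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))')
parsetree = Tree.fromstring(output)

def walk(node, depth=0):
    """Yield (depth, label_or_word) pairs in depth-first, left-to-right order."""
    if isinstance(node, Tree):
        yield depth, node.label()
        for child in node:            # iterating a Tree yields its children
            yield from walk(child, depth + 1)
    else:
        yield depth, node             # a leaf is a plain string

visited = list(walk(parsetree))
for depth, item in visited:
    print("  " * depth + item)

# (word, tag) pairs taken from the pre-terminal nodes
pos_pairs = parsetree.pos()

# All NP constituents, each joined back into a phrase
noun_phrases = [" ".join(st.leaves())
                for st in parsetree.subtrees(lambda t: t.label() == "NP")]
print(noun_phrases)
```

The recursive version is useful when you need the depth or the parent context; `subtrees()` with a filter is usually enough when you only want certain kinds of constituents.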
Answer 1 (score: 3)
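Note that if you are interested in specific nodes of the tree (identified by regex-like rules), you can use this very simple class, TregexPattern, to extract all such nodes with a regex-like matcher: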
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/tregex/TregexPattern.html
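TregexPattern is a Java class. For readers working in Python, a rough analogue of the simplest case (selecting nodes whose label matches a regular expression) can be sketched with NLTK. This is not Tregex and supports none of its relational operators; `find_nodes` is a hypothetical helper written for this example.

```python
import re
from nltk.tree import Tree

output = ('(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) '
          '(VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))')
parsetree = Tree.fromstring(output)

def find_nodes(tree, label_pattern):
    """Return every subtree whose label fully matches the given regex."""
    regex = re.compile(label_pattern)
    return [st for st in tree.subtrees() if regex.fullmatch(st.label())]

# Every verb tag (VB, VBP, VBD, ...) together with the word it covers
verbs = [(st.label(), st.leaves()) for st in find_nodes(parsetree, r"VB.*")]
print(verbs)
```

For patterns that relate nodes to each other (for example "an NP dominated by a VP"), Tregex itself remains the tool the answer refers to.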