When I run the Stanford parser from NLTK, I get the following result.
(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
But I need it in this form:
S -> VP
VP -> VB NP ADVP
VB -> get
PRP -> me
RB -> now
How can I get this result using a recursive function? Is there a built-in function for it?
Answer 0 (score: 5)
First, to navigate the tree, see How to iterate through all nodes of a tree? and How to navigate a nltk.tree.Tree?:
>>> from nltk.tree import Tree
>>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
>>> ptree = Tree.fromstring(bracket_parse)
>>> ptree
Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
>>> for subtree in ptree.subtrees():
...     print(subtree)
...
(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
(VP (VB get) (NP (PRP me)) (ADVP (RB now)))
(VB get)
(NP (PRP me))
(PRP me)
(ADVP (RB now))
(RB now)
What you are looking for is Tree.productions(), see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341:
>>> ptree.productions()
[S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']
Note that Tree.productions() returns Production objects, not strings; see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236.
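Because each rule is a Production object, you can also inspect its parts instead of relying on its string form. The sketch below assumes NLTK 3 and uses the Production methods lhs(), rhs() and is_lexical(); the helper name format_production is only illustrative and is not part of NLTK. It renders the rules without quotes around the words, which is closer to the format asked for in the question.

```python
from nltk.tree import Tree

ptree = Tree.fromstring("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")


def format_production(prod):
    """Render a Production as 'LHS -> RHS' with no quotes around terminals.

    Hypothetical helper for illustration, not an NLTK function.
    """
    # rhs() is a tuple of Nonterminal objects and/or plain strings (the words).
    rhs = " ".join(str(symbol) for symbol in prod.rhs())
    return "{} -> {}".format(prod.lhs(), rhs)


# Every rule in the tree, quote-free.
plain_rules = [format_production(p) for p in ptree.productions()]

# Only the rules that rewrite a tag to a word (e.g. VB -> get).
lexical_rules = [format_production(p) for p in ptree.productions()
                 if p.is_lexical()]

for rule in plain_rules:
    print(rule)
```

Filtering with is_lexical() (or is_nonlexical()) is one way to separate the tag-to-word rules from the purely structural ones if you only need one kind.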
If you want the grammar rules as strings, you can do:
>>> for rule in ptree.productions():
...     print(rule)
...
S -> VP
VP -> VB NP ADVP
VB -> 'get'
NP -> PRP
PRP -> 'me'
ADVP -> RB
RB -> 'now'
or
>>> rules = [str(p) for p in ptree.productions()]
>>> rules
['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
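If you specifically want a recursive function, as the question asks, something along these lines should work. This is a minimal sketch assuming NLTK 3, where Tree.label() gives the node label and leaves are plain strings. The function name collect_rules is made up for this example. It visits the tree top-down, so the rules come out in the same order as Tree.productions(), but as plain strings without quotes.

```python
from nltk.tree import Tree


def collect_rules(tree, rules=None):
    """Recursively collect 'LHS -> RHS' strings from an nltk Tree.

    Illustrative helper, not part of NLTK; assumes leaves are plain strings.
    """
    if rules is None:
        rules = []
    # The right-hand side is the label of each child subtree,
    # or the word itself when the child is a leaf.
    rhs = [child.label() if isinstance(child, Tree) else str(child)
           for child in tree]
    rules.append("{} -> {}".format(tree.label(), " ".join(rhs)))
    # Recurse into the subtrees only; leaves produce no rule of their own.
    for child in tree:
        if isinstance(child, Tree):
            collect_rules(child, rules)
    return rules


ptree = Tree.fromstring("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")
rules = collect_rules(ptree)
for rule in rules:
    print(rule)
```

In practice the built-in Tree.productions() is usually the simpler choice; a hand-written traversal like this is mainly useful when you need custom formatting or want to skip certain nodes.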