Extracting grammar rules from a parse result

Date: 2015-10-15 05:59:18

Tags: python recursion nltk stanford-nlp

When I run the Stanford parser from NLTK, I get the following result.

(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))

But I need it in this form:

S -> VP
VP -> VB NP ADVP
VB -> get
PRP -> me
RB -> now

How can I get this result with a recursive function? Is there a built-in function for it?

1 answer:

Answer 0 (score: 5)

First, to navigate the tree, see "How to iterate through all nodes of a tree?" and "How to navigate a nltk.tree.Tree?"

>>> from nltk.tree import Tree
>>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
>>> ptree = Tree.fromstring(bracket_parse)
>>> ptree
Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
>>> for subtree in ptree.subtrees():
...     print(subtree)
... 
(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
(VP (VB get) (NP (PRP me)) (ADVP (RB now)))
(VB get)
(NP (PRP me))
(PRP me)
(ADVP (RB now))
(RB now)
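
Since the question asks specifically about a recursive function, here is a minimal sketch of what such a traversal could look like. It deliberately avoids NLTK and works on nested (label, children) tuples; the helper names parse_brackets and collect_rules are made up for this illustration and are not part of any library.

```python
import re


def parse_brackets(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Hypothetical helper for illustration only; assumes well-formed input.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # nested subtree
            else:
                children.append(tokens[pos])    # leaf word
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return parse_node()


def collect_rules(node, rules=None):
    """Recursively collect 'LHS -> RHS' strings in pre-order."""
    if rules is None:
        rules = []
    label, children = node
    # The right-hand side is the child labels, or the word itself for a leaf.
    rhs = [child if isinstance(child, str) else child[0] for child in children]
    rules.append("{} -> {}".format(label, " ".join(rhs)))
    for child in children:
        if not isinstance(child, str):
            collect_rules(child, rules)         # recurse into subtrees only
    return rules


tree = parse_brackets("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")
rules = collect_rules(tree)
for rule in rules:
    print(rule)
```

Each call emits one rule for the current node and then recurses into its non-leaf children, so the rules come out in the same top-down, left-to-right order as the subtrees listed above.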

What you are looking for is Tree.productions(): https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341

>>> ptree.productions()
[S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']

Note that Tree.productions() returns Production objects; see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236

If you want the grammar rules as strings, you can do this:

>>> for rule in ptree.productions():
...     print(rule)
... 
S -> VP
VP -> VB NP ADVP
VB -> 'get'
NP -> PRP
PRP -> 'me'
ADVP -> RB
RB -> 'now'

>>> rules = [str(p) for p in ptree.productions()]
>>> rules
['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
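
The strings above quote the terminals ('get'), whereas the form requested in the question does not. One way to get the unquoted form is to build each string from the Production's lhs() and rhs() yourself. This is a sketch, assuming NLTK 3 on Python 3; plain_rule is a made-up helper name, not an NLTK function.

```python
from nltk.tree import Tree

ptree = Tree.fromstring("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")


def plain_rule(production):
    """Format a Production as 'LHS -> RHS' without quoting terminals.

    Hypothetical helper for illustration only.
    """
    # rhs() holds Nonterminal objects and plain strings (the words);
    # str() on either gives the bare symbol or word.
    rhs = " ".join(str(symbol) for symbol in production.rhs())
    return "{} -> {}".format(production.lhs(), rhs)


plain_rules = [plain_rule(p) for p in ptree.productions()]
for rule in plain_rules:
    print(rule)
```

The full list still contains NP -> PRP and ADVP -> RB, which the list in the question leaves out.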