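I am using the Tree package from nltk with Python 2.7, and I want to extract every rule from a tree together with its grandparent node. I have the following tree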
t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])
and its productions
t.productions()
[S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased', NP -> D N, D -> 'the', N -> 'cat']
for the tree:
              S
      ________|_____
     |              VP
     |         _____|___
     NP       |         NP
  ___|___     |      ___|___
 D       N    V     D       N
 |       |    |     |       |
the     dog chased the     cat
What I want is something of the form:
[S -> NP VP, S ^ NP -> D N, NP ^ D -> 'the', NP ^ N -> 'dog'.......]
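I have looked at the ParentedTree class, but I can't figure out how to use it to solve my problem.
Answer 0 (score: 1)
You need to modify or override the productions method so that each left-hand side is prefixed with the label of its parent node.
Code: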
from nltk.tree import Tree
# string_types comes from NLTK's Python 2/3 compatibility module
from nltk.compat import string_types
from nltk.grammar import Production, Nonterminal
from nltk.tree import _child_names

def productions(t, parent):
    if not isinstance(t._label, string_types):
        raise TypeError('Productions can only be generated from trees having node labels that are strings')
    # t._label ==> parent + " ^ " + t._label
    prods = [Production(Nonterminal(parent + " ^ " + t._label), _child_names(t))]
    # Recurse into subtrees, passing the current label down as their parent
    for child in t:
        if isinstance(child, Tree):
            prods += productions(child, t._label)
    return prods

t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])

# To Add Parent of 'S' as 'Start'
# prods = productions(t, "Start")

# To Skip Parent of 'S'
prods = [Production(Nonterminal(t._label), _child_names(t))]
for child in t:
    if isinstance(child, Tree):
        prods += productions(child, t._label)
print(prods)
Output:
[S -> NP VP, S ^ NP -> D N, NP ^ D -> 'the',
NP ^ N -> 'dog', S ^ VP -> V NP, VP ^ V -> 'chased',
VP ^ NP -> D N, NP ^ D -> 'the', NP ^ N -> 'cat']
For more details, see the productions method in nltk.tree.
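Since the question mentions ParentedTree, here is a minimal sketch of how the same rules could be collected with it instead of the private _child_names helper. It assumes Python 3 and a current NLTK 3.x release; the function name parent_annotated_productions is made up for illustration and is not part of NLTK.

```python
from nltk.grammar import Nonterminal, Production
from nltk.tree import ParentedTree, Tree


def parent_annotated_productions(tree):
    """Return one production per subtree, prefixing each LHS with its parent's label.

    Hypothetical helper (not an NLTK API); the root keeps its plain label.
    """
    # ParentedTree.convert copies the tree and gives every subtree a parent() link.
    ptree = ParentedTree.convert(tree)
    prods = []
    for sub in ptree.subtrees():  # visits the root first, then subtrees in pre-order
        # Right-hand side: Nonterminals for child subtrees, plain strings for leaves.
        rhs = [Nonterminal(c.label()) if isinstance(c, Tree) else c for c in sub]
        parent = sub.parent()
        if parent is None:
            lhs = sub.label()
        else:
            lhs = parent.label() + " ^ " + sub.label()
        prods.append(Production(Nonterminal(lhs), rhs))
    return prods


t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]),
               Tree('VP', [Tree('V', ['chased']),
                           Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])
prods = parent_annotated_productions(t)
print(prods)
```

Because subtrees() walks the tree in the same order as the recursive function above, this should print the same list of rules. The design difference is that the parent label is looked up through parent() rather than passed down as an argument.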