Finding grandparent nodes with nltk

Time: 2016-08-02 13:29:52

Tags: python tree nltk

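I am using the Tree package from nltk with Python 2.7, and I would like to extract every rule from a tree together with its grandparent node. I have the following tree: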

t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])

and its productions:

   t.productions()
   [S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased', NP -> D N, D -> 'the', N -> 'cat']

The tree looks like this:

               S               
       ________|_____           
      |              VP        
      |         _____|___       
      NP       |         NP    
   ___|___     |      ___|___   
  D       N    V     D       N 
  |       |    |     |       |  
 the     dog chased the     cat

What I want is something of this form:

[S -> NP VP, S ^ NP -> D N, NP ^ D -> 'the', NP ^ N -> 'dog'.......]

I have looked at the ParentedTree class, but I cannot see how to use it to solve my problem.

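1 answer:

Answer 0: (score: 1)

You need to modify or override the productions method so that each recursive call also receives the label of the parent node.

Code: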

from nltk.tree import Tree
from nltk.compat import string_types
from nltk.grammar import Production, Nonterminal
from nltk.tree import _child_names

def productions(t, parent):
    """Return the productions of subtree t, prefixing each left-hand side with the parent label."""
    if not isinstance(t._label, string_types):
        raise TypeError('Productions can only be generated from trees having node labels that are strings')

    # t._label ==> parent + " ^ " + t._label
    prods = [Production(Nonterminal(parent + " ^ " + t._label), _child_names(t))]
    for child in t:
        # Leaves are plain strings; only subtrees produce further rules.
        if isinstance(child, Tree):
            prods += productions(child, t._label)
    return prods


t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])

# To Add Parent of 'S' as 'Start'
# prods = productions(t, "Start")

# To Skip Parent of 'S'
prods = [Production(Nonterminal(t._label), _child_names(t))]
for child in t:
    if isinstance(child, Tree):
        prods += productions(child, t._label)

print(prods)

Output:

[S -> NP VP, S ^ NP -> D N, NP ^ D -> 'the', 
    NP ^ N -> 'dog', S ^ VP -> V NP, VP ^ V -> 'chased', 
    VP ^ NP -> D N, NP ^ D -> 'the', NP ^ N -> 'cat']
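
The recursion above depends on private NLTK helpers (`_child_names`, `t._label`) and on `nltk.compat`, which may not be available in every NLTK version. As a rough sketch of the same idea without those dependencies, the example below represents the tree as nested `(label, children)` tuples and returns the rules as plain strings. The tuple representation and the name `parent_annotated_rules` are assumptions made for illustration only; they are not part of NLTK.

```python
def parent_annotated_rules(node, parent=None):
    """Return rules as strings; every non-root left-hand side is prefixed with its parent's label."""
    label, children = node
    # The root has no parent, so its rule is left unannotated.
    lhs = label if parent is None else "%s ^ %s" % (parent, label)
    # Subtrees contribute their label to the right-hand side; leaves are quoted words.
    rhs = " ".join(
        child[0] if isinstance(child, tuple) else repr(child)
        for child in children
    )
    rules = ["%s -> %s" % (lhs, rhs)]
    for child in children:
        if isinstance(child, tuple):
            # Pass the current label down so the child can annotate itself.
            rules.extend(parent_annotated_rules(child, label))
    return rules


# Same sentence as in the question, written as nested tuples.
tree = ("S", [
    ("NP", [("D", ["the"]), ("N", ["dog"])]),
    ("VP", [("V", ["chased"]),
            ("NP", [("D", ["the"]), ("N", ["cat"])])]),
])

rules = parent_annotated_rules(tree)
for rule in rules:
    print(rule)
```

With an actual `nltk.Tree`, the same structure would presumably apply: take the label from the tree, test children with `isinstance(child, Tree)`, and pass the current label down as the `parent` argument, as the answer's code does.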

For details, see the productions method in nltk.tree.