Swapping the labels of a tree in NLTK

Date: 2015-03-10 12:00:48

Tags: python python-2.7 parsing tree nltk

I parsed this tagged sentence, "the dog chased the black cat", with NLTK's RegexpParser, using the following grammar:

tagged_ = [('the', 'DT'), ('dog', 'NN'), ('chased', 'VBD'), ('the', 'DT'), ('black', 'JJ'), ('cat', 'NN')]

import nltk

grammar = """NP: {<DT>?<JJ>*<NN>}
VP: {<MD>?<VBD>}"""
cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_)
print(result)
result.draw()

This is the output of print(result) and result.draw():

(S (NP the/DT dog/NN) (VP chased/VBD) (NP the/DT black/JJ cat/NN))

[Image: tree drawn by result.draw()]

Now I want to reorder the tree so that (VP chased/VBD) and (NP the/DT dog/NN) are swapped:

(S (VP chased/VBD) (NP the/DT dog/NN) (NP the/DT black/JJ cat/NN))

and then display the leaves as ['chased','the','dog','the','black','cat']. Is there any way to do this?

1 answer:

Answer 0 (score: 0)

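You can think of an nltk.Tree object as a tuple of two values. The first value is the label of the root node, and the second is a list containing its child subtrees or leaves. You can build a complex tree by appending subtrees to the root's list: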

>>> from nltk import Tree
>>> tree = Tree('S', [])
>>> np = Tree('NP', ['the', 'dog'])
>>> tree.append(np)
>>> vp = Tree('VP', ['barks'])
>>> tree.append(vp)
>>> print(tree)
(S (NP the dog) (VP barks))
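
To make the "label plus list of children" picture concrete, here is a minimal sketch that inspects the same small tree. It only uses the Tree methods label() and leaves() plus ordinary list indexing, and it assumes an NLTK version where label() is available (NLTK 3 or later).

```python
from nltk import Tree

# Same toy tree as above, built in one expression.
tree = Tree('S', [Tree('NP', ['the', 'dog']), Tree('VP', ['barks'])])

print(tree.label())     # label of the root node
print(len(tree))        # number of direct children (a Tree behaves like a list)
print(tree[0])          # first child, which is itself a Tree
print(tree[0].label())  # label of that child
print(tree.leaves())    # every leaf, left to right
```

Because a Tree behaves like a list of its children, the usual list operations (indexing, len, append, insert) apply to it directly.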

You can iterate over all subtrees with tree.subtrees():

>>> for sub in tree.subtrees():
...     print(sub)
(S (NP the dog) (VP barks))
(NP the dog)
(VP barks)
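
The difference between subtrees() and the direct children only shows up once the tree is nested. The sketch below uses a hypothetical, slightly deeper tree (not the one from the question) to compare the two; the variable names are illustrative.

```python
from nltk import Tree

# A hypothetical nested tree, used only to show how deep subtrees() goes.
tree = Tree.fromstring('(S (NP (DT the) (NN dog)) (VP (VBD barked)))')

# subtrees() yields the tree itself and then every descendant subtree.
all_labels = [sub.label() for sub in tree.subtrees()]

# Iterating over the tree directly yields only its first-level children.
top_labels = [child.label() for child in tree if isinstance(child, Tree)]

# subtrees() also accepts a filter function to select matching subtrees.
vp_only = list(tree.subtrees(filter=lambda t: t.label() == 'VP'))

print(all_labels)
print(top_labels)
print(vp_only)
```

The isinstance check is there because the children of a Tree can also be plain leaves, which have no label() method.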

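As you can see, this method yields every subtree, so in a complex tree you get the subtrees, the sub-subtrees, the sub-sub-subtrees, and so on. In your case it is therefore better to take only the first-level children by indexing into the top-level tree: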

>>> new = Tree('S', [])
>>> for i in range(len(tree)):
...     if tree[i].label() == 'VP':
...         new.insert(0, tree[i])
...     else:
...         new.append(tree[i])

>>> print(new)
(S (VP barks) (NP the dog))
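
Applied to the sentence from the question, the same idea might look like the sketch below. It assumes the chunk tree has the shape shown in the question, where the leaves are (word, tag) tuples; the names reordered and words are illustrative and not part of NLTK.

```python
import nltk
from nltk import Tree

tagged = [('the', 'DT'), ('dog', 'NN'), ('chased', 'VBD'),
          ('the', 'DT'), ('black', 'JJ'), ('cat', 'NN')]

grammar = """
NP: {<DT>?<JJ>*<NN>}
VP: {<MD>?<VBD>}
"""
result = nltk.RegexpParser(grammar).parse(tagged)

# Rebuild the top level, moving every VP chunk to the front.
reordered = Tree(result.label(), [])
for child in result:
    # Tokens left outside any chunk are plain tuples, so check the type first.
    if isinstance(child, Tree) and child.label() == 'VP':
        reordered.insert(0, child)
    else:
        reordered.append(child)

print(reordered)

# The leaves of a chunk tree are (word, tag) pairs; keep only the words.
words = [word for word, tag in reordered.leaves()]
print(words)
```

Note that insert(0, ...) puts each VP at the very front, so a sentence with several VP chunks would end up with them in reverse order; if that matters, collect the VP chunks in a separate list first and concatenate.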