I chunked a sentence using the following:
    import nltk
    grammar = '''
    NP:
        {<DT>*(<NN.*>|<JJ.*>)*<NN.*>}
    NVN:
        {<NP><VB.*><NP>}
    '''
    chunker = nltk.chunk.RegexpParser(grammar)
    tree = chunker.parse(tagged)  # tagged: a list of (word, POS tag) tuples
    print(tree)
The result is:
    (S
      (NVN
        (NP The_Pigs/NNS)
        are/VBP
        (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
      that/WDT
      formed/VBN
      in/IN
      1977/CD
      ./.)
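But now I'm stuck trying to figure out how to navigate it. I want to be able to find the NVN subtree and access the left noun phrase ("The_Pigs"), the verb ("are"), and the right noun phrase ("a Bristol-based punk rock band"). How do I do that?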
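The `tagged` input is not shown in the question, so the following is only a minimal sketch for reproducing the tree: it assumes a hand-written list of (word, tag) tuples that mirrors the tags visible in the printed output, and feeds it to the same grammar.

```python
import nltk

# Hand-written POS tags: an assumption standing in for the question's `tagged`.
tagged = [
    ('The_Pigs', 'NNS'), ('are', 'VBP'), ('a', 'DT'), ('Bristol-based', 'JJ'),
    ('punk', 'NN'), ('rock', 'NN'), ('band', 'NN'), ('that', 'WDT'),
    ('formed', 'VBN'), ('in', 'IN'), ('1977', 'CD'), ('.', '.'),
]

# Same two-stage grammar as in the question: NP chunks first, then NVN over them.
grammar = '''
NP:
    {<DT>*(<NN.*>|<JJ.*>)*<NN.*>}
NVN:
    {<NP><VB.*><NP>}
'''

chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)
print(tree)
```

No tagger model or corpus download is needed for this, since the tags are supplied by hand.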
Answer 0 (score: 14)
Try:
    import nltk

    ROOT = 'ROOT'
    tree = ...

    def getNodes(parent):
        # Recursively walk the tree, printing each subtree and each leaf.
        for node in parent:
            if isinstance(node, nltk.Tree):
                if node.label() == ROOT:
                    print("======== Sentence =========")
                    # join() assumes the leaves are plain strings
                    print("Sentence:", " ".join(node.leaves()))
                else:
                    print("Label:", node.label())
                    print("Leaves:", node.leaves())
                getNodes(node)
            else:
                print("Word:", node)

    getNodes(tree)
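Answer 1 (score: 8)
Of course, you can write your own depth-first search... but there is an easier (better) way. If you want every subtree rooted at NVN, use Tree's subtrees method with its filter argument. (In NLTK 3 the label is read with label(); older releases used the node attribute.)
    >>> print(t)
    (S
      (NVN
        (NP The_Pigs/NNS)
        are/VBP
        (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
      that/WDT
      formed/VBN
      in/IN
      1977/CD
      ./.)
    >>> for i in t.subtrees(filter=lambda x: x.label() == 'NVN'):
    ...     print(i)
    ...
    (NVN
      (NP The_Pigs/NNS)
      are/VBP
      (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))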
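Building on the filtered subtrees, here is a minimal sketch of pulling the three parts the question asks for out of each NVN subtree. The helper name `extract_nvn` is hypothetical, and the tree is rebuilt by hand so the example runs on its own. It relies on the NVN rule always producing exactly three children (NP, verb, NP).

```python
from nltk import Tree

# Rebuild the chunk tree from the question by hand; leaves are (word, tag) tuples.
tree = Tree('S', [
    Tree('NVN', [
        Tree('NP', [('The_Pigs', 'NNS')]),
        ('are', 'VBP'),
        Tree('NP', [('a', 'DT'), ('Bristol-based', 'JJ'), ('punk', 'NN'),
                    ('rock', 'NN'), ('band', 'NN')]),
    ]),
    ('that', 'WDT'), ('formed', 'VBN'), ('in', 'IN'), ('1977', 'CD'), ('.', '.'),
])

def extract_nvn(tree):
    """Return a (left_np, verb, right_np) tuple of strings per NVN subtree."""
    triples = []
    for nvn in tree.subtrees(filter=lambda t: t.label() == 'NVN'):
        # A Tree behaves like a list of its children, so it can be unpacked.
        left, verb, right = nvn
        left_words = ' '.join(word for word, tag in left.leaves())
        right_words = ' '.join(word for word, tag in right.leaves())
        triples.append((left_words, verb[0], right_words))
    return triples

print(extract_nvn(tree))
# [('The_Pigs', 'are', 'a Bristol-based punk rock band')]
```

If the grammar is later changed so that NVN can contain more than three children, the unpacking line would need to be replaced with explicit indexing or a loop.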
Answer 2 (score: 5)
Try this:
    for a in tree:
        if isinstance(a, nltk.Tree):
            if a.label() == 'NVN':  # This climbs into your NVN tree
                for b in a:
                    if isinstance(b, nltk.Tree) and b.label() == 'NP':
                        print(b.leaves())  # This outputs your "NP"
                    else:
                        print(b)  # This outputs your "VB.*"
Output:
    [('The_Pigs', 'NNS')]
    ('are', 'VBP')
    [('a', 'DT'), ('Bristol-based', 'JJ'), ('punk', 'NN'), ('rock', 'NN'), ('band', 'NN')]
Answer 3 (score: 5)
Here is a code sample that generates all subtrees with the label 'NP':
    def filt(x):
        return x.label() == 'NP'

    for subtree in t.subtrees(filter=filt):  # Generate all NP subtrees
        print(subtree)
For siblings, you may want to look at the methods ParentedTree.left_sibling() and ParentedTree.right_sibling().
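As a rough illustration of sibling navigation, the sketch below parses the printed chunk tree back into a ParentedTree and steps from each noun phrase to the verb next to it. Passing `str2tuple` as `read_leaf` is a choice made here so that leaves become (word, tag) tuples like the chunker's output; it is not something the original answer prescribes.

```python
from nltk.tag import str2tuple
from nltk.tree import ParentedTree

# Parse the printed tree; str2tuple turns "are/VBP" into ('are', 'VBP').
ptree = ParentedTree.fromstring(
    """(S
         (NVN
           (NP The_Pigs/NNS)
           are/VBP
           (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
         that/WDT formed/VBN in/IN 1977/CD ./.)""",
    read_leaf=str2tuple,
)

# Take the first NVN subtree, then move sideways from its noun phrases.
nvn = next(ptree.subtrees(filter=lambda t: t.label() == 'NVN'))
left_np = nvn[0]
right_np = nvn[-1]
verb = left_np.right_sibling()   # the child just after the left NP

print(verb)
print(right_np.left_sibling())   # the same verb, reached from the right NP
print(left_np.parent().label())  # subtrees also know their parent
```

Only subtrees carry parent pointers; the leaves here are plain tuples, so the sibling methods have to be called on an NP rather than on the verb itself.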
For more details, see the following links.
http://www.nltk.org/howto/tree.html # some basic usage and examples
http://nbviewer.ipython.org/github/gmonce/nltk_parsing/blob/master/1.%20NLTK%20Syntax%20Trees.ipynb # a notebook using these methods
http://www.nltk.org/_modules/nltk/tree.html # all API with source