Question

Two questions about NLTK trees:

Can I tell the first, second, ... subtree of a tree (sentence) apart?
How can I use the tags stored in a subtree's leaves?

The following code works well:

    # NLTK 3: t.label() replaces the old t.node attribute
    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
        for attributes in subtree.leaves():
            print(attributes)

But it gives me each leaf as a (word, tag) tuple:

('noun', 'NN')
('verb', 'VBZ')

and so on. I need to distinguish the different kinds of words inside the subtree, but subtree.labels() does not exist.

Something like this (pseudocode):

    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
        for attributes in subtree.leaves():
            if subtree.labels() == 'NN':  # labels() does not exist
                # do something with the nouns...
                pass

Thanks for any hints.
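For reference, this is the behavior of a chunk tree: such a tree is typically built by a chunker over POS-tagged tokens, so the chunk label (NP) sits on the subtree while each word's own tag stays inside its leaf. A minimal reproduction, assuming NLTK 3; the hand-tagged sentence and the RegexpParser grammar are made up for illustration:

```python
from nltk import RegexpParser

# Hand-tagged example sentence (made up), so no tagger models need downloading.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"),
          ("chases", "VBZ"), ("cats", "NNS")]

# Assumed grammar: an NP is an optional determiner, any adjectives, then nouns.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = chunker.parse(tagged)  # root label defaults to "S"

for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    # The chunk label lives on the subtree; each leaf keeps its (word, tag) pair.
    print(subtree.label(), subtree.leaves())
# NP [('the', 'DT'), ('little', 'JJ'), ('dog', 'NN')]
# NP [('cats', 'NNS')]
```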

Answer 1

So I did it in plain Python. Still, if anyone has a better idea...

    for subtree in tree.subtrees(filter=lambda t: t.label() in ('NP', 'NNS')):
        # each leaf is a (word, tag) tuple, so unpack it directly
        for expression, tag in subtree.leaves():
            if tag == 'NN':
                print(expression)  # do something with the nouns
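The loop above does not record which NP a noun came from, which is what the first question asks about. A minimal sketch, assuming NLTK 3 and a hypothetical hand-built tree: number the matching subtrees with Python's enumerate(), since subtrees() visits them in a fixed left-to-right pre-order. The startswith('NN') test is my own choice, so that plural and proper nouns count too:

```python
from nltk import Tree

# Hypothetical chunk tree built by hand; its leaves are (word, tag) tuples,
# the same shape a chunker such as RegexpParser produces.
tree = Tree("S", [
    Tree("NP", [("the", "DT"), ("dog", "NN")]),
    ("chases", "VBZ"),
    Tree("NP", [("two", "CD"), ("cats", "NNS")]),
])

nouns_by_np = {}
# subtrees() walks the tree in pre-order, so i numbers the NPs left to right (0 = first).
for i, subtree in enumerate(tree.subtrees(filter=lambda t: t.label() == "NP")):
    # Unpack each leaf; startswith("NN") keeps NN, NNS, NNP and NNPS alike.
    nouns_by_np[i] = [word for word, tag in subtree.leaves() if tag.startswith("NN")]
print(nouns_by_np)  # {0: ['dog'], 1: ['cats']}

# Tree is a list subclass, so direct children can also be indexed by position.
print(tree[2].label(), tree[2].leaves())  # NP [('two', 'CD'), ('cats', 'NNS')]
```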

Answer 2

I did the following to extract noun phrases from a tree:

from itertools import groupby
[' '.join([t[0] for t,m in group]) for key, group in groupby(tree.pos(), lambda s: s[-1]=='NP') if key]
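This works because tree.pos() pairs every leaf with the label of its parent node. In a chunk tree each leaf is already a (word, tag) tuple, so an item looks like (('dog', 'NN'), 'NP') and t[0] is the word. A self-contained sketch on a hypothetical hand-built tree:

```python
from itertools import groupby
from nltk import Tree

# Hypothetical chunk tree with two NPs separated by a verb.
tree = Tree("S", [
    Tree("NP", [("the", "DT"), ("dog", "NN")]),
    ("chases", "VBZ"),
    Tree("NP", [("two", "CD"), ("cats", "NNS")]),
])

# pos() returns ((word, tag), parent_label) for every leaf.
print(tree.pos()[0])  # (('the', 'DT'), 'NP')

# Group consecutive leaves whose parent label is NP and join their words.
phrases = [" ".join(leaf[0] for leaf, label in group)
           for key, group in groupby(tree.pos(), lambda s: s[-1] == "NP") if key]
print(phrases)  # ['the dog', 'two cats']
```

Because groupby only merges consecutive items, two NP chunks with no other token between them come out as a single phrase; looping over tree.subtrees() as in the question keeps them apart.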

More generally, we can look inside each group and do whatever we need with its elements. For example:

[list(group) for key, group in groupby(tree.pos(), lambda s: s[-1]=='NP') if key]

Once we know which elements list(group) contains, we can process them however we like.
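Concretely, each element of list(group) is a ((word, tag), parent_label) pair, so the per-word tags the question asks about are available right there. A small sketch on a hypothetical one-NP tree, keeping only the nouns of each NP (the startswith('NN') test is an illustrative choice):

```python
from itertools import groupby
from nltk import Tree

# Hypothetical chunk tree: one NP followed by a verb.
tree = Tree("S", [Tree("NP", [("the", "DT"), ("dog", "NN")]), ("barks", "VBZ")])

groups = [list(group) for key, group in groupby(tree.pos(), lambda s: s[-1] == "NP") if key]
print(groups)  # [[(('the', 'DT'), 'NP'), (('dog', 'NN'), 'NP')]]

# Unpack ((word, tag), label) to filter on the word's own tag.
nouns = [[word for (word, tag), label in g if tag.startswith("NN")] for g in groups]
print(nouns)  # [['dog']]
```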

Another approach is to use tree2conlltags. For example:

from nltk.chunk import tree2conlltags
from itertools import groupby

chunks = tree2conlltags(tree)

print(chunks)

results = [' '.join(word for word, pos, chunk in group).lower() for key, group in groupby(chunks, lambda s: s[-1]!='O') if key]
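tree2conlltags flattens the tree into (word, pos, IOB tag) triples: the first word of a chunk gets B-<label>, the following words I-<label>, and words outside any chunk O. Grouping on != 'O' therefore merges chunks that touch. A sketch on a hypothetical tree for "John gave Mary the book", comparing the one-liner with a variant of my own that starts a new phrase at every B- tag:

```python
from itertools import groupby
from nltk import Tree
from nltk.chunk import tree2conlltags

# Hypothetical chunk tree; the NPs "Mary" and "the book" are adjacent.
tree = Tree("S", [
    Tree("NP", [("John", "NNP")]),
    ("gave", "VBD"),
    Tree("NP", [("Mary", "NNP")]),
    Tree("NP", [("the", "DT"), ("book", "NN")]),
])

chunks = tree2conlltags(tree)
print(chunks)
# [('John', 'NNP', 'B-NP'), ('gave', 'VBD', 'O'), ('Mary', 'NNP', 'B-NP'),
#  ('the', 'DT', 'B-NP'), ('book', 'NN', 'I-NP')]

# The one-liner from the answer: adjacent NPs end up in one phrase.
merged = [' '.join(word for word, pos, chunk in group).lower()
          for key, group in groupby(chunks, lambda s: s[-1] != 'O') if key]
print(merged)  # ['john', 'mary the book']

# Variant: open a new phrase at each B- tag, extend it on I-, skip O.
phrases = []
for word, pos, chunk in chunks:
    if chunk.startswith("B-"):
        phrases.append([word])
    elif chunk.startswith("I-"):
        phrases[-1].append(word)
print([" ".join(p).lower() for p in phrases])  # ['john', 'mary', 'the book']
```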
