ne_chunk without pos_tag in NLTK

Asked: 2017-05-29 07:42:58

Tags: python tree tags nltk chunking

I'm trying to chunk a sentence using ne_chunk and pos_tag in nltk.

from nltk import tag
from nltk.tag import pos_tag
from nltk.tree import Tree
from nltk.chunk import ne_chunk

sentence = "Michael and John is reading a booklet in a library of Jakarta"
tagged_sent = pos_tag(sentence.split())

print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]

print(print_chunk)

This is the result:

[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]

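My question is: is it possible to leave out the pos_tag (like NNP above) and keep only the Tree labels 'GPE' and 'PERSON'? Also, what does 'GPE' mean?

Thanks in advance.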

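2 Answers:

Answer 0: (score: 4)

The named entity chunker will give you a tree that contains both chunks and tags. You can't change that, but you can take the tags out. Starting from your tagged_sent: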

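# ne_chunk and Tree are already imported in the question's code
chunks = ne_chunk(tagged_sent)
simple = []
for elt in chunks:
    if isinstance(elt, Tree):
        # Named-entity chunk: keep the label, drop the POS tags
        simple.append(Tree(elt.label(), [word for word, tag in elt]))
    else:
        # Plain (word, tag) token: keep only the word
        simple.append(elt[0])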

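If you only want the chunks, omit the else: clause above. You can adapt the code to wrap the chunks any way you like; I used an nltk Tree to keep the changes to a minimum. Note that some chunks consist of multiple words (try adding "New York" to your example), so the contents of a chunk must be a list, not a single element.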
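To illustrate the multi-word point, here is a minimal sketch that returns plain (label, words) pairs instead of Trees. It assumes a hand-built input shaped like ne_chunk output, and FakeTree and strip_tags are hypothetical names introduced only so the sketch runs without NLTK or its models. The function relies only on label() and iteration over children, which nltk's Tree also provides.

```python
# Hypothetical stand-in for nltk.Tree so this sketch runs without NLTK;
# it only mimics label() and iteration over children.
class FakeTree(list):
    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def strip_tags(chunked):
    """Return words and (label, [words]) pairs, dropping every POS tag."""
    simple = []
    for elt in chunked:
        if hasattr(elt, "label"):
            # Named-entity chunk: keep its label and all of its words.
            simple.append((elt.label(), [word for word, tag in elt]))
        else:
            # Plain (word, tag) token outside any chunk.
            simple.append(elt[0])
    return simple


# Hand-built input (an assumption, not real chunker output).
chunked = FakeTree("S", [
    FakeTree("PERSON", [("John", "NNP")]),
    ("visited", "VBD"),
    FakeTree("GPE", [("New", "NNP"), ("York", "NNP")]),
])

result = strip_tags(chunked)
print(result)
```

The "New York" chunk comes back as one pair holding a two-word list, which is why a chunk's contents cannot be a single element.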

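PS. "GPE" stands for "geo-political entity" (so tagging "Michael" as GPE is evidently a chunker mistake). You can see a list of "commonly used tags" in the NLTK book.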

Answer 1: (score: 2)

Most likely, a slight modification of the code at https://stackoverflow.com/a/31838373/610569 is what you need.

  

Is it possible to leave out the pos_tag (like NNP above) and keep only the Tree labels 'GPE' and 'PERSON'?

Yes, just traverse the Tree object =) See "How to Traverse an NLTK Tree object?"

>>> from nltk import Tree, pos_tag, ne_chunk
>>> sentence = "Michael and John is reading a booklet in a library of Jakarta"
>>> tagged_sent = ne_chunk(pos_tag(sentence.split()))
>>> tagged_sent
Tree('S', [Tree('GPE', [('Michael', 'NNP')]), ('and', 'CC'), Tree('PERSON', [('John', 'NNP')]), ('is', 'VBZ'), ('reading', 'VBG'), ('a', 'DT'), ('booklet', 'NN'), ('in', 'IN'), ('a', 'DT'), ('library', 'NN'), ('of', 'IN'), Tree('GPE', [('Jakarta', 'NNP')])])

>>> from nltk.sem.relextract import NE_CLASSES
>>> ace_tags = NE_CLASSES['ace']

>>> for node in tagged_sent:
...     if type(node) == Tree and node.label() in ace_tags:
...         words, tags = zip(*node.leaves())
...         print(node.label() + '\t' + ' '.join(words))
... 
GPE Michael
PERSON  John
GPE Jakarta
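
The loop above only looks at the top level of the tree, which is enough for ne_chunk output. If the entities might sit deeper (for example inside an NP subtree), a recursive traversal could look like the sketch below. FakeTree, iter_leaves and collect_entities are hypothetical helpers, the input is hand-built, and the label set is written out by hand rather than taken from NE_CLASSES.

```python
from collections import defaultdict


# Hypothetical stand-in for nltk.Tree so this sketch runs without NLTK.
class FakeTree(list):
    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def iter_leaves(node):
    """Yield every (word, tag) leaf below node, at any depth."""
    for child in node:
        if hasattr(child, "label"):
            yield from iter_leaves(child)
        else:
            yield child


def collect_entities(node, wanted, found=None):
    """Group the words of each subtree whose label is in `wanted`, by label."""
    if found is None:
        found = defaultdict(list)
    for child in node:
        if hasattr(child, "label"):
            if child.label() in wanted:
                words = " ".join(word for word, tag in iter_leaves(child))
                found[child.label()].append(words)
            else:
                # Not an entity itself: keep descending.
                collect_entities(child, wanted, found)
    return found


# Hand-built nested input (an assumption, not real chunker output).
tree = FakeTree("S", [
    FakeTree("NP", [FakeTree("PERSON", [("John", "NNP")])]),
    ("lives", "VBZ"),
    ("in", "IN"),
    FakeTree("GPE", [("New", "NNP"), ("York", "NNP")]),
    FakeTree("PERSON", [("Michael", "NNP")]),
])

entities = dict(collect_entities(tree, {"PERSON", "GPE"}))
print(entities)
```

With a real nltk Tree, node.leaves() would play the role of iter_leaves here.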
  

What does 'GPE' mean?

GPE means "geo-political entity".