If I have a string like this:
import nltk
text = "They refuse to permit us."
txt = nltk.word_tokenize(text)
and I print the POS tags with nltk.pos_tag(txt), I get:
[('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]
How can I print only the tags, like this:
['PRP','VBP','TO','VB','PRP']
Answer 0 (score: 2)
You have a list of tuples, so iterate over it and take the second element of each tuple.
>>> tagged = nltk.pos_tag(txt)
>>> tags = [e[1] for e in tagged]
>>> tags
['PRP', 'VBP', 'TO', 'VB', 'PRP', '.']
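Because nltk.pos_tag needs downloaded model data, the following minimal sketch uses a hard-coded list in place of its output. It assumes the list has the same (token, tag) tuple shape, which lets it run without NLTK. Unpacking each pair inside the comprehension names both parts, so you don't need to index with e[1]:

```python
# Hard-coded stand-in for nltk.pos_tag(txt): a list of (token, tag) tuples.
# These values are for illustration only, not live tagger output.
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"),
          ("permit", "VB"), ("us", "PRP"), (".", ".")]

# Unpack each (token, tag) pair and keep only the tag.
tags = [tag for _token, tag in tagged]
print(tags)  # ['PRP', 'VBP', 'TO', 'VB', 'PRP', '.']
```

The result is the same as [e[1] for e in tagged]. The unpacked form also raises an error if an element is not exactly a pair.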
Answer 1 (score: 1)
See Unpacking a list / tuple of pairs into two lists / tuples:
>>> from nltk import pos_tag, word_tokenize
>>> text = "They refuse to permit us."
>>> tagged_text = pos_tag(word_tokenize(text))
>>> tokens, pos = zip(*tagged_text)
>>> pos
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.')
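zip(*...) returns tuples rather than lists. It also fails on empty input, because there is nothing to unpack into the two names. Below is a minimal sketch that handles both cases. The helper name split_tagged is hypothetical, and the hard-coded list stands in for the tagger output:

```python
def split_tagged(tagged):
    """Split a list of (token, tag) pairs into two lists.

    Hypothetical helper for illustration. Returns ([], []) for empty input,
    where `tokens, pos = zip(*tagged)` would raise ValueError.
    """
    if not tagged:
        return [], []
    tokens, pos = zip(*tagged)  # transpose the pairs into two tuples
    return list(tokens), list(pos)

# Hard-coded stand-in for pos_tag(word_tokenize(text)).
tagged_text = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"),
               ("permit", "VB"), ("us", "PRP"), (".", ".")]
tokens, pos = split_tagged(tagged_text)
print(pos)               # ['PRP', 'VBP', 'TO', 'VB', 'PRP', '.']
print(split_tagged([]))  # ([], [])
```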
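At some point you may find POS tagging slow, because pos_tag loads the tagger on every call. In that case, create the tagger once and reuse it (see Slow performance of POS tagging. Can I do some kind of pre-warming?):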
>>> from nltk import word_tokenize
>>> from nltk.tag import PerceptronTagger
>>> tagger = PerceptronTagger()
>>> text = "They refuse to permit us."
>>> tagged_text = tagger.tag(word_tokenize(text))
>>> tokens, pos = zip(*tagged_text)
>>> pos
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.')
Answer 2 (score: 0)
You can iterate over it like this:
print([x[1] for x in nltk.pos_tag(txt)])
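An equivalent form combines operator.itemgetter(1) with map. itemgetter(1) builds a callable that pulls index 1 out of each tuple. The sketch below again uses a hard-coded stand-in for nltk.pos_tag(txt). In Python 3, map is lazy, so wrap it in list() before printing:

```python
from operator import itemgetter

# Hard-coded stand-in for nltk.pos_tag(txt), for illustration only.
tagged = [("They", "PRP"), ("refuse", "VBP"), ("to", "TO"),
          ("permit", "VB"), ("us", "PRP"), (".", ".")]

# itemgetter(1) returns the second element (the tag) of each tuple.
tags = list(map(itemgetter(1), tagged))
print(tags)  # ['PRP', 'VBP', 'TO', 'VB', 'PRP', '.']
```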