How to print out only the POS tags in Python

Time: 2016-01-05 10:21:06

Tags: python-2.7 nltk

If I have a string like this:

import nltk

text = "They refuse to permit us."
txt = nltk.word_tokenize(text)

If I print the POS tags with nltk.pos_tag(txt), I get:

[('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP')]

How can I print only the tags, like this:

['PRP','VBP','TO','VB','PRP']

3 answers:

Answer 0 (score: 2)

You have a list of tuples, so iterate over it and take the second element of each tuple.

>>> tagged = nltk.pos_tag(txt)
>>> tags = [e[1] for e in tagged]
>>> tags
['PRP', 'VBP', 'TO', 'VB', 'PRP'] 
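A self-contained sketch of the same idea that runs without NLTK: the `tagged` list below is the question's `pos_tag` output typed in by hand (an assumption standing in for a real call). Unpacking each pair as `word, tag` inside the comprehension does the same thing as indexing with `e[1]` and may read a little more clearly.

```python
# Hand-typed copy of the question's nltk.pos_tag(txt) output, so this runs without NLTK.
tagged = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'),
          ('permit', 'VB'), ('us', 'PRP')]

# Index-based, as in the answer: take element 1 of each (word, tag) tuple.
tags = [e[1] for e in tagged]

# Equivalent version that unpacks each tuple; `_` marks the word as unused.
tags_unpacked = [tag for _, tag in tagged]

print(tags)  # ['PRP', 'VBP', 'TO', 'VB', 'PRP']
```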

Answer 1 (score: 1)

See "Unpacking a list / tuple of pairs into two lists / tuples":

>>> from nltk import pos_tag, word_tokenize
>>> text = "They refuse to permit us."
>>> tagged_text = pos_tag(word_tokenize(text))
>>> tokens, pos = zip(*tagged_text)
>>> pos
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.')
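Why `zip(*tagged_text)` works: the `*` spreads the list into separate arguments, so `zip` receives one (token, tag) pair per argument and regroups all first elements together and all second elements together. The sketch below runs without NLTK (the pairs are typed in by hand, including the trailing '.' token shown in the answer's output). The helper name `split_pairs` is hypothetical; it guards the empty-list case, where `tokens, pos = zip(*[])` would raise `ValueError` because there are no values to unpack.

```python
def split_pairs(pairs):
    """Split [(token, tag), ...] into a (tokens, tags) pair of tuples.

    zip(*pairs) is the same as zip(pairs[0], pairs[1], ...), which groups
    the first elements together and the second elements together. For an
    empty list, zip() yields nothing and two-name unpacking raises
    ValueError, so return two empty tuples instead.
    """
    if not pairs:
        return (), ()
    tokens, tags = zip(*pairs)
    return tokens, tags


# Hand-typed stand-in for pos_tag(word_tokenize(text)).
tagged_text = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'),
               ('permit', 'VB'), ('us', 'PRP'), ('.', '.')]
tokens, pos = split_pairs(tagged_text)
print(pos)     # ('PRP', 'VBP', 'TO', 'VB', 'PRP', '.')
print(tokens)  # ('They', 'refuse', 'to', 'permit', 'us', '.')
```

Note that `zip` produces tuples; wrap the result as `list(pos)` if you need the list form the question asks for.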

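At some point you may find POS tagging slow. If so, create the tagger once and reuse it instead of calling pos_tag each time (see "Slow performance of POS tagging. Can I do some kind of pre-warming?"):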

>>> from nltk import word_tokenize
>>> from nltk.tag import PerceptronTagger
>>> tagger = PerceptronTagger()
>>> text = "They refuse to permit us."
>>> tagged_text = tagger.tag(word_tokenize(text))
>>> tokens, pos = zip(*tagged_text)
>>> pos
('PRP', 'VBP', 'TO', 'VB', 'PRP', '.')
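The point of the snippet above is to pay the tagger's setup cost once and then reuse the same object for every sentence. A minimal sketch of that pattern, runnable without NLTK: `LookupTagger` is a hypothetical stand-in for `PerceptronTagger` that only mimics its `tag(tokens)` interface returning (token, tag) pairs, and `tags_for_sentences` is likewise a made-up helper name. The dictionary lookup is not how a real tagger assigns tags.

```python
class LookupTagger:
    """Hypothetical stand-in for nltk.tag.PerceptronTagger (same .tag() shape only)."""

    def __init__(self, lexicon, default='NN'):
        # With a real tagger, this constructor is where the expensive model loading happens.
        self.lexicon = lexicon
        self.default = default

    def tag(self, tokens):
        # Return (token, tag) pairs; unknown tokens get the default tag.
        return [(tok, self.lexicon.get(tok, self.default)) for tok in tokens]


def tags_for_sentences(tagger, sentences):
    """Tag each pre-tokenized sentence with one shared tagger and keep only the tags."""
    return [[tag for _, tag in tagger.tag(tokens)] for tokens in sentences]


# Build the tagger once, outside any loop, then reuse it for every sentence.
tagger = LookupTagger({'They': 'PRP', 'refuse': 'VBP', 'to': 'TO',
                       'permit': 'VB', 'us': 'PRP', '.': '.'})
sentences = [['They', 'refuse', 'to', 'permit', 'us', '.'],
             ['They', 'refuse', 'us', '.']]
print(tags_for_sentences(tagger, sentences))
```

With NLTK itself, the equivalent is the answer's approach: create `tagger = PerceptronTagger()` once, then call `tagger.tag(word_tokenize(s))` for each sentence `s`.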

Answer 2 (score: 0)

You can iterate over it like this:

print([x[1] for x in nltk.pos_tag(txt)])