How do I check for a specific tag in Python NLTK?

Asked: 2014-12-16 04:06:00

Tags: python nltk

I have been trying for a while now to check whether a tag is 'NNP'.

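# `words` (a dict) and `temp` (a list) are defined earlier in my script.
for key in words:
    temp.append(words[key])
    tagger = [key]
    tag = nltk.pos_tag(tagger)   # returns a list of (word, tag) tuples
    x = str(tag[0][1].strip())
    print(x is 'NNP')            # this is the check that keeps printing False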

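The code is supposed to loop over several keys and check whether each tag is NNP. In practice, my print statement outputs False even when the tag is NNP. I used type(tag[0][1]) to check that it is a str, and it is. I also stripped the string and wrapped it in str() to make sure it really is a string. Nothing seems to help. Is there a built-in NLTK function I should be using, or any other suggestions?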

2 answers:

Answer 0: (score: 3)

When comparing strings, you should always use the == operator instead of is:

print(x == 'NNP')

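is compares the identity of the string objects themselves, whereas == checks whether their values are equal. Two strings can contain the same characters and still be distinct objects, in which case is returns False.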

For example:

>>> import nltk
>>> tag = nltk.pos_tag(['Google'])
>>> tag
[('Google', 'NNP')]
>>> tag[0][1]
'NNP'
>>> tag[0][1] is 'NNP'
False
>>> tag[0][1] == 'NNP'
True

Answer 1: (score: 2)

This is the idiomatic way to check POS tags:

>>> from nltk import pos_tag, word_tokenize
>>> text = 'Google is a friend of Facebook and Yahoo shouts at Microsoft because Stackoverflow is giving out hats.'
>>> for word, pos in pos_tag(word_tokenize(text)):
...     print(word, pos)
... 
Google NNP
is VBZ
a DT
friend NN
of IN
Facebook NNP
and CC
Yahoo NNP
shouts NNS
at IN
Microsoft NNP
because IN
Stackoverflow NNP
is VBZ
giving VBG
out RP
hats NNS
. .
>>> for word, pos in pos_tag(word_tokenize(text)):
...     if pos == 'NNP':
...         print(word)
... 
Google
Facebook
Yahoo
Microsoft
Stackoverflow

Using a list comprehension:

>>> [word for word, pos in pos_tag(word_tokenize(text)) if pos == 'NNP']
['Google', 'Facebook', 'Yahoo', 'Microsoft', 'Stackoverflow']
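
If the check is needed in several places, the comprehension can be wrapped in a small helper. This is only a sketch: words_with_tag is a hypothetical name, not an NLTK function, and the sample pairs are copied from the tagger output shown above so the snippet runs without NLTK installed.

```python
def words_with_tag(tagged_tokens, wanted_tag):
    """Return the words whose POS tag equals wanted_tag (hypothetical helper)."""
    return [word for word, pos in tagged_tokens if pos == wanted_tag]


# A few (word, tag) pairs in the shape returned by pos_tag.
tagged = [
    ("Google", "NNP"),
    ("is", "VBZ"),
    ("a", "DT"),
    ("friend", "NN"),
    ("of", "IN"),
    ("Facebook", "NNP"),
]

proper_nouns = words_with_tag(tagged, "NNP")
print(proper_nouns)
```

With real text, the first argument would be the result of pos_tag(word_tokenize(text)).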