Accuracy of TextBlob and NLTK POS tagging

Time: 2019-03-24 18:19:12

Tags: python python-3.x nlp nltk textblob

My code so far is as follows:

from textblob import TextBlob

class BrinBot:

    def __init__(self, message):  # Accepts the message from the user as the argument
        parse(message)

class parse:
    def __init__(self, message):
        self.message = message
        blob = TextBlob(self.message)  # Build a TextBlob from the message
        print(blob.tags)               # Print the (word, POS tag) pairs

BrinBot("Handsome Bob's dog is a beautiful Chihuahua")

This is the output:

[('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]

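My problem is that TextBlob apparently treats "Handsome" as a singular proper noun (NNP), which is incorrect, because "Handsome" should be tagged as an adjective. Is there a way to fix this? I also tried NLTK, but got the same result.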

1 answer:

Answer 0 (score: 0)

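This happens because the capital letter in "Handsome" causes it to be treated as part of Bob's name. That is not necessarily an incorrect analysis, but if you want to force the adjective reading, you can remove the capitalization from "handsome", as in text2 and text3 below.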

text = "Handsome Bob's dog is a beautiful Chihuahua"

BrinBot(text)
[('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]

text2 = "handsome bob's dog is a beautiful chihuahua"

BrinBot(text2)
[('handsome', 'JJ'), ('bob', 'NN'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN')]

text3 = "That beautiful chihuahua is handsome Bob's dog"

BrinBot(text3)
[('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('handsome', 'JJ'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]

text4 = "That beautiful chihuahua is Handsome Bob's dog"

BrinBot(text4)
[('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]
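
If you would rather not edit each string by hand, the lowercasing step can be done in code before tagging. The following is only a minimal sketch: `lowercase_words` and `tag_with_textblob` are hypothetical helper names of my own, not part of TextBlob or NLTK, and the list of words to lowercase is something you would have to supply yourself. Judging by text3 above, the tagger may then read the word as JJ, but that is not guaranteed for every sentence.

```python
import re


def lowercase_words(text, words):
    """Lowercase every whole-word occurrence of the given words, ignoring case.

    Hypothetical helper: `words` is a list you choose yourself, e.g. adjectives
    that the tagger keeps mistaking for part of a name.
    """
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        flags=re.IGNORECASE,
    )
    # Replace each match with its lowercase form; everything else is untouched.
    return pattern.sub(lambda m: m.group(0).lower(), text)


def tag_with_textblob(text):
    """Tag the text with TextBlob (imported lazily, so the helper above works without it)."""
    from textblob import TextBlob
    return TextBlob(text).tags


normalized = lowercase_words("Handsome Bob's dog is a beautiful Chihuahua", ["handsome"])
print(normalized)
# With TextBlob and its corpora installed, you could then call:
# print(tag_with_textblob(normalized))
```

Only the listed words are changed, so "Bob" and "Chihuahua" keep their capital letters and can still be tagged as proper nouns.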