Question

I am building a simple classifier to determine whether a sentence is positive. This is how I train the classifier using textblob.

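from textblob.classifiers import NaiveBayesClassifier

# Each training example is a (text, label) tuple
train = [
    ('i love your website', 'pos'),
    ('i really like your site', 'pos'),
    ('i dont like your website', 'neg'),
    ('i dislike your site', 'neg'),
]

cl = NaiveBayesClassifier(train)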

I'm classifying text from Twitter using tweepy, as shown here. The results are stored in the database, and I'm using Django to save me the hassle of writing the backend.

import json
from tweepy.streaming import StreamListener  # available in tweepy versions before 4.0

class StdOutListener(StreamListener):
    def __init__(self):
        super().__init__()
        self.raw_tweets = []

    def on_data(self, data):
        # Parse the incoming tweet JSON and keep it
        self.raw_tweets.append(json.loads(data))
        tweets = Htweets()  # connection to the database
        for x in self.raw_tweets:
            tweets.tweet_text = x['text']

            verdict = cl.classify(x['text'])

            if verdict == 'pos':
                tweets.verdict = 'pos'
            elif verdict == 'neg':
                tweets.verdict = 'neg'
            else:
                tweets.verdict = 'normal'

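The logic seems simple: once I have trained the classifier on which sentences are positive or negative, it should save the verdict along with the tweet to the database.

But that is not what happens. I have changed the logic in many ways and still have no success. When a tweet is positive or negative, the algorithm does recognize it as such.

However, I want it to save 'normal' when a tweet is neither, and it is not doing that. I realize the classifier only knows two labels, positive and negative, but surely it should also be able to tell when a text does not fall into either category.

How can this be achieved with textblob? Sample alternative logic and advice would be greatly appreciated.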

Answer 1

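classify always returns the label with the highest probability, so you should use the prob_classify method to get the probability distribution over your category labels. By inspecting that distribution and setting an appropriate confidence threshold, you will start getting "neutral" classifications, provided the training set is good. Here is an example with a minimal training set to illustrate the concept; for real use you should train on a large set: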

>>> train
[('I love this sandwich.', 'pos'), ('this is an amazing place!', 'pos'), ('I feel very good about these beers.', 'pos'), ('this is my best work.', 'pos'), ('what an awesome view', 'pos'), ('I do not like this restaurant', 'neg'), ('I am tired of this stuff.', 'neg'), ("I can't deal with this", 'neg'), ('he is my sworn enemy!', 'neg'), ('my boss is horrible.', 'neg')]
>>> from pprint import pprint
>>> pprint(train)
[('I love this sandwich.', 'pos'),
 ('this is an amazing place!', 'pos'),
 ('I feel very good about these beers.', 'pos'),
 ('this is my best work.', 'pos'),
 ('what an awesome view', 'pos'),
 ('I do not like this restaurant', 'neg'),
 ('I am tired of this stuff.', 'neg'),
 ("I can't deal with this", 'neg'),
 ('he is my sworn enemy!', 'neg'),
 ('my boss is horrible.', 'neg')]
>>> train2 = [('science is a subject','neu'),('this is horrible food','neg'),('glass has water','neu')]
>>> train = train+train2
>>> from textblob.classifiers import NaiveBayesClassifier
>>> cl = NaiveBayesClassifier(train)
>>> prob_dist = cl.prob_classify("I had a horrible day,I am tired")
>>> (prob_dist.prob('pos'),prob_dist.prob('neg'),prob_dist.prob('neu'))
(0.01085221171283812, 0.9746799258978173, 0.014467862389343378)
>>> 
>>> prob_dist = cl.prob_classify("This is a subject")
>>> (prob_dist.prob('pos'),prob_dist.prob('neg'),prob_dist.prob('neu'))
(0.10789848368588585, 0.14908905046805337, 0.7430124658460614)
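
The confidence threshold mentioned above could be applied roughly as follows. This is only a minimal sketch, not part of textblob: the helper name `verdict_from_probs`, the 0.8 threshold, and the 'normal' fallback label are assumptions chosen for illustration. It works on a plain dict of label probabilities, which can be built from a `prob_dist` with `{label: prob_dist.prob(label) for label in prob_dist.samples()}`.

```python
def verdict_from_probs(probs, threshold=0.8, fallback="normal"):
    """Return the most likely label, or `fallback` when the classifier
    is not confident enough (hypothetical helper, not a textblob API)."""
    best_label = max(probs, key=probs.get)
    if probs[best_label] < threshold:
        return fallback
    return best_label


# Probabilities copied from the two prob_classify calls in the session above
confident = {"pos": 0.01085221171283812,
             "neg": 0.9746799258978173,
             "neu": 0.014467862389343378}
uncertain = {"pos": 0.10789848368588585,
             "neg": 0.14908905046805337,
             "neu": 0.7430124658460614}

print(verdict_from_probs(confident))  # neg (0.97 is above the threshold)
print(verdict_from_probs(uncertain))  # normal (best score 0.74 is below 0.8)
```

In the listener from the question, the returned value could be assigned to `tweets.verdict` in place of the if/elif/else chain. The threshold itself is something to tune against your own data.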
