How to improve the accuracy and precision of sentiment analysis using supervised learning

Time: 2019-04-05 19:06:43

Tags: svm sentiment-analysis naivebayes sklearn-pandas kaggle

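I have trained on roughly 80 MB of the IMDB movie dataset, and I am running sentiment analysis on movie-related test data. I tried Naive Bayes and SVM (scikit-learn), but I still cannot improve the accuracy of the results.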
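For reference, a setup of the kind described (text features fed to Naive Bayes and to an SVM in scikit-learn) might look roughly like the sketch below. The tiny inline dataset, the choice of `TfidfVectorizer` with unigrams and bigrams, the use of `LinearSVC` as the SVM, and all variable names are assumptions for illustration, not the configuration actually used in the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the IMDB reviews (assumption: 1 = positive, 0 = negative).
train_texts = [
    "a wonderful film with great acting",
    "great story and wonderful cast",
    "loved this wonderful movie",
    "terrible plot and awful acting",
    "awful film, a boring waste of time",
    "boring and terrible movie",
]
train_labels = [1, 1, 1, 0, 0, 0]

# One pipeline per classifier, so both see identical features.
models = {
    "naive_bayes": make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB()),
    "linear_svm": make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC()),
}

test_texts = ["wonderful acting", "boring awful plot"]
predictions = {}
for name, model in models.items():
    model.fit(train_texts, train_labels)
    predictions[name] = model.predict(test_texts).tolist()
    print(name, predictions[name])
```

Keeping the vectorizer inside the pipeline means it is fitted on the training texts only, which avoids leaking test vocabulary into the features when the model is evaluated.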

I tried these approaches: https://colab.research.google.com/drive/1yIMLlYhhl6QOj1wpjW-Zka0cLOYk7Jg7

https://medium.com/@media_73863/fasttext-sentiment-analysis-for-tweets-a-straightforward-guide-9a8c070449a2

But I cannot get above 75% accuracy.

    import itertools
    import re

    import emoji
    from bs4 import BeautifulSoup

    # load_dict_contractions(), load_dict_smileys() and strip_accents() are
    # helpers defined in the full code linked below; `tweet` is one raw text.

    # Escape HTML characters (parser named explicitly to avoid the bs4 warning)
    tweet = BeautifulSoup(tweet, "html.parser").get_text()
    # Special case not handled previously: the \x92 apostrophe
    tweet = tweet.replace('\x92', "'")
    # Remove hashtags and account mentions
    tweet = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|(#[A-Za-z0-9]+)", " ", tweet).split())
    # Remove URLs
    tweet = ' '.join(re.sub(r"(\w+://\S+)", " ", tweet).split())
    # Remove punctuation
    tweet = ' '.join(re.sub(r"[.,!?:;\-=]", " ", tweet).split())
    # Lower case
    tweet = tweet.lower()
    # Expand contractions
    # source: https://en.wikipedia.org/wiki/Contraction_%28grammar%29
    CONTRACTIONS = load_dict_contractions()
    tweet = tweet.replace("’", "'")
    words = tweet.split()
    reformed = [CONTRACTIONS[word] if word in CONTRACTIONS else word for word in words]
    tweet = " ".join(reformed)
    # Standardize words: cap runs of a repeated character at two ("soooo" -> "soo")
    tweet = ''.join(''.join(s)[:2] for _, s in itertools.groupby(tweet))
    # Replace smileys with words
    # source: https://en.wikipedia.org/wiki/List_of_emoticons
    SMILEY = load_dict_smileys()
    words = tweet.split()
    reformed = [SMILEY[word] if word in SMILEY else word for word in words]
    tweet = " ".join(reformed)
    # Replace emojis with their text names
    tweet = emoji.demojize(tweet)
    # Strip accents
    tweet = strip_accents(tweet)
    # demojize wraps names in colons (":smile:"), so drop the colons
    tweet = tweet.replace(":", " ")
    tweet = ' '.join(tweet.split())
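One thing that may be worth checking in the cleaning steps above is their order: punctuation is removed before the smiley lookup, and the punctuation pattern includes `:`, `-` and `=`, which are the characters most emoticons are built from. The sketch below illustrates the effect with a hypothetical two-entry dictionary standing in for whatever `load_dict_smileys()` returns; the helper names `strip_punct` and `replace_smileys` are also made up for this illustration.

```python
import re

# Hypothetical stand-in for the dictionary returned by load_dict_smileys().
SMILEY = {":-)": "smiley", ":(": "sad"}


def strip_punct(text):
    """Replace the same punctuation characters as the cleaning code with spaces."""
    return " ".join(re.sub(r"[.,!?:;\-=]", " ", text).split())


def replace_smileys(text):
    """Swap whitespace-separated tokens found in SMILEY for their word form."""
    return " ".join(SMILEY.get(word, word) for word in text.split())


text = "great movie :-)"

# Order used in the cleaning code: punctuation first, then smileys.
punct_first = replace_smileys(strip_punct(text))

# Alternative order: smileys first, then punctuation.
smiley_first = strip_punct(replace_smileys(text))

print(repr(punct_first))
print(repr(smiley_first))
```

In the first ordering the emoticon is broken apart before it can be matched, so its sentiment signal is lost; in the second it survives as a word. Whether this matters for movie reviews depends on how often emoticons actually occur in the data.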

Current results

              precision    recall  f1-score   support

           0       0.74      0.75      0.75       814
           1       0.77      0.76      0.77       892

   micro avg       0.76      0.76      0.76      1706
   macro avg       0.76      0.76      0.76      1706
weighted avg       0.76      0.76      0.76      1706
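As a quick sanity check on the report above, overall accuracy can be recovered from the per-class recall and support, because recall times support approximates the number of correctly classified samples in each class. The sketch below uses only the rounded figures printed in the report, so the results are approximate; the variable names are illustrative.

```python
# Per-class rows copied from the report: (precision, recall, support).
report = {
    0: (0.74, 0.75, 814),
    1: (0.77, 0.76, 892),
}

total = sum(support for _, _, support in report.values())

# recall * support is roughly the number of correct predictions in that class.
correct = sum(recall * support for _, recall, support in report.values())
accuracy = correct / total

# Macro average: unweighted mean over the classes.
macro_precision = sum(p for p, _, _ in report.values()) / len(report)

# Weighted average: mean over the classes, weighted by class support.
weighted_precision = sum(p * s for p, _, s in report.values()) / total

print(f"accuracy ~ {accuracy:.3f}")
print(f"macro precision ~ {macro_precision:.3f}")
print(f"weighted precision ~ {weighted_precision:.3f}")
```

With these rounded inputs the accuracy comes out at roughly 0.755, in line with the 0.76 averages in the report and with the ceiling of about 75% mentioned earlier. The two classes are close in size and score, so the macro and weighted averages are nearly identical.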

Expected result: accuracy of up to 90%.

Link to the full code: https://drive.google.com/file/d/1dsnv86Rgu-NOXnLY1fbvRfN3Homb541K/view?usp=sharing

0 answers:

No answers