Removing punctuation in sentiment analysis in Python

Time: 2016-04-16 13:57:23

Tags: python sentiment-analysis

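I have the following code. It works fine, but there is a problem when I pass in sentences containing commas, full stops, and so on. I have researched this and can see strip() as a potential way to fix it. I cannot see where to add it, and when I tried I just got error after error!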

Thanks

sent_analysis = {"beer": 10, "wine":13,"spirit": 11,"cider":16,"shot":16}

def sentiment_analysis(dic, text):
    split_text = text.split()
    result = 0.00
    for i in split_text:
        if i in dic:
            result+= dic[i]
    return result


print(sentiment_analysis(sent_analysis,"the beer, wine and cider were    great"))
print(sentiment_analysis(sent_analysis,"the beer and the wine were great"))

1 answer:

Answer 0 (score: 1)

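A regular expression can be used to remove all non-alphanumeric characters from the string. In the code below, the pattern [^\w\s] matches any character that is not (as indicated by ^) a word character (a-z, A-Z, 0-9, and the underscore) or whitespace, and re.sub removes those characters. The return statement iterates over the split string, collects the score of each word found in the dictionary, and returns the sum of those scores.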
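As a quick check of the pattern on its own, the following minimal sketch applies it to a sample string and shows the text that is left before any scoring happens. The variable names are illustrative only.

```python
import re

text = "the beer,% wine &*and cider @were great"

# Delete every character that is neither a word character (\w) nor whitespace (\s).
cleaned = re.sub(r'[^\w\s]', '', text)
print(cleaned)          # the beer wine and cider were great
print(cleaned.split())  # ['the', 'beer', 'wine', 'and', 'cider', 'were', 'great']

# \w also matches the underscore, so an underscore survives the substitution.
underscored = re.sub(r'[^\w\s]', '', "red_wine!")
print(underscored)      # red_wine
```

Because the substitution runs before split(), tokens such as "beer," become "beer" and can be found in the dictionary.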

Regex \s

Regex \w

import re
sent_analysis = {"beer": 10, "wine":13,"spirit": 11,"cider":16,"shot":16}

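def sentiment_analysis(dic, text):
    # Remove every character that is not a word character or whitespace
    s = re.sub(r'[^\w\s]', '', text)
    # Sum the scores of the words that appear in the dictionary
    return sum(dic[x] for x in s.split() if x in dic)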

print(sentiment_analysis(sent_analysis,"the beer,% wine &*and cider @were great"))

Output: 39

This handles most punctuation, as shown by the variety of punctuation marks added to the example string.
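Since the question asks where strip() would go, here is a rough sketch of that alternative, assuming it is enough to trim punctuation from the start and end of each word. The function name sentiment_analysis_strip is hypothetical and not part of the answer above; string.punctuation is the standard library's set of ASCII punctuation characters.

```python
import string

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis_strip(dic, text):
    # Hypothetical variant for illustration; not the answer's original function.
    total = 0
    for word in text.split():
        # strip() only trims leading and trailing characters,
        # so "beer," becomes "beer" and "&*and" becomes "and".
        word = word.strip(string.punctuation)
        # Words missing from the dictionary contribute 0.
        total += dic.get(word, 0)
    return total

print(sentiment_analysis_strip(sent_analysis, "the beer, wine and cider were    great"))  # 39
print(sentiment_analysis_strip(sent_analysis, "the beer,% wine &*and cider @were great"))  # 39
```

The call to strip() has to be applied to each word after split(), not to the whole sentence, because it never touches characters in the middle of a string. For the same reason, punctuation inside a word (for example "be,er") is left in place, which the regular expression approach would remove.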