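I have the following code. It works well, but it runs into problems when I pass in sentences containing commas, full stops and so on. I have looked into this and it seems strip() might be a way to fix it? I can't see where to add it, and every attempt so far has just produced one error after another!
Thanks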
sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis(dic, text):
    split_text = text.split()
    result = 0.00
    for i in split_text:
        # A token such as "beer," still carries its comma, so it is not found in dic
        if i in dic:
            result += dic[i]
    return result

print(sentiment_analysis(sent_analysis, "the beer, wine and cider were great"))
print(sentiment_analysis(sent_analysis, "the beer and the wine were great"))
Answer 0: (score: 1)
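A regular expression can be used to remove every non-alphanumeric character from the string. In the code below, the pattern [^\w\s] matches any character that is not (as indicated by ^) a word character (a-z, A-Z, 0-9 or underscore) or whitespace, and re.sub replaces each match with an empty string. The return statement then iterates over the split string, looks each word up in the dictionary, collects the values of the words it finds, and returns their sum.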
import re

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis(dic, text):
    # Remove everything that is neither a word character nor whitespace
    s = re.sub(r'[^\w\s]', '', text)
    # Sum the scores of the words that appear in the dictionary
    return sum(dic[x] for x in s.split() if x in dic)

print(sentiment_analysis(sent_analysis, "the beer,% wine &*and cider @were great"))
Output: 39
This handles most punctuation, as shown by the variety of punctuation marks added to the example string.
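For comparison, the strip() approach mentioned in the question could be sketched roughly as follows. This is not part of the answer above: the function name sentiment_analysis_strip is a hypothetical one chosen for illustration, and the sketch assumes that punctuation only appears at the start or end of a word, since str.strip() does not touch characters in the middle of a string.

```python
import string

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis_strip(dic, text):
    # Hypothetical variant for illustration: clean each word with strip()
    total = 0
    for word in text.split():
        # Drop leading and trailing ASCII punctuation, e.g. "beer," -> "beer"
        cleaned = word.strip(string.punctuation)
        # Words missing from the dictionary contribute 0
        total += dic.get(cleaned, 0)
    return total

print(sentiment_analysis_strip(sent_analysis, "the beer, wine and cider were great"))
# prints 39
```

The place to call strip() is on each word after splitting, not on the whole sentence, because it only trims the two ends of the string it is called on. Punctuation inside a word is left alone, which is where this differs from the regex version.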