Separate spam and ham for WordCloud visualization

Time: 2018-04-07 15:40:14

Tags: python-3.x pandas join spam-prevention word-cloud

I am working on spam detection and want to show the spam and ham keywords in separate word clouds. This is how I load my .csv file.

import pandas as pd

data = pd.read_csv("spam.csv", encoding='latin-1')
data = data.rename(columns = {"v1":"label", "v2":"message"})
data = data.replace({"spam":"1","ham":"0"})

data.head()

This is my WordCloud code. I need help with the spam part: I cannot generate the correct plot.

import matplotlib.pyplot as plt
from wordcloud import WordCloud 

spam_words = ' '.join(list(data[data['label'] == 1 ]['message']))
spam_wc = WordCloud(width = 512, height = 512).generate(spam_words)

plt.figure(figsize = (10,8), facecolor = 'k')
plt.imshow(spam_wc)
plt.axis('off')
plt.tight_layout(pad = 0)
plt.show()

1 Answer:

Answer 0: (score: 0)

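The problem is that your code replaces "spam" and "ham" with the single-character strings "1" and "0", but then filters the DataFrame by comparing against the integer 1. A string never equals an integer, so the filter matches no rows and spam_words ends up empty. Change the replace line to: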

data = data.replace({"spam": 1, "ham": 0})
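
To see the difference, here is a minimal sketch. It assumes a tiny hand-made DataFrame in place of spam.csv (the four rows are invented for illustration), and the names as_strings, as_ints, empty_match and ham_words are hypothetical. It shows that the string labels never match the integer 1, and that with integer labels the spam and ham text can be built separately:

```python
import pandas as pd

# Tiny stand-in for spam.csv (made-up rows, not the real dataset).
data = pd.DataFrame({
    "label": ["ham", "spam", "ham", "spam"],
    "message": [
        "see you at lunch",
        "win a free prize",
        "call me later",
        "free entry now",
    ],
})

# String replacements, as in the question: the labels become "1"/"0",
# so comparing against the integer 1 selects no rows.
as_strings = data.replace({"spam": "1", "ham": "0"})
empty_match = as_strings[as_strings["label"] == 1]

# Integer replacements, as in the fix: == 1 and == 0 select the intended rows.
as_ints = data.replace({"spam": 1, "ham": 0})
spam_words = " ".join(as_ints[as_ints["label"] == 1]["message"])
ham_words = " ".join(as_ints[as_ints["label"] == 0]["message"])

print(len(empty_match))
print(spam_words)
print(ham_words)
```

Each of the two strings can then be passed to WordCloud(width = 512, height = 512).generate(...) and plotted exactly as in the question, which gives one word cloud for spam and one for ham.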