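I am trying to create a word cloud in Python after cleaning a text file.
I get the result I need, i.e. the words that occur most often in the text file, but I cannot plot them.
My code: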
import collections
from wordcloud import WordCloud
import matplotlib.pyplot as plt

file = open('example.txt', encoding = 'utf8' )
stopwords = set(line.strip() for line in open('stopwords'))
wordcount = {}
for word in file.read().split():
    word = word.lower()
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace("\"","")
    word = word.replace("“","")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
d = collections.Counter(wordcount)
for word, count in d.most_common(10):
    print(word , ":", count)
#wordcloud = WordCloud().generate(text)
#fig = plt.figure()
#fig.set_figwidth(14)
#fig.set_figheight(18)
#plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
#plt.title(title, color=fontcolor, size=30, y=1.01)
#plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)
#plt.axis('off')
#plt.show()
Edit: I used the following code to plot the word cloud:
wordcloud = WordCloud(background_color='white',
                      width=1200,
                      height=1000
                      ).generate((d.most_common(10)))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
But this raises TypeError: expected string or buffer.
When I use .generate(str(d.most_common(10))) instead, the resulting word cloud shows an apostrophe (') after several of the words.
I am using Jupyter Notebook, Python 3 and IPython.
Answer 0 (score: 1)
First, download the Symbola.ttf font file into the same folder as the script below.
File layout:
file.txt Symbola.ttf my_word_cloud.py
file.txt:
foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz
foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo
my_word_cloud.py:
import io
from collections import Counter
from os import path
import matplotlib.pyplot as plt
from wordcloud import WordCloud
d = path.dirname(__file__)
# Pass the encoding explicitly so the file is always decoded as UTF-8,
# whatever the platform default is
text = io.open(path.join(d, 'file.txt'), encoding='utf-8').read()
words = text.split()
print(Counter(words))
# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'Symbola.ttf')
word_cloud = WordCloud(font_path=font_path).generate(text)
# Display the generated image:
plt.imshow(word_cloud)
plt.axis("off")
plt.show()
Result:
Counter({'foo': 17, 'bizz': 9, 'buzz': 5})
Having looked at many other examples, I put together this simple one for you.
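As for the TypeError in the question: generate() expects a single string, while d.most_common(10) returns a list of (word, count) tuples. One possible way around it, as a minimal sketch assuming the wordcloud package is installed, is to turn that list into a dict and pass it to WordCloud.generate_from_frequencies. The helper names count_words and plot_top_words and the sample text are hypothetical, and the punctuation stripping here is a simplification of the replace() calls in the question.

```python
import string
from collections import Counter


def count_words(text, stopwords=frozenset()):
    """Count lowercased words, stripping surrounding punctuation and skipping stopwords."""
    strip_chars = string.punctuation + "“”"
    counts = Counter()
    for raw in text.split():
        word = raw.lower().strip(strip_chars)
        if word and word not in stopwords:
            counts[word] += 1
    return counts


def plot_top_words(counts, n=10):
    """Draw a word cloud from the n most frequent words (needs wordcloud and matplotlib)."""
    # Imported here so count_words can be used without the plotting libraries.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # most_common(n) is a list of (word, count) tuples; dict() turns it into
    # the word -> frequency mapping that generate_from_frequencies expects.
    frequencies = dict(counts.most_common(n))
    cloud = WordCloud(background_color="white", width=1200, height=1000)
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


# Hypothetical sample text standing in for example.txt
sample = 'The cat saw the dog. "The dog," said the cat, ran.'
counts = count_words(sample, stopwords={"the"})
print(counts.most_common(2))
# plot_top_words(counts)  # uncomment where wordcloud and matplotlib are installed
```

Because the frequencies are passed as a mapping, nothing is converted with str(), so the stray apostrophes from str(d.most_common(10)) should not appear.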
Answer 1 (score: -1)
most_common(x) is not a method of WordCloud. However, you can pass the parameter max_words=, which should do what you are trying to do.
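A minimal sketch of the max_words approach, assuming the wordcloud package is installed; the function name make_cloud is hypothetical. Note that generate() does its own tokenizing and stopword filtering, so the ten words it keeps may differ from the ones in the question's Counter unless the same stopwords are passed in.

```python
def make_cloud(text, max_words=10, stopwords=None):
    """Build a word cloud limited to the max_words most frequent words in text."""
    # Third-party import kept inside the function so the module loads without it.
    from wordcloud import WordCloud

    # stopwords=None makes WordCloud fall back to its built-in stopword list;
    # pass a set of strings (e.g. the one read from the 'stopwords' file) to override it.
    cloud = WordCloud(
        background_color="white",
        max_words=max_words,
        stopwords=stopwords,
    )
    # generate() takes the raw text as one string, not a list of tuples.
    return cloud.generate(text)


# Usage, where wordcloud and matplotlib are installed:
# import matplotlib.pyplot as plt
# plt.imshow(make_cloud(open('example.txt', encoding='utf8').read()))
# plt.axis('off')
# plt.show()
```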