Creating a word cloud with Python

Date: 2017-06-25 20:59:38

Tags: python matplotlib plot word-cloud

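I am trying to create a word cloud in Python after cleaning a text file.

I get the result I need, i.e. the words used most often in the text file, but I cannot plot them.

My code: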

import collections
from wordcloud import WordCloud
import matplotlib.pyplot as plt

file = open('example.txt', encoding = 'utf8' )
stopwords = set(line.strip() for line in open('stopwords'))
wordcount = {}

for word in file.read().split():
    word = word.lower()
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace("\"","")
    word = word.replace("“","")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

d = collections.Counter(wordcount)
for word, count in d.most_common(10):
    print(word , ":", count)

#wordcloud = WordCloud().generate(text)
#fig = plt.figure()
#fig.set_figwidth(14)
#fig.set_figheight(18)

#plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
#plt.title(title, color=fontcolor, size=30, y=1.01)
#plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)
#plt.axis('off')
#plt.show()

Edit: I used the following code to plot the word cloud:
wordcloud = WordCloud(background_color='white',
                          width=1200,
                          height=1000
                         ).generate((d.most_common(10)))


plt.imshow(wordcloud)
plt.axis('off')
plt.show()

But I get TypeError: expected string or buffer

When I try the above code with .generate(str(d.most_common(10))) instead,

the resulting word cloud shows an apostrophe (') after several words.
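
Both symptoms can be reproduced without drawing anything. The snippet below is a minimal sketch with made-up counts (the `counts` values are assumptions for illustration): `most_common()` returns a list of tuples rather than a string, and `str()` on that list produces its repr, quotes and brackets included. As far as I know, WordCloud's default tokenizer keeps apostrophes as part of a word, which would explain the stray ' characters.

```python
from collections import Counter

# Hypothetical counts, standing in for the real word counts.
counts = Counter({"foo": 17, "bizz": 9, "buzz": 5})

top = counts.most_common(2)
print(type(top).__name__)  # list: not a string, hence the TypeError

# str() yields the repr of the list, so the quote characters
# become part of the text handed to generate().
as_text = str(top)
print(as_text)  # [('foo', 17), ('bizz', 9)]
```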

  

Using Jupyter Notebook | Python 3 | IPython

2 answers:

Answer 0 (score: 1)

First, download the Symbola.ttf file into the same folder as the script below.

File layout:

file.txt Symbola.ttf my_word_cloud.py

file.txt:

foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz
foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo

my_word_cloud.py:

import io
from collections import Counter
from os import path

import matplotlib.pyplot as plt
from wordcloud import WordCloud

d = path.dirname(__file__)

# Pass the encoding explicitly so the file is read as UTF-8 on every platform
text = io.open(path.join(d, 'file.txt'), encoding='utf-8').read()

words = text.split()
print(Counter(words))

# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'Symbola.ttf')
word_cloud = WordCloud(font_path=font_path).generate(text)

# Display the generated image:
plt.imshow(word_cloud)
plt.axis("off")
plt.show()

Result:

Counter({'foo': 17, 'bizz': 9, 'buzz': 5})

[image: word cloud]

I created this simple example for you; many more examples are available here:

https://github.com/amueller/word_cloud/tree/master/examples
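
To apply the same idea to the question's cleaned text, one option is to join the surviving words back into a single string, since generate() expects a string. This is a rough sketch, not the asker's exact cleaning: `clean_text` and `render` are hypothetical helper names, and the cleaning only strips punctuation from the ends of each word.

```python
import string


def clean_text(raw_text, stopwords):
    """Lowercase each word, strip surrounding punctuation, drop stopwords."""
    strip_chars = string.punctuation + "“”"
    kept = []
    for token in raw_text.split():
        word = token.lower().strip(strip_chars)
        if word and word not in stopwords:
            kept.append(word)
    # generate() needs one string, so join the words with spaces.
    return " ".join(kept)


def render(text):
    """Draw a word cloud from an already cleaned string."""
    # Imported here so clean_text() works without wordcloud installed.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white",
                      width=1200, height=1000).generate(text)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


sample = 'The foo, the "buzz" and the bizz. Foo!'
cleaned = clean_text(sample, stopwords={"the", "and"})
print(cleaned)  # foo buzz bizz foo
```

Calling render(cleaned) would then draw the cloud from the cleaned text.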

Answer 1 (score: -1)

most_common(x) is not a method of WordCloud. However, you can pass the parameter

max_words = 

which should do what you are trying to do.
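
A hedged sketch of how this could look: limit the cloud with max_words, or hand the counts over directly through generate_from_frequencies, which in recent wordcloud releases expects a dict mapping words to counts (an assumption worth checking against the installed version). `top_frequencies` and `render_from_frequencies` are hypothetical helper names.

```python
from collections import Counter


def top_frequencies(words, n=10):
    """Return the n most common words as a {word: count} dict."""
    return dict(Counter(words).most_common(n))


def render_from_frequencies(frequencies, max_words=10):
    """Draw a word cloud straight from word counts."""
    # Imported here so top_frequencies() works without wordcloud installed.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(background_color="white", width=1200, height=1000,
                      max_words=max_words)
    # Assumes a wordcloud version whose generate_from_frequencies takes a dict.
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


words = "foo buzz bizz foo buzz foo".split()
freqs = top_frequencies(words, n=2)
print(freqs)  # {'foo': 3, 'buzz': 2}
```

Calling render_from_frequencies(freqs) would draw the cloud without going through str(), so no quote characters end up in the image.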