Generating a word cloud from a single-column Pandas dataframe

Time: 2017-04-25 09:12:24

Tags: python pandas dataframe word-cloud

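I have a Pandas dataframe with a single column: Crime type. The column contains 16 different crime "categories", which I would like to visualise as a word cloud, with each word sized according to its frequency in the dataframe.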

I tried to do this with the following code:

Reading the data in:

fields = ['Crime type']

text2 = pd.read_csv('allCrime.csv', usecols=fields)

Generating the word cloud:

wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()

However, I get this error:

TypeError: expected string or bytes-like object

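I was able to create an earlier word cloud from the full dataset using the code below, but I want the word cloud to be generated only from the words in one specific column, 'Crime type' ('allCrime.csv' contains about 13 columns):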

text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

I'm new to Python and Pandas (and to coding in general!), so any help is appreciated.

5 answers:

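Answer 0 (score: 12)

The problem is that the WordCloud.generate method you are using expects a string, in which it counts the word occurrences, but you are passing it a pandas object instead (text2 is the DataFrame returned by pd.read_csv, not a string).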
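The wording of the error is the same message Python's re module produces when it is handed something that is not a string, which is consistent with a regex-based tokenizer receiving the DataFrame. The sketch below reproduces that message with the standard library only. The list is a hypothetical stand-in for the DataFrame, and the `\w+` pattern is an illustrative assumption, not necessarily the pattern WordCloud uses.

```python
import re

# Hypothetical stand-in for the non-string object passed to generate().
not_a_string = ["Burglary", "Vehicle crime"]

try:
    # A regex search over a non-string raises TypeError.
    re.findall(r"\w+", not_a_string)
except TypeError as exc:
    message = str(exc)

print(message)

# Joining the values first yields a real string, which tokenizes without error.
tokens = re.findall(r"\w+", " ".join(not_a_string))
print(tokens)
```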

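Depending on the word cloud you want to produce, you can do either of the following:

  1. wordcloud2 = WordCloud().generate(' '.join(text2['Crime type'])), which joins all the words in the dataframe column into a single string and then counts every occurrence.

  2. Use WordCloud.generate_from_frequencies to pass in word frequencies that you have computed yourself.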
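The two options above can give different results when a category consists of more than one word. The sketch below uses only the standard library to illustrate this. The sample values are hypothetical, and the plain whitespace split only approximates what WordCloud.generate does internally (it applies its own tokenization and stop-word handling).

```python
from collections import Counter

# Hypothetical sample standing in for the 'Crime type' column.
crime_types = ["Violent crime", "Burglary", "Violent crime", "Vehicle crime"]

# Option 1: joining produces one long string, so a word-level count
# splits multi-word categories into their individual words.
joined = " ".join(crime_types)
word_counts = Counter(joined.split())

# Option 2: counting whole values keeps each category intact.
category_counts = Counter(crime_types)

print(word_counts)
print(category_counts)
```

If each of the 16 categories should appear as a single entry in the cloud, a mapping like `dict(category_counts)` is the kind of input that generate_from_frequencies accepts.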

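Answer 1 (score: 3)

You can generate a word cloud from a single column while removing all stop words. Suppose your dataframe is df and the column name is comment; then the following code can help: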

#Final word cloud after all the cleaning and pre-processing
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
comment_words = ' '
stopwords = set(STOPWORDS) 

# iterate over every value in the comment column
for val in df.comment:

    # cast each value to a string
    val = str(val)

    # split the value into tokens
    tokens = val.split()

    # convert each token to lowercase
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()

    # append this row's tokens to the running text
    for words in tokens:
        comment_words = comment_words + words + ' '


wordcloud = WordCloud(width = 800, height = 800, 
            background_color ='white', 
            stopwords = stopwords, 
            min_font_size = 10).generate(comment_words) 

# plot the WordCloud image                        
plt.figure(figsize = (8, 8), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 

plt.show() 
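The string-building loop above can be condensed into a single join over a generator. This is a minimal sketch using a plain list as a hypothetical stand-in for df.comment; it yields the same lowercase tokens, without the leading and trailing spaces.

```python
# Hypothetical stand-in for the values of df.comment.
comments = ["Good service", "BAD service", 42]

# Cast each value to str, split it into tokens, lowercase each token,
# and join everything with single spaces.
comment_words = " ".join(
    token.lower()
    for value in comments
    for token in str(value).split()
)

print(comment_words)
```

Note that str() turns a missing value (NaN) into the literal word "nan", in this version and in the loop alike, so it may be worth dropping missing values before building the text.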

Answer 2 (score: 1)

Answer 3 (score: 1)

You need to build a single concatenated input text. This can be done with the join function.

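import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)

# Join every value of the column into one string
text3 = ' '.join(text2['Crime type'])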
wordcloud2 = WordCloud().generate(text3)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
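One caveat, stated as an assumption about the data rather than something the question reports: if the column contains missing values, pandas reads them as float NaN, and str.join refuses anything that is not a string. The sketch below shows the failure and a simple filter, using a plain list as a hypothetical stand-in for the column. With a real Series, calling dropna() before the join would be the pandas equivalent.

```python
# Hypothetical column values, including one missing entry (NaN).
values = ["Burglary", float("nan"), "Vehicle crime"]

try:
    " ".join(values)
except TypeError as exc:
    # join only accepts strings, so the float NaN is rejected.
    join_error = str(exc)

print(join_error)

# Keep only the string entries before joining.
cleaned = [v for v in values if isinstance(v, str)]
text = " ".join(cleaned)
print(text)
```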

Answer 4 (score: 0)

This can be done easily with the following approach:

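import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

df = pd.read_csv('allCrime.csv')
# Count how often each crime type occurs: {category: count}
data = df['Crime type'].value_counts().to_dict()
wc = WordCloud().generate_from_frequencies(data)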

plt.imshow(wc)
plt.axis('off')
plt.show()