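I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with each word sized according to its frequency in the dataframe.
I have tried to do this with the following code:
To read the data in: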
fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)
To generate the word cloud:
wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
However, I get this error:
TypeError: expected string or bytes-like object
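I was able to create an earlier word cloud from the full dataset using the following code, but I want the word cloud to be generated only from the words in one specific column, 'Crime type' ('allCrime.csv' contains around 13 columns):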
text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
I am new to Python and Pandas (and to coding in general!), so all help is appreciated.
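Answer 0 (score: 12)
The problem is that the WordCloud.generate method you are using expects a string, in which it counts word instances, but you are passing it a pandas object (text2 is the DataFrame returned by pd.read_csv).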
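The error message itself looks like the one Python's re module raises when it is handed something that is not a string, which would be consistent with generate tokenising its input with a regular expression (an assumption about the library's internals, not something confirmed in this thread). A minimal stdlib-only sketch, using a plain list as a hypothetical stand-in for the pandas object:

```python
import re

# Hypothetical stand-in for the pandas object passed to generate();
# any value that is not a str or bytes fails in the same way.
not_a_string = ["Burglary", "Vehicle crime"]

try:
    re.findall(r"\w+", not_a_string)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Joining the values into a single string first makes the call succeed.
tokens = re.findall(r"\w+", " ".join(not_a_string))
print(tokens)
```

The exact wording of the message varies slightly between Python versions, so the sketch only prints it rather than comparing it to a fixed string.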
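Depending on what you want to build the word cloud from, you can either:
1. Use wordcloud2 = WordCloud().generate(' '.join(text2['Crime type'])), which concatenates all the words in the dataframe column and then counts all instances.
2. Use WordCloud.generate_from_frequencies to pass in word frequencies that you have computed yourself.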
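The two options can give different pictures when a category consists of more than one word. The sketch below illustrates this with the standard library only; the sample list is a hypothetical stand-in for text2['Crime type'], and a plain whitespace split is only a rough approximation of what generate does internally.

```python
from collections import Counter

# Hypothetical sample standing in for text2['Crime type'].
crime_types = [
    "Vehicle crime",
    "Burglary",
    "Vehicle crime",
    "Other theft",
    "Burglary",
    "Vehicle crime",
]

# Option 1: join everything into one string, so individual words are
# counted (approximated here by splitting on whitespace).
joined = " ".join(crime_types)
word_counts = Counter(joined.split())

# Option 2: count each whole category as a single unit.
category_counts = Counter(crime_types)

print(word_counts.most_common())
print(category_counts.most_common())
```

With the first option "Vehicle" and "crime" are counted as separate words, whereas the second keeps "Vehicle crime" together. A word-to-count mapping like category_counts is the kind of input that could then be handed to generate_from_frequencies.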
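Answer 1 (score: 3)
You can generate a word cloud for a single column while removing all the stop words. Assuming your dataframe is called df and the column is named comment, the following code should help: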
# Final word cloud after all the cleaning and pre-processing
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

comment_words = ' '
stopwords = set(STOPWORDS)

# iterate through the values in the column
for val in df.comment:
    # cast each value to a string
    val = str(val)
    # split the value into tokens
    tokens = val.split()
    # convert each token to lowercase
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()
    # append the tokens to the running text
    for words in tokens:
        comment_words = comment_words + words + ' '

wordcloud = WordCloud(width=800, height=800,
                      background_color='white',
                      stopwords=stopwords,
                      min_font_size=10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()
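One caveat with the loop above: str(val) turns a missing value into the literal text "nan" (or "None"), which would then be treated as a word. A minimal sketch that skips such values, using a plain list as a hypothetical stand-in for df.comment; build_text is an illustrative helper name, not part of any library:

```python
import math


def build_text(values):
    """Lowercase and join all tokens, skipping missing values (None or NaN)."""
    tokens = []
    for val in values:
        # Skip missing entries instead of turning them into "none"/"nan".
        if val is None or (isinstance(val, float) and math.isnan(val)):
            continue
        tokens.extend(str(val).lower().split())
    return " ".join(tokens)


# Hypothetical sample standing in for df.comment.
comments = ["Great Service", float("nan"), None, "great   value"]
text = build_text(comments)
print(text)
```

The resulting string could be passed to generate in place of comment_words.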
Answer 3 (score: 1)
You need to build a single concatenated input text. This can be done with the join function.
fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)
text3 = ' '.join(text2['Crime type'])
wordcloud2 = WordCloud().generate(text3)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
Answer 4 (score: 0)
This can be done easily with the following approach:
df = pd.read_csv('allCrime.csv')
data = df['Crime type'].value_counts().to_dict()
wc = WordCloud().generate_from_frequencies(data)
plt.imshow(wc)
plt.axis('off')
plt.show()