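In Python 3 with pandas, I have a dataframe "proposicoes" with a column that holds a list of words in each row. The column is named "ementa_token".
I want to generate a word cloud from the "ementa_token" column. Each row contains a list of words: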
proposicoes[proposicoes['id'] == '465465']['ementa_token'].iloc[0]
['Comunica',
'Excelentíssimo',
'Senhor',
'Presidente',
'República',
'sanção',
'projeto',
'lei',
'Institui',
'Fundo',
'Nacional',
'Idoso',
'autoriza',
'deduzir',
'imposto',
'renda',
'devido',
'pessoas',
'físicas',
'jurídicas',
'doações',
'efetuadas',
'Fundos',
'Municipais',
'Estaduais',
'Nacional',
'Idoso',
'altera',
'Lei',
'nº',
'9250',
'26',
'dezembro',
'1995',
'restitui',
'arquivo',
'Congresso',
'Nacional',
'dois',
'autógrafos',
'texto',
'ora',
'convertido',
'Lei',
'nº',
'12213',
'20',
'janeiro',
'2010']
I tried it this way:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib inline
wordcloud = WordCloud(width=800, height=400).generate(proposicoes['ementa_token'])
plt.figure( figsize=(30,20) )
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
I got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-c072e91a9fe7> in <module>
----> 1 wordcloud = WordCloud(width=800, height=400).generate(proposicoes['ementa_token'])
2 plt.figure( figsize=(30,20) )
3 plt.imshow(wordcloud)
4 plt.axis("off")
5 plt.show()
c:\users\reinaldo\documents\code\palavras\lib\site-packages\wordcloud\wordcloud.py in generate(self, text)
603 self
604 """
--> 605 return self.generate_from_text(text)
606
607 def _check_generated(self):
c:\users\reinaldo\documents\code\palavras\lib\site-packages\wordcloud\wordcloud.py in generate_from_text(self, text)
584 self
585 """
--> 586 words = self.process_text(text)
587 self.generate_from_frequencies(words)
588 return self
c:\users\reinaldo\documents\code\palavras\lib\site-packages\wordcloud\wordcloud.py in process_text(self, text)
551 regexp = self.regexp if self.regexp is not None else r"\w[\w']+"
552
--> 553 words = re.findall(regexp, text, flags)
554 # remove stopwords
555 words = [word for word in words if word.lower() not in stopwords]
c:\users\reinaldo\documents\code\palavras\lib\re.py in findall(pattern, string, flags)
221
222 Empty matches are included in the result."""
--> 223 return _compile(pattern, flags).findall(string)
224
225 def finditer(pattern, string, flags=0):
TypeError: expected string or bytes-like object
Does this mean the code is not reading the words from the list in each row? Does anyone know how to do this?
Answer 0 (score: 1)
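The TypeError is clear: WordCloud expects a string, not a Series. Concatenate the lists in the column into one list, then join the words into a single string:
wordcloud = WordCloud(width=800, height=400).generate(' '.join(proposicoes['ementa_token'].sum()))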
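To see why joining helps, here is a minimal sketch that reproduces the failure without WordCloud, using only the regular expression visible in the traceback. The `rows` list is a hypothetical stand-in for the column, not the real data.

```python
import re

# Hypothetical stand-in for the column: each row holds a list of tokens.
rows = [["Fundo", "Nacional", "Idoso"], ["Lei", "9250"]]

# The default pattern shown in the traceback (wordcloud.py, process_text).
pattern = r"\w[\w']+"

# Passing a non-string (here a list of lists) raises the same TypeError.
try:
    re.findall(pattern, rows)
    raised = False
except TypeError:
    raised = True

# Flatten the rows and join the words into one string; the regex can scan that.
text = " ".join(word for row in rows for word in row)
words = re.findall(pattern, text)
```

The same flattening expression should work on the real column, since iterating over a pandas Series yields its row values.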
Option 2:
import numpy as np
data = ' '.join(np.concatenate(proposicoes['ementa_token'].tolist()))
wordcloud = WordCloud(width=800, height=400).generate(data)
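Since the column is already tokenized, another possibility is to count the tokens yourself and skip the re-tokenizing step that generate() performs on its text. The sketch below is one way to do that with the standard library; `token_lists` is a hypothetical stand-in for proposicoes['ementa_token'], and the WordCloud call is left as a comment because it assumes the wordcloud package is installed.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for proposicoes['ementa_token'].
token_lists = [
    ["Fundo", "Nacional", "Idoso"],
    ["Fundos", "Municipais", "Nacional", "Idoso"],
]

# Flatten all rows in a single pass and count each token.
frequencies = Counter(chain.from_iterable(token_lists))

# With wordcloud installed, the counts can be passed in directly:
# wordcloud = WordCloud(width=800, height=400).generate_from_frequencies(frequencies)
```

Counting this way keeps the tokens exactly as they appear in the lists, so "Lei" and "lei" are counted as separate words unless you normalize the case first.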