TypeError: Empty 'DataFrame': no numeric data to plot

Question

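I'm new to coding and still learning. That said, I've been following a tutorial on analyzing data from the Twitter API: http://adilmoujahid.com/posts/2014/07/twitter-analytics/

I believe the author uses Python 2.7 while I'm using Python 3.6.1, so I converted the code to my Python version. That worked until I reached the top 5 countries chart. The code for that chart ran successfully only once, two days ago; now every run fails with `TypeError: Empty 'DataFrame': no numeric data to plot`. The full traceback is included below.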


Has anyone else run into this problem, and what is the best way to fix it? I can't figure out how to solve it. Thanks!

Full traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-601663476327> in <module>()
      7 ax.set_ylabel('Number of tweets' , fontsize=15)
      8 ax.set_title('Top 5 countries', fontsize=15, fontweight='bold')
----> 9 tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue')
     10 plt.show()

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2441                            colormap=colormap, table=table, yerr=yerr,
   2442                            xerr=xerr, label=label,  secondary_y=secondary_y,
-> 2443                            **kwds)
   2444     __call__.__doc__ = plot_series.__doc__
   2445 

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   1882                  yerr=yerr, xerr=xerr,
   1883                  label=label, secondary_y=secondary_y,
-> 1884                  **kwds)
   1885 
   1886 

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
   1682         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   1683 
-> 1684     plot_obj.generate()
   1685     plot_obj.draw()
   1686     return plot_obj.result

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in generate(self)
    236     def generate(self):
    237         self._args_adjust()
--> 238         self._compute_plot_data()
    239         self._setup_subplots()
    240         self._make_plot()

~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _compute_plot_data(self)
    345         if is_empty:
    346             raise TypeError('Empty {0!r}: no numeric data to '
--> 347                             'plot'.format(numeric_data.__class__.__name__))
    348 
    349         self.data = numeric_data

TypeError: Empty 'DataFrame': no numeric data to plot

Full code (so far)

import json
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt

tweets_data_path = '...twitter_data.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue

print (len (tweets_data))

tweets = pd.DataFrame()

tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
tweets['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
tweets['country'] = list(map(lambda tweet: tweet['place']['country'] if tweet['place'] != None else None, tweets_data))

Top 5 languages chart

tweets_by_lang = tweets['lang'].value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')
plt.show()

Answer 1

Is your data actually numeric? You can check with, for example:

print(type(tweets['country'][0]))

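Since you are using json.loads (which deserializes from a string), the data is quite likely not numeric, and that is what the error refers to. Try converting the data to float (or another type):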

tweets = tweets.astype('float')

Then see whether that solves the problem. If needed, you can apply the conversion to specific columns only. Good luck!
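
To make the check a bit more concrete, here is a minimal sketch under a few assumptions. The sample rows are made up, and `tweets_by_country` is assumed to be built with `tweets['country'].value_counts()`, mirroring the languages example (the question does not show that line). It illustrates one way to end up with nothing numeric to plot: `value_counts()` skips missing values, so a `country` column that holds only `None` produces an empty result. Be aware that `astype('float')` raises a `ValueError` on columns containing arbitrary text, so the conversion is only useful on columns whose values look like numbers.

```python
import pandas as pd

# Hypothetical sample standing in for the tweets DataFrame from the question.
tweets = pd.DataFrame({
    'lang': ['en', 'en', 'fr'],
    'country': [None, None, None],  # no tweet carries place information
})

# Inspect what the column actually holds before plotting.
print(tweets['country'].dtype)
print(tweets['country'].notnull().sum())  # how many rows have a country

# Assumption: both charts are built from value_counts().
tweets_by_lang = tweets['lang'].value_counts()
tweets_by_country = tweets['country'].value_counts()

# value_counts() skips missing values, so an all-None column gives an
# empty Series, and there is nothing to draw.
if tweets_by_country.empty:
    print('No country data to plot')
else:
    print(tweets_by_country[:5])
```

If the country counts turn out to be empty while the language counts are not, the problem is probably in the collected data (no tweets with place information) rather than in the data types.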

Answer 2

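I think your file does not exist or there is a problem with the path. The first two steps of http://adilmoujahid.com/posts/2014/07/twitter-analytics/ retrieve the data and save it to a local file. Does that file exist at the path you specified?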

    tweets_data_path = '...twitter_data.txt'

What does the following return?

    print (len (tweets_data))
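
Both checks could be sketched roughly as follows. The helper name `load_tweets` and the throwaway demo file are hypothetical and exist only for illustration; substitute your own path. The sketch fails loudly when the file is missing and reports how many lines parse as JSON, since zero parsed tweets would leave every column of the DataFrame empty.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Return the tweets parsed from a file with one JSON object per line."""
    if not os.path.isfile(path):
        raise FileNotFoundError('No such file: {}'.format(path))
    tweets_data = []
    with open(path, 'r') as tweets_file:
        for line in tweets_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                tweets_data.append(json.loads(line))
            except ValueError:
                continue  # skip lines that are not valid JSON
    return tweets_data


# Demo with a throwaway file: one valid record, one blank line, one broken line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as demo:
    demo.write('{"text": "hello", "lang": "en", "place": null}\n\n{broken\n')
    demo_path = demo.name

tweets_data = load_tweets(demo_path)
print(len(tweets_data))  # number of tweets that were parsed successfully
os.remove(demo_path)
```

A count of 0 here would point to an empty or badly formatted file rather than to a plotting problem.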
