How to draw lines based on the presence of consecutive data

Time: 2017-05-03 16:03:35

Tags: python pandas matplotlib plot

I have a dataset that looks like this:

+------------+--------+
| trend_name |  date  |
+------------+--------+
| dogs       | 5/3/17 |
| cats       | 5/3/17 |
| owls       | 5/3/17 |
| dogs       | 5/4/17 |
| cats       | 5/4/17 |
| tigers     | 5/4/17 |
| cats       | 5/5/17 |
| bears      | 5/5/17 |
| giraffes   | 5/5/17 |
+------------+--------+

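I want to create a chart with trend_name on the y-axis and date on the x-axis, where a trend that lasts for more than one consecutive period is drawn as a line connecting its points, a trend that exists in only one period is drawn as a single point on the same level, and nothing is drawn for a trend in a period where it does not exist.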

The plot should look something like this: [image: sketch of the desired plot]

I just tried t.plot(x='date', y='trend_name'), but of course there is no numeric data, so it raised an error.

Is there a specific name for this type of plot so that I can find better resources, or does anyone have suggestions on how to achieve this?

Update:

t is a pandas DataFrame that looks like this, following a similar pattern to the mock DataFrame above:

[image: screenshot of the DataFrame t]

t.plot(x='datetime_collected', y='name') yields:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-95-d2a37de17ec0> in <module>()
----> 1 t.plot(x='datetime_collected', y='name')

/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
   3772                           fontsize=fontsize, colormap=colormap, table=table,
   3773                           yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 3774                           sort_columns=sort_columns, **kwds)
   3775     __call__.__doc__ = plot_frame.__doc__
   3776 

/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
   2641                  yerr=yerr, xerr=xerr,
   2642                  secondary_y=secondary_y, sort_columns=sort_columns,
-> 2643                  **kwds)
   2644 
   2645 

/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
   2468         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   2469 
-> 2470     plot_obj.generate()
   2471     plot_obj.draw()
   2472     return plot_obj.result

/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in generate(self)
   1039     def generate(self):
   1040         self._args_adjust()
-> 1041         self._compute_plot_data()
   1042         self._setup_subplots()
   1043         self._make_plot()

/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _compute_plot_data(self)
   1148         if is_empty:
   1149             raise TypeError('Empty {0!r}: no numeric data to '
-> 1150                             'plot'.format(numeric_data.__class__.__name__))
   1151 
   1152         self.data = numeric_data

TypeError: Empty 'DataFrame': no numeric data to plot
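As the last frame of the traceback shows, DataFrame.plot only plots numeric data, and the y column here holds strings, so nothing is left to draw. A minimal sketch of that check on the mock data from the question (the column names trend_name and date come from the table above); encoding the names with pd.factorize is an assumed way to obtain numbers, not something taken from the question:

```python
import pandas as pd

# Mock data from the question, dates parsed as month/day/year
t = pd.DataFrame({
    'trend_name': ['dogs', 'cats', 'owls', 'dogs', 'cats', 'tigers',
                   'cats', 'bears', 'giraffes'],
    'date': pd.to_datetime(['5/3/17'] * 3 + ['5/4/17'] * 3 + ['5/5/17'] * 3,
                           format='%m/%d/%y'),
})

# The y column contains strings, so no numeric column survives the filter
numeric_y = t[['trend_name']].select_dtypes(include='number')

# One way to get plottable numbers: an integer code per trend name,
# numbered in order of first appearance
codes, names = pd.factorize(t['trend_name'])
t['trend_id'] = codes
```

Plotting trend_id against date then has numeric y values, and names[i] gives the label for code i.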

1 Answer:

Answer 0 (score: 3)

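This is probably far from the most elegant solution, especially since I'm not very familiar with pandas. Anyway, here is a solution that creates an auxiliary dataframe restricted to your plot limits (this is unavoidable if you want to ignore data points that fall outside the current time window):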

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# dummy data
dat = pd.DataFrame({'beast': ['dog','cat','owl','dog','cat','tiger','cat','bear','giraffe','unicorn'],
                    'collected': pd.to_datetime(['2016-03-09']*3 + ['2016-04-05']*3 + ['2016-05-05']*3 + ['2016-06-06'])})

# plotting date interval
t1,t2 = (pd.to_datetime(t) for t in ('2016-03-09','2016-05-05'))

# create auxiliary dataframe for plotting
dat_tmp = dat[(t1<=dat.collected) & (dat.collected<=t2)] # filtered between t1 and t2
# beast_id = (0, 1, 2, ...), beasts = animal names in order of first appearance
beast_id,beasts = zip(*enumerate(dat_tmp.beast.unique()))

# indexing step: see http://stackoverflow.com/a/22346955
dat_tmp = dat_tmp.merge(pd.DataFrame({'beast': beasts, 'beast_id': beast_id}),on='beast',how='left')
dat_tmp = dat_tmp.pivot(index='collected',columns='beast',values='beast_id')

# plot
dat_tmp.plot(style='.-')

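def format_fn(tick_val, tick_pos):
    '''uses items in the sequence `beasts` to set yticklabels'''
    # only label whole-number ticks; otherwise int(0.5) would also map to beasts[0]
    if float(tick_val).is_integer() and int(tick_val) in beast_id:
        return beasts[int(tick_val)]
    else:
        return ''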

plt.gca().yaxis.set_major_formatter(FuncFormatter(format_fn))
plt.show()
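An alternative to the FuncFormatter, sketched under the assumption that one fixed tick per animal is acceptable: put the y ticks exactly at the integer ids and label them directly, so no label can land on a fractional tick. This is a substitution for format_fn, not the answer's code; it rebuilds the same pivoted frame from the dummy data already restricted to the plotting window, and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Dummy data from the answer, already filtered to the t1..t2 window
dat_tmp = pd.DataFrame({
    'beast': ['dog', 'cat', 'owl', 'dog', 'cat', 'tiger', 'cat', 'bear', 'giraffe'],
    'collected': pd.to_datetime(['2016-03-09'] * 3 + ['2016-04-05'] * 3
                                + ['2016-05-05'] * 3),
})
beasts = list(dat_tmp['beast'].unique())
dat_tmp['beast_id'] = dat_tmp['beast'].map({b: i for i, b in enumerate(beasts)})
wide = dat_tmp.pivot(index='collected', columns='beast', values='beast_id')

ax = wide.plot(style='.-')
# One tick per animal id, labelled with that animal's name
ax.set_yticks(list(range(len(beasts))))
ax.set_yticklabels(beasts)
ax.figure.canvas.draw()  # make sure the tick label texts are populated
labels = [lab.get_text() for lab in ax.get_yticklabels()]
plt.close(ax.figure)
```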

[image: resulting plot]

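As you can see, there is still plenty of room for formatting improvements (hiding irrelevant x ticks, zooming out a bit so that all points are fully visible, moving the legend, etc.), but these are all trivial cosmetic changes.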
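The cosmetic fixes just listed could look roughly like this. It is a sketch with assumed choices (x ticks only at the collection dates, 10%/15% margins, legend to the right of the axes), using the pivoted frame from the end of this answer typed in by hand and drawing each column with plain matplotlib calls instead of dat_tmp.plot. Each NaN breaks the line, which is what keeps non-consecutive appearances from being joined.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Pivoted frame from the answer: rows are dates, columns animals, values y ids
nan = float('nan')
wide = pd.DataFrame(
    {'bear': [nan, nan, 4], 'cat': [1, 1, 1], 'dog': [0, 0, nan],
     'giraffe': [nan, nan, 5], 'owl': [2, nan, nan], 'tiger': [nan, 3, nan]},
    index=pd.to_datetime(['2016-03-09', '2016-04-05', '2016-05-05']),
)

fig, ax = plt.subplots()
for name in wide.columns:
    # NaN values leave gaps, so each animal is drawn only where it is present
    ax.plot(wide.index, wide[name], '.-', label=name)

ax.set_xticks(wide.index)                            # ticks only at collection dates
ax.set_xticklabels(wide.index.strftime('%Y-%m-%d'))
ax.margins(x=0.1, y=0.15)                            # zoom out so edge points are not clipped
ax.legend(loc='upper left', bbox_to_anchor=(1.0, 1.0))  # legend outside the axes
fig.subplots_adjust(right=0.78)                      # leave room for the legend
```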

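As for the dummy example I put together (I suggest you do this yourself next time; it makes it easier for others to solve your problem), we start from this dataframe: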

     beast  collected
0      dog 2016-03-09
1      cat 2016-03-09
2      owl 2016-03-09
3      dog 2016-04-05
4      cat 2016-04-05
5    tiger 2016-04-05
6      cat 2016-05-05
7     bear 2016-05-05
8  giraffe 2016-05-05
9  unicorn 2016-06-06

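Note that the unicorn data point is entirely absent from the plot, because it falls outside the window between t1 and t2. After the indexing/merge step, we end up with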

     beast  collected  beast_id
0      dog 2016-03-09         0
1      cat 2016-03-09         1
2      owl 2016-03-09         2
3      dog 2016-04-05         0
4      cat 2016-04-05         1
5    tiger 2016-04-05         3
6      cat 2016-05-05         1
7     bear 2016-05-05         4
8  giraffe 2016-05-05         5

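As you can see, each point has been annotated with the integer index of its animal. We need this because these integers are the data for the plot's y-axis. After pivoting, the final result is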

beast       bear  cat  dog  giraffe  owl  tiger
collected                                      
2016-03-09   NaN  1.0  0.0      NaN  2.0    NaN
2016-04-05   NaN  1.0  0.0      NaN  NaN    3.0
2016-05-05   4.0  1.0  NaN      5.0  NaN    NaN

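whose columns are plotted as separate lines. There may be a shorter sequence of steps that leads to the same, or an equally useful, dataframe, but this is what I have. The nice part is that the NaNs in the dataset automatically enforce the "lines only where data is continuously available" rule.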
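One possibly shorter route, sketched under the assumption that numbering the animals in order of first appearance is acceptable for the y-axis: pd.factorize returns integer codes plus the unique values in that order, which replaces both the enumerate/zip line and the merge. This is a substitution, not the answer's code; the column names follow the dummy data.

```python
import pandas as pd

# Dummy data from the answer
dat = pd.DataFrame({
    'beast': ['dog', 'cat', 'owl', 'dog', 'cat', 'tiger', 'cat', 'bear',
              'giraffe', 'unicorn'],
    'collected': pd.to_datetime(['2016-03-09'] * 3 + ['2016-04-05'] * 3
                                + ['2016-05-05'] * 3 + ['2016-06-06']),
})
t1, t2 = pd.Timestamp('2016-03-09'), pd.Timestamp('2016-05-05')

# Keep only rows inside the plotting window (both ends inclusive)
dat_tmp = dat[dat['collected'].between(t1, t2)].copy()

# Integer code per animal, numbered in order of first appearance
codes, beasts = pd.factorize(dat_tmp['beast'])
dat_tmp['beast_id'] = codes

# Same wide layout as before: one column per animal, NaN where it is absent
wide = dat_tmp.pivot(index='collected', columns='beast', values='beast_id')
```

wide.plot(style='.-') then draws it as in the answer, and beasts[i] is the label for y value i.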