I have a dataset that looks like this:
+------------+--------+
| trend_name | date   |
+------------+--------+
| dogs       | 5/3/17 |
| cats       | 5/3/17 |
| owls       | 5/3/17 |
| dogs       | 5/4/17 |
| cats       | 5/4/17 |
| tigers     | 5/4/17 |
| cats       | 5/5/17 |
| bears      | 5/5/17 |
| giraffes   | 5/5/17 |
+------------+--------+
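I want to create a chart with trend_name on the y-axis and date on the x-axis, with lines connecting trends that persist for more than one period (each trend staying on its own horizontal level), dots for trends that exist in only one period, and nothing where a trend is absent in a given period.
I simply tried t.plot(x='date', y='trend_name'), but of course there is no numeric data there, so it raised an error.
Is there a specific name for this type of plot, so that I can find better resources, or does anyone have suggestions on how to achieve this?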
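Update:
t is a pandas DataFrame that follows a pattern similar to the mock dataframe above, with the columns named name and datetime_collected:
t.plot(x='datetime_collected', y='name')
yields: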
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-95-d2a37de17ec0> in <module>()
----> 1 t.plot(x='datetime_collected', y='name')
/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in __call__(self, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
3772 fontsize=fontsize, colormap=colormap, table=table,
3773 yerr=yerr, xerr=xerr, secondary_y=secondary_y,
-> 3774 sort_columns=sort_columns, **kwds)
3775 __call__.__doc__ = plot_frame.__doc__
3776
/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
2641 yerr=yerr, xerr=xerr,
2642 secondary_y=secondary_y, sort_columns=sort_columns,
-> 2643 **kwds)
2644
2645
/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
2468 plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
2469
-> 2470 plot_obj.generate()
2471 plot_obj.draw()
2472 return plot_obj.result
/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in generate(self)
1039 def generate(self):
1040 self._args_adjust()
-> 1041 self._compute_plot_data()
1042 self._setup_subplots()
1043 self._make_plot()
/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _compute_plot_data(self)
1148 if is_empty:
1149 raise TypeError('Empty {0!r}: no numeric data to '
-> 1150 'plot'.format(numeric_data.__class__.__name__))
1151
1152 self.data = numeric_data
TypeError: Empty 'DataFrame': no numeric data to plot
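For reference, the cause of this error can be reproduced without plotting anything. The following is only a minimal sketch: it assumes the mock table above with dates read as month/day/year, and it uses `select_dtypes` as a rough stand-in for the numeric filtering that pandas applies internally before plotting (the exact internal check differs between pandas versions).

```python
import pandas as pd

# mock data mirroring the table in the question
# (assumption: 5/3/17 means 3 May 2017)
t = pd.DataFrame({
    'trend_name': ['dogs', 'cats', 'owls', 'dogs', 'cats', 'tigers',
                   'cats', 'bears', 'giraffes'],
    'date': pd.to_datetime(['2017-05-03'] * 3 + ['2017-05-04'] * 3
                           + ['2017-05-05'] * 3),
})

# the y column holds strings, so no numeric column is left to draw
numeric_part = t[['trend_name']].select_dtypes(include='number')
print(t.dtypes)
print(numeric_part.empty)
```

Because the y column is non-numeric, the names have to be mapped to numbers before they can be placed on an axis.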
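Answer 0 (score: 3)
This is probably far from the most elegant solution, especially since I'm not very familiar with pandas. Anyway, here is a solution that creates an auxiliary dataframe restricted to your plotting limits (which is unavoidable if you want to ignore data points that do not appear in the current time window):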
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
# dummy data
dat = pd.DataFrame({'beast': ['dog','cat','owl','dog','cat','tiger','cat','bear','giraffe','unicorn'],
'collected': pd.to_datetime(['2016-03-09']*3 + ['2016-04-05']*3 + ['2016-05-05']*3 + ['2016-06-06'])})
# plotting date interval
t1,t2 = (pd.to_datetime(t) for t in ('2016-03-09','2016-05-05'))
# create auxiliary dataframe for plotting
dat_tmp = dat[(t1<=dat.collected) & (dat.collected<=t2)] # filtered between t1 and t2
beast_id,beasts = zip(*enumerate(dat_tmp.beast.unique()))
# indexing step: see http://stackoverflow.com/a/22346955
dat_tmp = dat_tmp.merge(pd.DataFrame({'beast': beasts, 'beast_id': beast_id}),on='beast',how='left')
dat_tmp = dat_tmp.pivot(index='collected',columns='beast',values='beast_id')
# plot
dat_tmp.plot(style='.-')
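def format_fn(tick_val, tick_pos):
    '''uses items in the list `beasts` to set yticklabels'''
    # label only ticks that fall exactly on an integer beast index,
    # so that fractional or negative ticks are not mislabelled
    if tick_val in beast_id:
        return beasts[int(tick_val)]
    else:
        return ''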
plt.gca().yaxis.set_major_formatter(FuncFormatter(format_fn))
plt.show()
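As you can see, there is still plenty of room for formatting improvements: hiding irrelevant x ticks, zooming out a bit so that all points are fully visible, moving the legend and so on, but these are trivial cosmetic changes.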
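The cosmetic tweaks could look roughly like the sketch below. It is not part of the solution above: it rebuilds a small version of the dummy data, draws with matplotlib directly instead of `DataFrame.plot`, and sets the y tick labels explicitly instead of using a `FuncFormatter`. The `Agg` backend, the names `wide`, `fig` and `ax`, and the layout values are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# small version of the dummy data (only points inside the plotting window)
dat = pd.DataFrame({
    'beast': ['dog', 'cat', 'owl', 'dog', 'cat', 'tiger', 'cat', 'bear', 'giraffe'],
    'collected': pd.to_datetime(['2016-03-09'] * 3 + ['2016-04-05'] * 3
                                + ['2016-05-05'] * 3),
})
beasts = list(dat.beast.unique())
dat['beast_id'] = dat.beast.map({b: i for i, b in enumerate(beasts)})
wide = dat.pivot(index='collected', columns='beast', values='beast_id')

fig, ax = plt.subplots()
for name in beasts:
    # NaN entries break the line, so single-period beasts show up as dots
    ax.plot(wide.index.to_pydatetime(), wide[name].to_numpy(), '.-', label=name)

# x ticks only at the dates that actually occur in the data
ax.set_xticks(mdates.date2num(wide.index.to_pydatetime()))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# one y tick per beast, labelled with its name
ax.set_yticks(list(range(len(beasts))))
ax.set_yticklabels(beasts)

# padding so that points on the border are fully visible
ax.margins(0.1)

# legend outside the axes, with room reserved on the right
ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
fig.subplots_adjust(right=0.75)
# call plt.show() or fig.savefig(...) to view the result
```

With the names already on the y axis, the legend is arguably redundant and could simply be dropped.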
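As for the dummy example I put together (I suggest you do this yourself next time, to make it easier for others to tackle your problem), we start from this dataframe: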
     beast  collected
0      dog 2016-03-09
1      cat 2016-03-09
2      owl 2016-03-09
3      dog 2016-04-05
4      cat 2016-04-05
5    tiger 2016-04-05
6      cat 2016-05-05
7     bear 2016-05-05
8  giraffe 2016-05-05
9  unicorn 2016-06-06
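Note the unicorn data point, which is completely absent from the plot because it falls outside the plotting interval. After the indexing/merge step, we end up with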
     beast  collected  beast_id
0      dog 2016-03-09         0
1      cat 2016-03-09         1
2      owl 2016-03-09         2
3      dog 2016-04-05         0
4      cat 2016-04-05         1
5    tiger 2016-04-05         3
6      cat 2016-05-05         1
7     bear 2016-05-05         4
8  giraffe 2016-05-05         5
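As you can see, every point has been annotated with the integer index of the given animal. We need this because pandas can only plot numeric data, so these indices serve as the y values of our plot. After pivoting, the final result is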
beast       bear  cat  dog  giraffe  owl  tiger
collected
2016-03-09   NaN  1.0  0.0      NaN  2.0    NaN
2016-04-05   NaN  1.0  0.0      NaN  NaN    3.0
2016-05-05   4.0  1.0  NaN      5.0  NaN    NaN
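the columns of which will be plotted as separate lines. There might be a shorter route to the same or an equally useful dataframe, but this is what I've got. The upside is that the NaNs in the dataset automatically enforce the "lines only where data is continuously available" rule.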
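One possible shorter route, offered only as a sketch and not as the method used above, is to let `pd.factorize` assign the integer indices instead of the `enumerate`/`merge` step. Like `unique()`, `factorize` numbers the values in order of first appearance, so it should yield the same ids. The names `window`, `wide` and `periods` are made up for this example.

```python
import pandas as pd

# same dummy data as above, including the out-of-window unicorn
dat = pd.DataFrame({
    'beast': ['dog', 'cat', 'owl', 'dog', 'cat', 'tiger',
              'cat', 'bear', 'giraffe', 'unicorn'],
    'collected': pd.to_datetime(['2016-03-09'] * 3 + ['2016-04-05'] * 3
                                + ['2016-05-05'] * 3 + ['2016-06-06']),
})

# keep only the rows inside the plotting interval (both ends inclusive)
t1, t2 = pd.Timestamp('2016-03-09'), pd.Timestamp('2016-05-05')
window = dat[dat.collected.between(t1, t2)].copy()

# integer id per beast, numbered in order of first appearance
codes, beasts = pd.factorize(window.beast)
window['beast_id'] = codes

# one column per beast; NaN where the beast is absent on that date
wide = window.pivot(index='collected', columns='beast', values='beast_id')

# number of dates on which each beast appears; 1 means an isolated dot
periods = wide.notna().sum()
print(wide)
print(periods)
```

Here `beasts` is a pandas `Index` holding the names in id order, so `beasts[i]` can be used for the tick labels in the same way as the tuple in the original code.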