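Using Pandas to create a plot and directly display output similar to Matplotlib's

Time: 2014-12-16 07:17:34

Tags: python matplotlib pandas

I have a query that runs and outputs a list of data consisting of date strings and counts: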

date_cnts = [(u'2014-06-27', 1),
 (u'2014-06-29', 3),
 (u'2014-06-30', 1),
 (u'2014-07-01', 1),
 (u'2014-07-02', 1),
 (u'2014-07-09', 1),
 (u'2014-07-10', 3),
 (u'2014-07-11', 1),
 (u'2014-07-12', 2),
 (u'2014-07-14', 1),
 (u'2014-07-15', 2),
 (u'2014-07-17', 3),
 (u'2014-07-18', 1),
 (u'2014-07-20', 1),
 (u'2014-07-21', 1),
 (u'2014-07-23', 2),
 (u'2014-07-26', 2),
 (u'2014-07-27', 2),
 (u'2014-07-28', 7),
 (u'2014-07-29', 3),
 (u'2014-07-31', 2),
 (u'2014-08-01', 1),
 (u'2014-08-05', 4),
 (u'2014-08-07', 2),
 (u'2014-08-08', 1),
 (u'2014-08-13', 1),
 (u'2014-08-14', 3),
 (u'2014-08-15', 1),
 (u'2014-08-16', 6),
 (u'2014-08-17', 1),
 (u'2014-08-18', 1),
 (u'2014-08-20', 1),
 (u'2014-08-24', 1),
 (u'2014-08-25', 3),
 (u'2014-08-29', 1),
 (u'2014-08-30', 1),
 (u'2014-09-03', 3),
 (u'2014-09-13', 1),
 (u'2014-09-14', 1),
 (u'2014-09-24', 3),
 (u'2014-10-20', 1),
 (u'2014-10-24', 1),
 (u'2014-11-05', 3),
 (u'2014-11-09', 1),
 (u'2014-11-12', 1),
 (u'2014-11-13', 1),
 (u'2014-11-14', 1),
 (u'2014-11-18', 1),
 (u'2014-11-19', 4),
 (u'2014-11-22', 1),
 (u'2014-11-26', 3),
 (u'2014-11-28', 3),
 (u'2014-12-01', 2),
 (u'2014-12-02', 2),
 (u'2014-12-04', 2),
 (u'2014-12-05', 1),
 (u'2014-12-06', 5),
 (u'2014-12-11', 1),
 (u'2014-12-15', 10)]

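Note that there are date gaps in this dataset; a missing date means the value for that day is 0.

My working (non-pandas) version of the code looks like this: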

from datetime import datetime

from matplotlib import pyplot as plt

# Parse the date strings so the bars are placed on a real time axis
x_val = [datetime.strptime(x[0], '%Y-%m-%d') for x in date_cnts]
y_val = [x[1] for x in date_cnts]
plt.bar(x_val, y_val)
plt.grid(True)
plt.show()

This outputs the following image:

Matplotlib output

Now, if I convert the query result into a pandas DataFrame:

          Date  Count
0   2014-06-27      1
1   2014-06-29      3
2   2014-06-30      1
3   2014-07-01      1
4   2014-07-02      1
5   2014-07-09      1
6   2014-07-10      3
7   2014-07-11      1
8   2014-07-12      2
9   2014-07-14      1
10  2014-07-15      2
11  2014-07-17      3
12  2014-07-18      1
13  2014-07-20      1
14  2014-07-21      1
15  2014-07-23      2
16  2014-07-26      2
17  2014-07-27      2
18  2014-07-28      7
19  2014-07-29      3
20  2014-07-31      2
21  2014-08-01      1
22  2014-08-05      4
23  2014-08-07      2
24  2014-08-08      1
25  2014-08-13      1
26  2014-08-14      3
27  2014-08-15      1
28  2014-08-16      6
29  2014-08-17      1
30  2014-08-18      1
31  2014-08-20      1
32  2014-08-24      1
33  2014-08-25      3
34  2014-08-29      1
35  2014-08-30      1
36  2014-09-03      3
37  2014-09-13      1
38  2014-09-14      1
39  2014-09-24      3
40  2014-10-20      1
41  2014-10-24      1
42  2014-11-05      3
43  2014-11-09      1
44  2014-11-12      1
45  2014-11-13      1
46  2014-11-14      1
47  2014-11-18      1
48  2014-11-19      4
49  2014-11-22      1
50  2014-11-26      3
51  2014-11-28      3
52  2014-12-01      2
53  2014-12-02      2
54  2014-12-04      2
55  2014-12-05      1
56  2014-12-06      5
57  2014-12-11      1
58  2014-12-15     10

and use the simple pandas wrapper to plot it:

plt.figure()
df.plot(kind='bar', grid=True, legend=False, x='Date', y=u'Count')
plt.show()

I get this result. Note that my missing dates are not shown in this chart.

Pandas output

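How can I get the plot to show the gaps for dates that are not present in the DataFrame (as 0 values)?

The reason I want to use pandas is to take advantage of some of its other features (most importantly the rolling mean).

1 Answer:

Answer 0: (score: 1)

I wrote a working version. It may not be the best, but it does the job. It is based on reindexing the original data into a DataFrame with daily samples.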

import pandas as pd
import matplotlib.pyplot as plt

#%% make data (date_cnts is the list of (date, count) tuples from the question)
df = pd.DataFrame(date_cnts, columns=['Date', 'Count'])

#%% make a DataFrame with daily sampling
df.index = pd.to_datetime(df['Date'])
startdate = df.index[0]
enddate = df.index[-1]
# dates missing from the original data become NaN rows after reindexing
df_new = df.reindex(pd.date_range(startdate, enddate, freq='1D'))

#%% plot the results
df_new['Count'].plot(kind='bar')

# reduce the number of x ticks: keep every 10th tick position
new_xticks = plt.xticks()[0][1:-1:10]
plt.xticks(new_xticks)
plt.show()

[Image: resulting plot]
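Since the question states that a missing date means a count of 0, and mentions wanting a rolling mean, here is a minimal sketch of a variant that fills the gaps with 0 instead of NaN. It is not part of the original answer. The names `sample_cnts`, `counts`, `daily` and `rolling`, the shortened sample data and the 3-day window are all assumptions chosen for illustration, and `Series.rolling` assumes a reasonably recent pandas version.

```python
import pandas as pd

# Small illustrative sample in the same shape as date_cnts from the question
sample_cnts = [('2014-06-27', 1), ('2014-06-29', 3),
               ('2014-06-30', 1), ('2014-07-02', 1)]

df = pd.DataFrame(sample_cnts, columns=['Date', 'Count'])

# Use the parsed dates as the index and keep only the counts
counts = df.set_index(pd.to_datetime(df['Date']))['Count']

# Build a gap-free daily index and fill the missing days with 0
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq='D')
daily = counts.reindex(full_range, fill_value=0)

# Rolling mean over a 3-day window (window size picked arbitrarily here);
# the first two entries are NaN because the window is not yet full
rolling = daily.rolling(window=3).mean()
```

With the full data, `daily.plot(kind='bar')` should then draw one slot per calendar day, and `rolling` could be plotted alongside it. Filling with 0 rather than NaN matters mostly for the rolling mean, because empty days then pull the average down as the question intends.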

To format the xticks further, I suggest looking at this question: Pandas timeseries plot setting x-axis major and minor ticks and labels
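As one possible way to tidy the tick labels (a sketch under stated assumptions, not the approach from the linked question), the tick positions and their labels can be set together so that each label stays attached to the right bar. This relies on pandas placing the bars of a bar plot at integer positions 0, 1, 2, and so on. The placeholder series `daily`, the `step` of 10 and the date format are assumptions made for the example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder daily data standing in for the reindexed counts
idx = pd.date_range('2014-06-27', periods=30, freq='D')
daily = pd.Series(range(30), index=idx)

ax = daily.plot(kind='bar', grid=True)

# Label only every `step`-th bar, formatting the date without the time part
step = 10
positions = list(range(0, len(daily), step))
labels = [daily.index[i].strftime('%Y-%m-%d') for i in positions]
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45, ha='right')
plt.tight_layout()
```

Setting positions and labels in the same place avoids depending on how a given matplotlib version re-maps the existing labels when only the tick positions change.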