I have a query that returns a list of date strings and counts:
date_cnts = [(u'2014-06-27', 1),
(u'2014-06-29', 3),
(u'2014-06-30', 1),
(u'2014-07-01', 1),
(u'2014-07-02', 1),
(u'2014-07-09', 1),
(u'2014-07-10', 3),
(u'2014-07-11', 1),
(u'2014-07-12', 2),
(u'2014-07-14', 1),
(u'2014-07-15', 2),
(u'2014-07-17', 3),
(u'2014-07-18', 1),
(u'2014-07-20', 1),
(u'2014-07-21', 1),
(u'2014-07-23', 2),
(u'2014-07-26', 2),
(u'2014-07-27', 2),
(u'2014-07-28', 7),
(u'2014-07-29', 3),
(u'2014-07-31', 2),
(u'2014-08-01', 1),
(u'2014-08-05', 4),
(u'2014-08-07', 2),
(u'2014-08-08', 1),
(u'2014-08-13', 1),
(u'2014-08-14', 3),
(u'2014-08-15', 1),
(u'2014-08-16', 6),
(u'2014-08-17', 1),
(u'2014-08-18', 1),
(u'2014-08-20', 1),
(u'2014-08-24', 1),
(u'2014-08-25', 3),
(u'2014-08-29', 1),
(u'2014-08-30', 1),
(u'2014-09-03', 3),
(u'2014-09-13', 1),
(u'2014-09-14', 1),
(u'2014-09-24', 3),
(u'2014-10-20', 1),
(u'2014-10-24', 1),
(u'2014-11-05', 3),
(u'2014-11-09', 1),
(u'2014-11-12', 1),
(u'2014-11-13', 1),
(u'2014-11-14', 1),
(u'2014-11-18', 1),
(u'2014-11-19', 4),
(u'2014-11-22', 1),
(u'2014-11-26', 3),
(u'2014-11-28', 3),
(u'2014-12-01', 2),
(u'2014-12-02', 2),
(u'2014-12-04', 2),
(u'2014-12-05', 1),
(u'2014-12-06', 5),
(u'2014-12-11', 1),
(u'2014-12-15', 10)]
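Note that there are gaps in the dates in this dataset; a missing date stands for a count of 0.
My working (non-pandas) version of the code looks like this: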
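from datetime import datetime
from matplotlib import pyplot as plt

# parse the date strings so matplotlib places the bars on a real time axis
x_val = [datetime.strptime(x[0], '%Y-%m-%d') for x in date_cnts]
y_val = [x[1] for x in date_cnts]
plt.bar(x_val, y_val)
plt.grid(True)
plt.show()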
This produces the following plot:
Now, if I convert the query result into a pandas DataFrame:
Date Count
0 2014-06-27 1
1 2014-06-29 3
2 2014-06-30 1
3 2014-07-01 1
4 2014-07-02 1
5 2014-07-09 1
6 2014-07-10 3
7 2014-07-11 1
8 2014-07-12 2
9 2014-07-14 1
10 2014-07-15 2
11 2014-07-17 3
12 2014-07-18 1
13 2014-07-20 1
14 2014-07-21 1
15 2014-07-23 2
16 2014-07-26 2
17 2014-07-27 2
18 2014-07-28 7
19 2014-07-29 3
20 2014-07-31 2
21 2014-08-01 1
22 2014-08-05 4
23 2014-08-07 2
24 2014-08-08 1
25 2014-08-13 1
26 2014-08-14 3
27 2014-08-15 1
28 2014-08-16 6
29 2014-08-17 1
30 2014-08-18 1
31 2014-08-20 1
32 2014-08-24 1
33 2014-08-25 3
34 2014-08-29 1
35 2014-08-30 1
36 2014-09-03 3
37 2014-09-13 1
38 2014-09-14 1
39 2014-09-24 3
40 2014-10-20 1
41 2014-10-24 1
42 2014-11-05 3
43 2014-11-09 1
44 2014-11-12 1
45 2014-11-13 1
46 2014-11-14 1
47 2014-11-18 1
48 2014-11-19 4
49 2014-11-22 1
50 2014-11-26 3
51 2014-11-28 3
52 2014-12-01 2
53 2014-12-02 2
54 2014-12-04 2
55 2014-12-05 1
56 2014-12-06 5
57 2014-12-11 1
58 2014-12-15 10
and use the simple pandas plotting wrapper to plot it:
plt.figure()
df.plot(kind='bar', grid=True, legend=False, x='Date', y=u'Count')
plt.show()
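I get this result. Note that my missing dates do not show up in this chart.
How do I fill in the gaps for dates that do not exist in the DataFrame (with 0 values)?
The reason I want to use pandas is to take advantage of some of its other features (most importantly, rolling averages).
Answer 0: (score: 1)
I wrote a working version. It may not be the best one, but it gets the job done. It is based on reindexing the original data onto a DataFrame with daily samples.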
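import pandas as pd
import matplotlib.pyplot as plt
#%% make data
df = pd.DataFrame(date_cnts, columns=['Date', 'Count'])
#%% make dataframe with daily sampling
df.index = pd.to_datetime(df['Date'])
startdate = df.index.min()
enddate = df.index.max()
# reindex onto every calendar day; days missing from the data get a count of 0
df_new = df[['Count']].reindex(pd.date_range(startdate, enddate, freq='1D'), fill_value=0)
#%% plot the results
df_new['Count'].plot(kind='bar')
# show only every 10th day, passing the labels explicitly so they stay aligned with their bars
tick_pos = list(range(0, len(df_new), 10))
plt.xticks(tick_pos, df_new.index[tick_pos].strftime('%Y-%m-%d'))
plt.show()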
To format the xticks further, I suggest looking at this question: Pandas timeseries plot setting x-axis major and minor ticks and labels
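
Since the question mentions rolling averages as the main reason for moving to pandas, here is a minimal sketch of how the zero-filled daily series could feed into one. It uses only the first four rows of the question's data to stay short, and the names `counts`, `full_range`, `daily` and `rolling`, as well as the 3-day window, are choices made for this illustration and not something taken from the original post.

```python
import pandas as pd

# A few rows from the question's data; the full date_cnts list works the same way.
date_cnts = [('2014-06-27', 1), ('2014-06-29', 3), ('2014-06-30', 1), ('2014-07-01', 1)]

df = pd.DataFrame(date_cnts, columns=['Date', 'Count'])
# Series of counts indexed by real timestamps instead of strings.
counts = df.set_index(pd.to_datetime(df['Date']))['Count']

# Build a gap-free daily index and give the missing days a count of 0.
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq='D')
daily = counts.reindex(full_range, fill_value=0)

# 3-day rolling mean (the window size is an arbitrary choice for illustration).
# The first window - 1 entries are NaN because the window is not yet full.
rolling = daily.rolling(window=3).mean()

print(daily)
print(rolling)
```

Filling with 0 before computing the rolling mean matters here: if the gaps were left out (or left as NaN), the average would be taken over recorded days only and would overstate the daily rate. The resulting `daily` series can be plotted with `daily.plot(kind='bar')` in the same way as in the answer above.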