访问带有pandas日期范围的字典

时间:2016-03-10 23:46:06

标签: python pandas

我有一堆带有一堆日期的数据框

0   2016-01-01
1   2016-01-02
2   2016-01-03
3   2016-01-04
4   2016-01-05
5   2016-01-06
6   2016-01-07
7   2016-01-08
8   2016-01-09
9   2016-01-10
10  2016-01-11
11  2016-01-12
12  2016-01-13
13  2016-01-14
14  2016-01-15
15  2016-01-16
16  2016-01-17
17  2016-01-20
18  2016-01-21
19  2016-01-22
20  2016-01-24
21  2016-01-25
22  2016-01-27
23  2016-01-28
24  2016-01-29
25  2016-01-30
26  2016-01-31

我想使用r = df.group_by('time')按日期对数据框进行分组,然后循环键以获取一些统计信息。事情是,日子还没有完成(你会发现我错过了1月18日和19日)。所以我真正想做的是创建一个日期范围,然后遍历日期范围。但是当我尝试这个时,当我将日期范围的元素传递到字典中时,我收到了一个关键错误。

关于如何做到这一点的任何想法?

以下是一些代码:

doi = (df.time<='2016-01-31')&(df.time>='2016-01-01')
oil = df[doi]


#Trouble Here.
r = oil.groupby(by = 'time')
D = oil.time
dates = pd.date_range(D.min(),D.max())
frames = []

for d in dates:
#The idea here is that if the date in the date range is not in the dataframe,
#Then there is no sum to compute.  return 0
    try:
        sum_of_oil = oil.ix[r.groups[d]].capacity.sum()
    except KeyError:
        sum_of_oil = 0
    frames.append([d,sum_of_oil])

frames = pd.DataFrame(frames, columns = ['time','volume'])

值得注意的是oil.time的元素是Timestamps

2 个答案:

答案 0 :(得分:2)

即使是不完整的时间序列,您也可以重新采样。

         date    qty
0  2015-01-01    123
1  2015-01-02    213
2  2015-01-03  41234
3  2015-01-04  12342
4  2015-01-05     32
5  2015-01-06      3
6  2015-01-07     24
7  2015-01-08  23423
8  2015-01-09      4
9  2015-01-10    234
10 2015-01-12    234
11 2015-01-13    324
12 2015-01-17    123
13 2015-01-18      5
14 2015-01-19   3454
15 2015-01-20    574
16 2015-01-21     51
17 2015-01-22     56

尝试

print df.set_index('date').resample('D').fillna(0).reset_index()

产生,

         date    qty
0  2015-01-01    123
1  2015-01-02    213
2  2015-01-03  41234
3  2015-01-04  12342
4  2015-01-05     32
5  2015-01-06      3
6  2015-01-07     24
7  2015-01-08  23423
8  2015-01-09      4
9  2015-01-10    234
10 2015-01-11      0
11 2015-01-12    234
12 2015-01-13    324
13 2015-01-14      0
14 2015-01-15      0
15 2015-01-16      0
16 2015-01-17    123
17 2015-01-18      5
18 2015-01-19   3454
19 2015-01-20    574
20 2015-01-21     51
21 2015-01-22     56

答案 1 :(得分:1)

考虑合并一整套完整的月份日:

import datetime
import pandas as pd

startdate = datetime.datetime.strptime('2015-01-01', '%Y-%m-%d')
jandates = [startdate + datetime.timedelta(days=i) for i in range(31)]

datesdf = pd.DataFrame({'date':jandates})    
mergedf = pd.merge(datesdf, actualdf, on='date', how='left').fillna(0)