I have a dataframe with a bunch of dates:
0 2016-01-01
1 2016-01-02
2 2016-01-03
3 2016-01-04
4 2016-01-05
5 2016-01-06
6 2016-01-07
7 2016-01-08
8 2016-01-09
9 2016-01-10
10 2016-01-11
11 2016-01-12
12 2016-01-13
13 2016-01-14
14 2016-01-15
15 2016-01-16
16 2016-01-17
17 2016-01-20
18 2016-01-21
19 2016-01-22
20 2016-01-24
21 2016-01-25
22 2016-01-27
23 2016-01-28
24 2016-01-29
25 2016-01-30
26 2016-01-31
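I want to group the dataframe by date using r = df.groupby('time')
and then loop over the keys to compute some statistics. The thing is, the days are not complete (you'll notice I am missing January 18th and 19th). So what I really want to do is create a date range and then iterate over that range. But when I try this, I get a KeyError when I pass elements of the date range into the groups dictionary.
Any ideas on how to do this?
Here is some code: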
doi = (df.time<='2016-01-31')&(df.time>='2016-01-01')
oil = df[doi]
#Trouble Here.
r = oil.groupby(by = 'time')
D = oil.time
dates = pd.date_range(D.min(),D.max())
frames = []
for d in dates:
    #The idea here is that if the date in the date range is not in the dataframe,
    #Then there is no sum to compute. return 0
    try:
        sum_of_oil = oil.ix[r.groups[d]].capacity.sum()
    except KeyError:
        sum_of_oil = 0
    frames.append([d,sum_of_oil])
frames = pd.DataFrame(frames, columns = ['time','volume'])
It is worth noting that the elements of oil.time are Timestamps.
Answer 0 (score: 2)
You can resample a time series even if it is incomplete. Given:
date qty
0 2015-01-01 123
1 2015-01-02 213
2 2015-01-03 41234
3 2015-01-04 12342
4 2015-01-05 32
5 2015-01-06 3
6 2015-01-07 24
7 2015-01-08 23423
8 2015-01-09 4
9 2015-01-10 234
10 2015-01-12 234
11 2015-01-13 324
12 2015-01-17 123
13 2015-01-18 5
14 2015-01-19 3454
15 2015-01-20 574
16 2015-01-21 51
17 2015-01-22 56
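Try the following (on current pandas, resample needs an explicit aggregation; sum() returns 0 for days with no rows, and the date column must be a datetime type):
print(df.set_index('date').resample('D').sum().reset_index())
which yields: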
date qty
0 2015-01-01 123
1 2015-01-02 213
2 2015-01-03 41234
3 2015-01-04 12342
4 2015-01-05 32
5 2015-01-06 3
6 2015-01-07 24
7 2015-01-08 23423
8 2015-01-09 4
9 2015-01-10 234
10 2015-01-11 0
11 2015-01-12 234
12 2015-01-13 324
13 2015-01-14 0
14 2015-01-15 0
15 2015-01-16 0
16 2015-01-17 123
17 2015-01-18 5
18 2015-01-19 3454
19 2015-01-20 574
20 2015-01-21 51
21 2015-01-22 56
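Applied to the column names from the question (time, capacity), the same idea might look like the sketch below. The sample rows are made up for illustration, and it assumes the time column already holds datetimes. Because resample('D').sum() adds up all rows that fall on the same day, it replaces both the groupby and the loop over the date range.

```python
import pandas as pd

# Hypothetical sample data: 2016-01-03 is missing, 2016-01-02 appears twice.
oil = pd.DataFrame({
    "time": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-02", "2016-01-04"]
    ),
    "capacity": [10, 20, 5, 7],
})

# Daily totals; days with no rows get a sum of 0.
volume = (
    oil.set_index("time")["capacity"]
       .resample("D")
       .sum()
       .rename("volume")
       .reset_index()
)
print(volume)
```

The result has one row per calendar day between the first and last date, with columns time and volume.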
Answer 1 (score: 1)
Consider merging with a complete set of the month's days:
import datetime
import pandas as pd
startdate = datetime.datetime.strptime('2015-01-01', '%Y-%m-%d')
jandates = [startdate + datetime.timedelta(days=i) for i in range(31)]
datesdf = pd.DataFrame({'date':jandates})
mergedf = pd.merge(datesdf, actualdf, on='date', how='left').fillna(0)
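A self-contained variant of the same merge, as a rough sketch: here the calendar is built with pd.date_range instead of a list comprehension, and actualdf is filled with made-up sample rows, since the snippet above does not define it.

```python
import pandas as pd

# Hypothetical stand-in for actualdf (not defined in the answer above).
actualdf = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-03"]),
    "qty": [123, 41234],
})

# One row for every day of January 2015.
datesdf = pd.DataFrame(
    {"date": pd.date_range("2015-01-01", "2015-01-31", freq="D")}
)

# The left merge keeps every calendar day; days absent from actualdf
# come back as NaN and are then filled with 0.
mergedf = pd.merge(datesdf, actualdf, on="date", how="left").fillna({"qty": 0})

# The NaNs introduced by the merge turn qty into floats; cast back to int.
mergedf["qty"] = mergedf["qty"].astype(int)
print(mergedf.head())
```

Unlike resample, the merge lets the calendar extend beyond the first and last observed dates, which matters when the data is missing days at the start or end of the month.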