I am trying to roll up daily data into fiscal quarter data. For example, I have a table with fiscal quarter end dates:
Company Period Quarter_End
M 2016Q1 05/02/2015
M 2016Q2 08/01/2015
M 2016Q3 10/31/2015
M 2016Q4 01/30/2016
WFM 2015Q2 04/12/2015
WFM 2015Q3 07/05/2015
WFM 2015Q4 09/27/2015
WFM 2016Q1 01/17/2016
and a table of daily data:
Company Date Price
M 06/20/2015 1.05
M 06/22/2015 4.05
M 07/10/2015 3.45
M 07/29/2015 1.86
M 08/24/2015 1.58
M 09/02/2015 8.64
M 09/22/2015 2.56
M 10/20/2015 5.42
M 11/02/2015 1.58
M 11/24/2015 4.58
M 12/03/2015 6.48
M 12/05/2015 4.56
M 01/03/2016 7.14
M 01/30/2016 6.34
WFM 06/20/2015 1.05
WFM 06/22/2015 4.05
WFM 07/10/2015 3.45
WFM 07/29/2015 1.86
WFM 08/24/2015 1.58
WFM 09/02/2015 8.64
WFM 09/22/2015 2.56
WFM 10/20/2015 5.42
WFM 11/02/2015 1.58
WFM 11/24/2015 4.58
WFM 12/03/2015 6.48
WFM 12/05/2015 4.56
WFM 01/03/2016 7.14
WFM 01/17/2016 6.34
And I would like to create the table below.
Company Period Quarter_end Sum(Price)
M 2016Q2 8/1/2015 10.41
M 2016Q3 10/31/2015 18.2
M 2016Q4 1/30/2016 30.68
WFM 2015Q3 7/5/2015 5.1
WFM 2015Q4 9/27/2015 18.09
WFM 2016Q1 1/17/2016 36.1
However, I don't know how to group by varying dates without looping through each record. Any help is greatly appreciated.
Thanks!
答案 0 :(得分:6)
我认为您可以使用merge_ordered
:
#first convert columns to datetime
df1.Quarter_End = pd.to_datetime(df1.Quarter_End)
df2.Date = pd.to_datetime(df2.Date)
df = pd.merge_ordered(df1,
df2,
left_on=['Company','Quarter_End'],
right_on=['Company','Date'],
how='outer')
print (df)
Company Period Quarter_End Date Price
0 M 2016Q1 2015-05-02 NaT NaN
1 M NaN NaT 2015-06-20 1.05
2 M NaN NaT 2015-06-22 4.05
3 M NaN NaT 2015-07-10 3.45
4 M NaN NaT 2015-07-29 1.86
5 M 2016Q2 2015-08-01 NaT NaN
6 M NaN NaT 2015-08-24 1.58
7 M NaN NaT 2015-09-02 8.64
8 M NaN NaT 2015-09-22 2.56
9 M NaN NaT 2015-10-20 5.42
10 M 2016Q3 2015-10-31 NaT NaN
11 M NaN NaT 2015-11-02 1.58
12 M NaN NaT 2015-11-24 4.58
13 M NaN NaT 2015-12-03 6.48
14 M NaN NaT 2015-12-05 4.56
15 M NaN NaT 2016-01-03 7.14
16 M 2016Q4 2016-01-30 2016-01-30 6.34
17 WFM 2015Q2 2015-04-12 NaT NaN
18 WFM NaN NaT 2015-06-20 1.05
19 WFM NaN NaT 2015-06-22 4.05
20 WFM 2015Q3 2015-07-05 NaT NaN
21 WFM NaN NaT 2015-07-10 3.45
22 WFM NaN NaT 2015-07-29 1.86
23 WFM NaN NaT 2015-08-24 1.58
24 WFM NaN NaT 2015-09-02 8.64
25 WFM NaN NaT 2015-09-22 2.56
26 WFM 2015Q4 2015-09-27 NaT NaN
27 WFM NaN NaT 2015-10-20 5.42
28 WFM NaN NaT 2015-11-02 1.58
29 WFM NaN NaT 2015-11-24 4.58
30 WFM NaN NaT 2015-12-03 6.48
31 WFM NaN NaT 2015-12-05 4.56
32 WFM NaN NaT 2016-01-03 7.14
33 WFM 2016Q1 2016-01-17 2016-01-17 6.34
然后在bfill
列中填写NaN
,在sum
填充Period
并汇总Series.dropna
。如果需要删除所有NaN值,请添加reset_index
和最后LUA_PATH
:
Quarter_End
答案 1 :(得分:4)
set_index
pd.concat
以对齐索引groupby
与agg
prd_df = period_df.set_index(['Company', 'Quarter_End'])
prc_df = price_df.set_index(['Company', 'Date'], drop=False)
df = pd.concat([prd_df, prc_df], axis=1)
df.groupby([df.index.get_level_values(0), df.Period.bfill()]) \
.agg(dict(Date='last', Price='sum')).dropna()