I have a dataframe of profit reported every Friday, covering 5 years of data.
The DataFrame looks like this.
Updated data:
Date Profit
7/28/2017 2,923
7/21/2017 2,879
7/14/2017 2,832
7/7/2017 2,773
6/30/2017 2,701
6/23/2017 2,635
6/16/2017 2,563
6/9/2017 2,481
6/2/2017 2,394
:
7/29/2016 2,824
7/22/2016 2,770
7/15/2016 2,718
7/8/2016 2,657
7/1/2016 2,580
6/24/2016 2,503
6/17/2016 2,425
6/10/2016 2,337
6/3/2016 2,250
:
7/31/2015 2,848
7/24/2015 2,796
7/17/2015 2,748
7/10/2015 2,694
7/3/2015 2,623
6/26/2015 2,548
6/19/2015 2,474
6/12/2015 2,387
6/5/2015 2,301
:
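I would like to interpolate the profit on the same date in the previous year, regardless of which weekday that date falls on, and fill in columns 3 and 4 below.
week_ending profit Profit one_yr_ago Profit 2_yrs_ago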
0 2017-07-28 3,010 (Profit on 2016-07-28) (profit on 2015-07-28)
1 2017-07-21 2,990 (Profit on 2016-07-21) (profit on 2015-07-21)
2 2017-07-14 2,973 -- --
3 2017-07-07 2,945 -- --
4 2017-06-30 2,888
5 2017-06-23 2,816
6 2017-06-16 2,770
7 2017-06-09 2,709
8 2017-06-02 2,631
9 2017-05-26 2,525
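I tried using np.interp together with x[0] - pd.DateOffset(years=1) to get the previous year's value, but I don't know how to make it work. Should I use rolling_apply to do this efficiently?
Edit, @JohanL and @Cheryl: The data is captured every Friday. For example, we know that 2017-07-28 is a Friday, but 2016-07-28 is not. In the new columns, I want to interpolate the profit for the same date one year ago and two years ago. I have updated the actual DataFrame above to show 3 years of data for June and July.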
Answer 0 (score: 0)
You could try something like this:
import pandas as pd
import numpy as np

# function to return the date n years earlier
def previous_n_year(x, n):
    return x - pd.DateOffset(years=n)

# some sample data
dt_str = ['2017-07-28', '2017-07-21', '2017-07-14', '2017-07-07',
          '2016-07-28', '2016-07-21', '2016-07-14', '2015-07-28']
dates = [pd.Timestamp(x) for x in dt_str]
profit = [3010, 2990, 2973, 2945, 2888, 2816, 2770, 2709]
df = pd.DataFrame({'profit': profit, 'week_ending': dates})
df = df.set_index(['week_ending'])
df['profit_last_year'] = np.nan    # empty column to store profit of one year ago
df['profit_2_years_ago'] = np.nan  # empty column to store profit of two years ago

# check if there is a profit for previous n years and then fill that in the new column
for idx in df.index:
    if previous_n_year(idx, 1) in df.index:
        df.loc[idx, 'profit_last_year'] = df.loc[previous_n_year(idx, 1), 'profit']
    if previous_n_year(idx, 2) in df.index:
        df.loc[idx, 'profit_2_years_ago'] = df.loc[previous_n_year(idx, 2), 'profit']
Result:
profit profit_last_year profit_2_years_ago
week_ending
2017-07-28 3010 2888.0 2709.0
2017-07-21 2990 2816.0 NaN
2017-07-14 2973 2770.0 NaN
2017-07-07 2945 NaN NaN
2016-07-28 2888 2709.0 NaN
2016-07-21 2816 NaN NaN
2016-07-14 2770 NaN NaN
2015-07-28 2709 NaN NaN
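The row-by-row loop above can probably be replaced with a vectorized lookup. The following is only a sketch on the same sample data: the helper name profit_n_years_ago is made up for illustration, and it assumes the index holds unique dates. Like the loop, it only finds exact date matches and does no interpolation.

```python
import pandas as pd

# Same sample data as in the answer above.
dt_str = ['2017-07-28', '2017-07-21', '2017-07-14', '2017-07-07',
          '2016-07-28', '2016-07-21', '2016-07-14', '2015-07-28']
profit = [3010, 2990, 2973, 2945, 2888, 2816, 2770, 2709]
df = pd.DataFrame({'profit': profit},
                  index=pd.DatetimeIndex(dt_str, name='week_ending'))

# Hypothetical helper (not part of the original answer): look up the profit
# recorded exactly n years before each index date. Dates with no exact
# match come back as NaN. Assumes the index has no duplicate dates.
def profit_n_years_ago(frame, n):
    earlier = frame.index - pd.DateOffset(years=n)
    return frame['profit'].reindex(earlier).to_numpy()

df['profit_last_year'] = profit_n_years_ago(df, 1)
df['profit_2_years_ago'] = profit_n_years_ago(df, 2)
print(df)
```

The reindex call does the same membership test as the `in df.index` check, but for all rows at once.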
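Edit
I now understand that the dates in the previous year are different, so you have to interpolate. I edited the answer as follows.
The dataframe with dates as the index: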
profit
2017-07-28 3010
2017-07-21 2990
2017-07-14 2973
2017-07-07 2945
2016-07-28 2888
2016-07-21 2816
2016-07-14 2770
2015-07-28 2709
We build the dates one year earlier and the combined list of all dates:
dates_min1 = [previous_n_year(pd.Timestamp(x),1) for x in dt_str]
dates_total = dates + dates_min1
I reindex the dataframe on the combined dates and interpolate the values at the added dates:
df1 = df.reindex(dates_total).sort_index(ascending = False).interpolate(method = 'cubic')
I then split this into two separate dataframes, reset their indexes, and concatenate them side by side:
df_profit = df1.loc[dates].reset_index().rename(columns = {'index':'date'})
df_profit_min1 = df1.loc[dates_min1].rename(columns = {'profit':'profit_min1'}).reset_index(drop = True)
result = pd.concat([df_profit, df_profit_min1], axis = 1)
Result:
date profit profit_min1
0 2017-07-28 2923.0 2816.334310
1 2017-07-21 2879.0 2762.637100
2 2017-07-14 2832.0 2709.779097
3 2017-07-07 2773.0 2648.302836
4 2016-07-29 2824.0 2833.154115
5 2016-07-22 2770.0 2782.006404
6 2016-07-15 2718.0 2733.852340
7 2016-07-08 2657.0 NaN
8 2015-07-31 2848.0 NaN
9 2015-07-24 2796.0 NaN
10 2015-07-17 2748.0 NaN
11 2015-07-10 2694.0 NaN
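Since the question mentions np.interp, here is a minimal sketch of that route, using linear rather than cubic interpolation, so the numbers will differ slightly from the result above. It uses only the July rows from the question's data. The helper name profit_n_years_ago and the output column names are made up for illustration. Targets outside the known date range are left as NaN. Note that np.interp would also interpolate straight across any gap in the data (such as the rows elided from the question), so this assumes the weekly series is complete between its first and last dates.

```python
import numpy as np
import pandas as pd

# July rows copied from the question's data (other rows omitted here).
rows = [
    ('7/28/2017', 2923), ('7/21/2017', 2879), ('7/14/2017', 2832),
    ('7/7/2017', 2773),
    ('7/29/2016', 2824), ('7/22/2016', 2770), ('7/15/2016', 2718),
    ('7/8/2016', 2657), ('7/1/2016', 2580),
    ('7/31/2015', 2848), ('7/24/2015', 2796), ('7/17/2015', 2748),
    ('7/10/2015', 2694), ('7/3/2015', 2623),
]
df = pd.DataFrame(rows, columns=['Date', 'Profit'])
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# np.interp needs increasing x values, so sort by date and convert the
# timestamps to plain integers.
known = df.sort_values('Date')
xp = known['Date'].to_numpy().astype('int64')
fp = known['Profit'].to_numpy(dtype=float)

# Hypothetical helper: linearly interpolated profit n years before each date.
# Dates before the first or after the last known date give NaN.
def profit_n_years_ago(dates, n):
    target = (dates - pd.DateOffset(years=n)).to_numpy().astype('int64')
    return np.interp(target, xp, fp, left=np.nan, right=np.nan)

df['Profit_one_yr_ago'] = profit_n_years_ago(df['Date'], 1)
df['Profit_2_yrs_ago'] = profit_n_years_ago(df['Date'], 2)
print(df)
```

For 2017-07-28, the one-year-ago value is interpolated between the Fridays 2016-07-22 and 2016-07-29, which is the behaviour the question asks for. This avoids both the explicit loop and rolling_apply.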