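Pandas: Creating new columns by linear interpolation between existing values

Time: 2017-08-17 03:38:42

Tags: python pandas interpolation

I have a dataframe of profit reported every Friday, covering 5 years of data.

The DataFrame looks like this:

Updated data: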

  Date      Profit
7/28/2017   2,923 
7/21/2017   2,879 
7/14/2017   2,832 
7/7/2017    2,773 
6/30/2017   2,701 
6/23/2017   2,635 
6/16/2017   2,563 
6/9/2017    2,481 
6/2/2017    2,394 
:
7/29/2016   2,824 
7/22/2016   2,770 
7/15/2016   2,718 
7/8/2016    2,657 
7/1/2016    2,580 
6/24/2016   2,503 
6/17/2016   2,425 
6/10/2016   2,337 
6/3/2016    2,250 
:
7/31/2015   2,848 
7/24/2015   2,796 
7/17/2015   2,748 
7/10/2015   2,694 
7/3/2015    2,623 
6/26/2015   2,548 
6/19/2015   2,474 
6/12/2015   2,387 
6/5/2015    2,301 
:

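I would like to interpolate the profit on the same calendar date in each previous year, regardless of which weekday that date falls on, and fill in the 3rd and 4th columns below.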

    week_ending    profit   Profit one_yr_ago           Profit 2_yrs_ago
0    2017-07-28    3,010    (Profit on 2016-07-28)      (profit on 2015-07-28)
1    2017-07-21    2,990    (Profit on 2016-07-21)      (profit on 2015-07-21)     
2    2017-07-14    2,973           --                        --
3    2017-07-07    2,945           --                        --
4    2017-06-30    2,888
5    2017-06-23    2,816
6    2017-06-16    2,770
7    2017-06-09    2,709
8    2017-06-02    2,631
9    2017-05-26    2,525 

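I tried using np.interp with x[0] - pd.DateOffset(years=1) to get the previous year's date, but I don't know how to make this work. Should I use rolling_apply to do this efficiently?

Edit, for @JohanL and @Cheryl: the data is captured every Friday. For example, we know 2017-07-28 is a Friday, but 2016-07-28 is not a Friday. In the new columns, I want to insert the interpolated profit for the same date one year and two years earlier. I have updated the sample with the actual DataFrame so that it shows June and July for 3 years.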

1 Answer:

Answer 0: (score: 0)

You can try something like this:

import pandas as pd
import numpy as np

# return the date n years before x
def previous_n_year(x, n):
    return x - pd.DateOffset(years=n)

# some sample data
dt_str = ['2017-07-28', '2017-07-21', '2017-07-14', '2017-07-07', '2016-07-28', '2016-07-21', '2016-07-14', '2015-07-28']
dates = [pd.Timestamp(x) for x in dt_str]
profit = [3010, 2990, 2973, 2945, 2888, 2816, 2770, 2709]
df = pd.DataFrame({'profit':profit, 'week_ending': dates})
df = df.set_index(['week_ending'])
df['profit_last_year'] = np.nan  # empty column to store profit of one year ago
df['profit_2_years_ago'] = np.nan  # empty column to store profit of two years ago

# check if there is a profit for previous n years and then fill that in the new column
for idx in df.index:
    if previous_n_year(idx, 1) in df.index:
        df.loc[idx, 'profit_last_year'] = df.loc[previous_n_year(idx,1), 'profit']
    if previous_n_year(idx, 2) in df.index:
        df.loc[idx, 'profit_2_years_ago'] = df.loc[previous_n_year(idx,2), 'profit']

Result:

             profit  profit_last_year  profit_2_years_ago
week_ending                                              
2017-07-28     3010            2888.0              2709.0
2017-07-21     2990            2816.0                 NaN
2017-07-14     2973            2770.0                 NaN
2017-07-07     2945               NaN                 NaN
2016-07-28     2888            2709.0                 NaN
2016-07-21     2816               NaN                 NaN
2016-07-14     2770               NaN                 NaN
2015-07-28     2709               NaN                 NaN
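
The loop above can also be written without iterating over rows. The following is a minimal sketch, not part of the original answer: it shifts the whole index back by n years and uses reindex, which yields NaN wherever the shifted date has no exact match. It rebuilds the same sample data so it can run on its own.

```python
import pandas as pd

# Same sample data as in the answer above
dt_str = ['2017-07-28', '2017-07-21', '2017-07-14', '2017-07-07',
          '2016-07-28', '2016-07-21', '2016-07-14', '2015-07-28']
profit = [3010, 2990, 2973, 2945, 2888, 2816, 2770, 2709]
df = pd.DataFrame({'profit': profit},
                  index=pd.DatetimeIndex(dt_str, name='week_ending'))

for n, col in [(1, 'profit_last_year'), (2, 'profit_2_years_ago')]:
    # Dates n years before each row's date
    shifted = df.index - pd.DateOffset(years=n)
    # Exact-match lookup: dates that are not in the index become NaN
    df[col] = df['profit'].reindex(shifted).to_numpy()

print(df)
```

Like the loop, this only finds exact date matches, so it does not help when the earlier date falls between two Fridays.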

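Edit

I now understand that the dates in the previous year are different, so you have to interpolate. I have edited my answer as follows.
The dataframe, with the dates as index: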

             profit
2017-07-28     3010
2017-07-21     2990
2017-07-14     2973
2017-07-07     2945
2016-07-28     2888
2016-07-21     2816
2016-07-14     2770
2015-07-28     2709

We compute the dates one year earlier and combine them with the original dates:

dates_min1 = [previous_n_year(pd.Timestamp(x),1) for x in dt_str]
dates_total = dates + dates_min1

I reindex the dataframe on the combined dates and interpolate the values at the added dates:

df1 = df.reindex(dates_total).sort_index(ascending = False).interpolate(method = 'cubic')
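
The question asks for linear interpolation, while the line above uses a cubic method (which relies on SciPy). As a hedged alternative, the sketch below does the same reindex step with method='time', which interpolates linearly in proportion to the time elapsed between observations. The three Friday values are copied from the question's data; the names fridays, wanted and filled are only illustrative.

```python
import pandas as pd

# Three Friday observations taken from the question's data
fridays = pd.to_datetime(['2016-07-15', '2016-07-22', '2016-07-29'])
profit = pd.Series([2718, 2770, 2824], index=fridays, name='profit')

# Non-Friday dates at which we want an estimate
wanted = pd.to_datetime(['2016-07-21', '2016-07-28'])

# Add the wanted dates to the index (as NaN), then fill them linearly in time
full_index = profit.index.union(wanted)
filled = profit.reindex(full_index).interpolate(method='time')

print(filled.loc[wanted])
```

Each wanted date lies six days after a Friday, so its estimate is the earlier Friday's value plus 6/7 of the weekly change.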

I then split the result into two separate dataframes, reset their indexes, and concatenate them side by side:

df_profit = df1.loc[dates].reset_index().rename(columns = {'index':'date'})
df_profit_min1 = df1.loc[dates_min1].rename(columns = {'profit':'profit_min1'}).reset_index(drop = True)
result = pd.concat([df_profit, df_profit_min1], axis = 1)

Result:

         date  profit  profit_min1
0  2017-07-28  2923.0  2816.334310
1  2017-07-21  2879.0  2762.637100
2  2017-07-14  2832.0  2709.779097
3  2017-07-07  2773.0  2648.302836
4  2016-07-29  2824.0  2833.154115
5  2016-07-22  2770.0  2782.006404
6  2016-07-15  2718.0  2733.852340
7  2016-07-08  2657.0          NaN
8  2015-07-31  2848.0          NaN
9  2015-07-24  2796.0          NaN
10 2015-07-17  2748.0          NaN
11 2015-07-10  2694.0          NaN
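
For comparison, here is a sketch of the np.interp route mentioned in the question, which gives plain linear interpolation. It is an illustration under stated assumptions rather than the answer's method: the helper profit_n_years_ago and the column names are hypothetical, and the data is the July 2016 and July 2017 subset from the question.

```python
import numpy as np
import pandas as pd

# Friday observations copied from the question (July 2016 and July 2017)
weekly = pd.Series(
    [2657, 2718, 2770, 2824, 2773, 2832, 2879, 2923],
    index=pd.to_datetime([
        '2016-07-08', '2016-07-15', '2016-07-22', '2016-07-29',
        '2017-07-07', '2017-07-14', '2017-07-21', '2017-07-28',
    ]),
    name='profit',
).sort_index()  # np.interp needs increasing x values


def to_days(index):
    """Convert a DatetimeIndex to float days since 1970-01-01."""
    return ((index - pd.Timestamp('1970-01-01')) / pd.Timedelta(days=1)).to_numpy()


def profit_n_years_ago(series, n):
    """Hypothetical helper: linearly interpolate series at each date minus n years."""
    targets = series.index - pd.DateOffset(years=n)
    # Dates outside the observed range become NaN instead of being extrapolated
    return np.interp(to_days(targets), to_days(series.index),
                     series.to_numpy(dtype=float), left=np.nan, right=np.nan)


result = weekly.to_frame()
result['profit_one_yr_ago'] = profit_n_years_ago(weekly, 1)
result['profit_2_yrs_ago'] = profit_n_years_ago(weekly, 2)
print(result.sort_index(ascending=False))
```

With this subset, the one-year-ago value for 2017-07-28 is about 2816.29, close to the cubic result shown above. Note that np.interp also draws a straight line across any long gap between observations, so it only makes sense when the weekly data around each target date is present.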