如何根据前一个值和另一列的当前值计算列

时间:2017-05-29 15:16:45

标签: python pandas

我对pandas没有太多经验,我有以下DataFrame:

month       A               B
2/28/2017   0.7377573034    0
3/31/2017   0.7594787565    3.7973937824
4/30/2017   0.7508308808    3.7541544041
5/31/2017   0.7038814004    7.0388140044
6/30/2017   0.6920212254    11.0723396061
7/31/2017   0.6801610503    11.5627378556
8/31/2017   0.6683008753    10.6928140044
9/30/2017   0.7075915026    11.3214640415
10/31/2017  0.6989436269    7.6883798964
11/30/2017  0.6259514607    4.3816602247
12/31/2017  0.6119757303    3.671854382
1/31/2018   0.633           3.798
2/28/2018   0.598           4.784
3/31/2018   0.673           5.384
4/30/2018   0.673           1.346
5/31/2018   0.609           0
6/30/2018   0.609           0
7/31/2018   0.609           0
8/31/2018   0.609           0
9/30/2018   0.673           0
10/31/2018  0.673           0
11/30/2018  0.598           0
12/31/2018  0.598           0

我需要计算列C,它基本上是列AB列,但列B的值是相应的前一年的值月。此外,对于上一年没有相应月份的值,该值应为零。更具体地说,这是我期望的C

C
0 # these values are zero because the corresponding month in the previous year is not in column A
0
0
0
0
0
0
0
0
0
0
0
0               # 0.598 * 0
2.5556460155552 # 0.673 * 3.7973937824
2.5265459139593 # 0.673 * 3.7541544041
4.2866377286796 # 0.609 * 7.0388140044
6.7430548201149 # 0.609 * 11.0723396061
7.0417073540604 # 0.609 * 11.5627378556
6.5119237286796 # 0.609 * 10.6928140044
7.6193452999295 # 0.673 * 11.3214640415
5.1742796702772 # 0.673 * 7.6883798964
2.6202328143706 # 0.598 * 4.3816602247
2.195768920436  # 0.598 * 3.671854382

我怎样才能做到这一点?我确信可能有办法不使用for循环。提前谢谢。

1 个答案:

答案 0 :(得分:1)

In [73]: (df.drop('B',1)
    ...:   .merge(df.drop('A',1)
    ...:            .assign(month=df.month + pd.offsets.MonthEnd(12)),
    ...:          on='month', how='left')
    ...:   .eval("C = A * B", inplace=False)
    ...:   .fillna(0)
    ...: )
    ...:
Out[73]:
        month         A          B         C
0  2017-02-28  0.737757   0.000000  0.000000
1  2017-03-31  0.759479   0.000000  0.000000
2  2017-04-30  0.750831   0.000000  0.000000
3  2017-05-31  0.703881   0.000000  0.000000
4  2017-06-30  0.692021   0.000000  0.000000
5  2017-07-31  0.680161   0.000000  0.000000
6  2017-08-31  0.668301   0.000000  0.000000
7  2017-09-30  0.707592   0.000000  0.000000
8  2017-10-31  0.698944   0.000000  0.000000
9  2017-11-30  0.625951   0.000000  0.000000
10 2017-12-31  0.611976   0.000000  0.000000
11 2018-01-31  0.633000   0.000000  0.000000
12 2018-02-28  0.598000   0.000000  0.000000
13 2018-03-31  0.673000   3.797394  2.555646
14 2018-04-30  0.673000   3.754154  2.526546
15 2018-05-31  0.609000   7.038814  4.286638
16 2018-06-30  0.609000  11.072340  6.743055
17 2018-07-31  0.609000  11.562738  7.041707
18 2018-08-31  0.609000  10.692814  6.511924
19 2018-09-30  0.673000  11.321464  7.619345
20 2018-10-31  0.673000   7.688380  5.174280
21 2018-11-30  0.598000   4.381660  2.620233
22 2018-12-31  0.598000   3.671854  2.195769

说明:

我们可以像这样生成一个帮助器DF(我们已经添加了12个月到month列并删除了A列):

In [77]: df.drop('A',1).assign(month=df.month + pd.offsets.MonthEnd(12))
Out[77]:
        month          B
0  2018-02-28   0.000000
1  2018-03-31   3.797394
2  2018-04-30   3.754154
3  2018-05-31   7.038814
4  2018-06-30  11.072340
5  2018-07-31  11.562738
6  2018-08-31  10.692814
7  2018-09-30  11.321464
8  2018-10-31   7.688380
9  2018-11-30   4.381660
10 2018-12-31   3.671854
11 2019-01-31   3.798000
12 2019-02-28   4.784000
13 2019-03-31   5.384000
14 2019-04-30   1.346000
15 2019-05-31   0.000000
16 2019-06-30   0.000000
17 2019-07-31   0.000000
18 2019-08-31   0.000000
19 2019-09-30   0.000000
20 2019-10-31   0.000000
21 2019-11-30   0.000000
22 2019-12-31   0.000000

现在我们可以将它与原始DF合并(原始DF中我们不需要B列):

In [79]: (df.drop('B',1)
    ...:    .merge(df.drop('A',1)
    ...:             .assign(month=df.month + pd.offsets.MonthEnd(12)),
    ...:           on='month', how='left'))
Out[79]:
        month         A          B
0  2017-02-28  0.737757        NaN
1  2017-03-31  0.759479        NaN
2  2017-04-30  0.750831        NaN
3  2017-05-31  0.703881        NaN
4  2017-06-30  0.692021        NaN
5  2017-07-31  0.680161        NaN
6  2017-08-31  0.668301        NaN
7  2017-09-30  0.707592        NaN
8  2017-10-31  0.698944        NaN
9  2017-11-30  0.625951        NaN
10 2017-12-31  0.611976        NaN
11 2018-01-31  0.633000        NaN
12 2018-02-28  0.598000   0.000000
13 2018-03-31  0.673000   3.797394
14 2018-04-30  0.673000   3.754154
15 2018-05-31  0.609000   7.038814
16 2018-06-30  0.609000  11.072340
17 2018-07-31  0.609000  11.562738
18 2018-08-31  0.609000  10.692814
19 2018-09-30  0.673000  11.321464
20 2018-10-31  0.673000   7.688380
21 2018-11-30  0.598000   4.381660
22 2018-12-31  0.598000   3.671854

然后使用.eval("C = A * B", inplace=False)我们可以“动态”生成一个新列