我对pandas没有太多经验,我有以下DataFrame:
month A B
2/28/2017 0.7377573034 0
3/31/2017 0.7594787565 3.7973937824
4/30/2017 0.7508308808 3.7541544041
5/31/2017 0.7038814004 7.0388140044
6/30/2017 0.6920212254 11.0723396061
7/31/2017 0.6801610503 11.5627378556
8/31/2017 0.6683008753 10.6928140044
9/30/2017 0.7075915026 11.3214640415
10/31/2017 0.6989436269 7.6883798964
11/30/2017 0.6259514607 4.3816602247
12/31/2017 0.6119757303 3.671854382
1/31/2018 0.633 3.798
2/28/2018 0.598 4.784
3/31/2018 0.673 5.384
4/30/2018 0.673 1.346
5/31/2018 0.609 0
6/30/2018 0.609 0
7/31/2018 0.609 0
8/31/2018 0.609 0
9/30/2018 0.673 0
10/31/2018 0.673 0
11/30/2018 0.598 0
12/31/2018 0.598 0
我需要计算列C
,它基本上是列A
列B
列,但列B
的值是相应的前一年的值月。此外,对于上一年没有相应月份的值,该值应为零。更具体地说,这是我期望的C
:
C
0 # these values are zero because the corresponding month in the previous year is not in column A
0
0
0
0
0
0
0
0
0
0
0
0 # 0.598 * 0
2.5556460155552 # 0.673 * 3.7973937824
2.5265459139593 # 0.673 * 3.7541544041
4.2866377286796 # 0.609 * 7.0388140044
6.7430548201149 # 0.609 * 11.0723396061
7.0417073540604 # 0.609 * 11.5627378556
6.5119237286796 # 0.609 * 10.6928140044
7.6193452999295 # 0.673 * 11.3214640415
5.1742796702772 # 0.673 * 7.6883798964
2.6202328143706 # 0.598 * 4.3816602247
2.195768920436 # 0.598 * 3.671854382
我怎样才能做到这一点?我确信可能有办法不使用for循环。提前谢谢。
答案 0 :(得分:1)
In [73]: (df.drop('B',1)
...: .merge(df.drop('A',1)
...: .assign(month=df.month + pd.offsets.MonthEnd(12)),
...: on='month', how='left')
...: .eval("C = A * B", inplace=False)
...: .fillna(0)
...: )
...:
Out[73]:
month A B C
0 2017-02-28 0.737757 0.000000 0.000000
1 2017-03-31 0.759479 0.000000 0.000000
2 2017-04-30 0.750831 0.000000 0.000000
3 2017-05-31 0.703881 0.000000 0.000000
4 2017-06-30 0.692021 0.000000 0.000000
5 2017-07-31 0.680161 0.000000 0.000000
6 2017-08-31 0.668301 0.000000 0.000000
7 2017-09-30 0.707592 0.000000 0.000000
8 2017-10-31 0.698944 0.000000 0.000000
9 2017-11-30 0.625951 0.000000 0.000000
10 2017-12-31 0.611976 0.000000 0.000000
11 2018-01-31 0.633000 0.000000 0.000000
12 2018-02-28 0.598000 0.000000 0.000000
13 2018-03-31 0.673000 3.797394 2.555646
14 2018-04-30 0.673000 3.754154 2.526546
15 2018-05-31 0.609000 7.038814 4.286638
16 2018-06-30 0.609000 11.072340 6.743055
17 2018-07-31 0.609000 11.562738 7.041707
18 2018-08-31 0.609000 10.692814 6.511924
19 2018-09-30 0.673000 11.321464 7.619345
20 2018-10-31 0.673000 7.688380 5.174280
21 2018-11-30 0.598000 4.381660 2.620233
22 2018-12-31 0.598000 3.671854 2.195769
说明:
我们可以像这样生成一个帮助器DF(我们已经添加了12个月到month
列并删除了A
列):
In [77]: df.drop('A',1).assign(month=df.month + pd.offsets.MonthEnd(12))
Out[77]:
month B
0 2018-02-28 0.000000
1 2018-03-31 3.797394
2 2018-04-30 3.754154
3 2018-05-31 7.038814
4 2018-06-30 11.072340
5 2018-07-31 11.562738
6 2018-08-31 10.692814
7 2018-09-30 11.321464
8 2018-10-31 7.688380
9 2018-11-30 4.381660
10 2018-12-31 3.671854
11 2019-01-31 3.798000
12 2019-02-28 4.784000
13 2019-03-31 5.384000
14 2019-04-30 1.346000
15 2019-05-31 0.000000
16 2019-06-30 0.000000
17 2019-07-31 0.000000
18 2019-08-31 0.000000
19 2019-09-30 0.000000
20 2019-10-31 0.000000
21 2019-11-30 0.000000
22 2019-12-31 0.000000
现在我们可以将它与原始DF合并(原始DF中我们不需要B
列):
In [79]: (df.drop('B',1)
...: .merge(df.drop('A',1)
...: .assign(month=df.month + pd.offsets.MonthEnd(12)),
...: on='month', how='left'))
Out[79]:
month A B
0 2017-02-28 0.737757 NaN
1 2017-03-31 0.759479 NaN
2 2017-04-30 0.750831 NaN
3 2017-05-31 0.703881 NaN
4 2017-06-30 0.692021 NaN
5 2017-07-31 0.680161 NaN
6 2017-08-31 0.668301 NaN
7 2017-09-30 0.707592 NaN
8 2017-10-31 0.698944 NaN
9 2017-11-30 0.625951 NaN
10 2017-12-31 0.611976 NaN
11 2018-01-31 0.633000 NaN
12 2018-02-28 0.598000 0.000000
13 2018-03-31 0.673000 3.797394
14 2018-04-30 0.673000 3.754154
15 2018-05-31 0.609000 7.038814
16 2018-06-30 0.609000 11.072340
17 2018-07-31 0.609000 11.562738
18 2018-08-31 0.609000 10.692814
19 2018-09-30 0.673000 11.321464
20 2018-10-31 0.673000 7.688380
21 2018-11-30 0.598000 4.381660
22 2018-12-31 0.598000 3.671854
然后使用.eval("C = A * B", inplace=False)
我们可以“动态”生成一个新列