Percentage difference between consecutive values and a fixed value

Time: 2018-04-22 09:10:52

Tags: python pandas

            A         B        C   D
0  2002-01-12  10:00:00     John  19
1  2002-01-12  11:00:00   Africa  15
2  2002-01-12  12:00:00     Mary  30
3  2002-01-13  09:00:00    Billy   5
4  2002-01-13  11:00:00     Mira   6
5  2002-01-13  12:00:00  Hillary  50
6  2002-01-13  12:00:00   Romina  50
7  2002-01-14  10:00:00   George  30
8  2002-01-14  11:00:00   Denzel  12
9  2002-01-14  11:00:00  Michael  12
10 2002-01-14  12:00:00     Bisc  25
11 2002-01-16  10:00:00   Virgin  16
12 2002-01-16  11:00:00  Antonio  10
13 2002-01-16  12:00:00     Sito   5

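I want to create two new columns, df['E'] and df['F'], knowing that the same A and B values always correspond to the same D value:

df['E']: the percentage change of D relative to the previous (different) D value.

df['F']: the percentage difference between D and the D value at 12:00:00 of the previous date.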
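To make the two definitions concrete, here is a minimal sketch of the arithmetic for two rows of the table above. The helper name pct_diff is hypothetical (it is not a pandas function), and F is read here as comparing D with the 12:00:00 value of the previous date present in the data, which is an assumption about the intended meaning.

```python
def pct_diff(new, old):
    """Percentage difference of `new` relative to `old`, rounded to 2 decimals."""
    return round(100 * (new - old) / old, 2)

# E for row 1: D=15 compared with the previous row's D=19
e_row1 = pct_diff(15, 19)

# F for row 4: D=6 compared with D=30, the 12:00:00 value of the previous date (2002-01-12)
f_row4 = pct_diff(6, 30)

print(e_row1, f_row4)
```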

The output should be:

            A         B        C   D       E      F
0  2002-01-12  10:00:00     John  19       0      0
1  2002-01-12  11:00:00   Africa  15  -21.05      0
2  2002-01-12  12:00:00     Mary  30  100.00      0
3  2002-01-13  09:00:00    Billy   5  -83.33 -83.33
4  2002-01-13  11:00:00     Mira   6   20.00 -80.00
5  2002-01-13  12:00:00  Hillary  50  733.33  66.66
6  2002-01-13  12:00:00   Romina  50  733.33  66.66
7  2002-01-14  10:00:00   George  30  -40.00 -40.00
8  2002-01-14  11:00:00   Denzel  12  -60.00 -76.00
9  2002-01-14  11:00:00  Michael  12  -60.00 -76.00
10 2002-01-14  12:00:00     Bisc  25  108.33 -50.00
11 2002-01-16  10:00:00   Virgin  16  -36.00 -36.00
12 2002-01-16  11:00:00  Antonio  10  -37.50 -60.00
13 2002-01-16  12:00:00     Sito   5  -50.00 -80.00

Is it possible to use map to get it?

I tried:

x = df[df['B'].eq(time(12))].drop_duplicates(subset=['A']).set_index('A')['D']
(100 * (df.D - df.D.shift(1)) / df.D.shift(1)).fillna(0)
df['F'] = df['A'].map(x)

1 answer:

Answer 0: (score: 1)

Use:

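import numpy as np
from datetime import time

# E: percentage change from the previous row; a zero change (repeated D) inherits the last non-zero change
df['E'] = df['D'].pct_change().mul(100).replace(0,np.nan).ffill().fillna(0).round(2)
s = df[df['B'].eq(time(12))].drop_duplicates(subset=['A']).set_index('A')['D']
# F: D relative to the 12:00:00 value of the previous date in s
df['F'] = (df['D'].div(df['A'].map(s.shift()))).sub(1).mul(100).round(2).fillna(0)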
print (df)
             A         B        C   D       E      F
0   2002-01-12  10:00:00     John  19    0.00   0.00
1   2002-01-12  11:00:00   Africa  15  -21.05   0.00
2   2002-01-12  12:00:00     Mary  30  100.00   0.00
3   2002-01-13  09:00:00    Billy   5  -83.33 -83.33
4   2002-01-13  11:00:00     Mira   6   20.00 -80.00
5   2002-01-13  12:00:00  Hillary  50  733.33  66.67
6   2002-01-13  12:00:00   Romina  50  733.33  66.67
7   2002-01-14  10:00:00   George  30  -40.00 -40.00
8   2002-01-14  11:00:00   Denzel  12  -60.00 -76.00
9   2002-01-14  11:00:00  Michael  12  -60.00 -76.00
10  2002-01-14  12:00:00     Bisc  25  108.33 -50.00
11  2002-01-16  10:00:00   Virgin  16  -36.00 -36.00
12  2002-01-16  11:00:00  Antonio  10  -37.50 -60.00
13  2002-01-16  12:00:00     Sito   5  -50.00 -80.00

Explanation:

  1. For column E, pct_change is used, then 0 is replaced with NaN and the NaNs are forward filled.
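  2. For column F, column A is mapped through the D values of the rows with 12:00:00 in column B, shifted by one date (s.shift() in the code above), and D is divided by the mapped value.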
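For anyone who wants to run the answer end to end, the following is a self-contained sketch that rebuilds the sample frame from the question and applies the same three lines. It assumes that column B holds datetime.time objects and column A holds date strings; the question does not show the dtypes, so adjust the construction if your data differs.

```python
from datetime import time

import numpy as np
import pandas as pd

# Sample data from the question (assumed dtypes: A as str, B as datetime.time)
df = pd.DataFrame({
    "A": ["2002-01-12"] * 3 + ["2002-01-13"] * 4 + ["2002-01-14"] * 4 + ["2002-01-16"] * 3,
    "B": [time(h) for h in (10, 11, 12, 9, 11, 12, 12, 10, 11, 11, 12, 10, 11, 12)],
    "C": ["John", "Africa", "Mary", "Billy", "Mira", "Hillary", "Romina",
          "George", "Denzel", "Michael", "Bisc", "Virgin", "Antonio", "Sito"],
    "D": [19, 15, 30, 5, 6, 50, 50, 30, 12, 12, 25, 16, 10, 5],
})

# E: row-to-row percentage change; zero changes (repeated D values) are
# turned into NaN and forward filled, so duplicates inherit the last change
df["E"] = df["D"].pct_change().mul(100).replace(0, np.nan).ffill().fillna(0).round(2)

# s: one D value per date, taken from the first 12:00:00 row of that date
s = df[df["B"].eq(time(12))].drop_duplicates(subset=["A"]).set_index("A")["D"]

# F: D relative to the 12:00:00 value of the previous date in s;
# the first date has no predecessor, so its NaN becomes 0
df["F"] = df["D"].div(df["A"].map(s.shift())).sub(1).mul(100).round(2).fillna(0)

print(df)
```

Note that s.shift() moves values by position, so "previous date" means the previous date present in the data (2002-01-14 for 2002-01-16), not the previous calendar day.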