Pandas conditional percent change

Time: 2019-07-29 01:22:19

Tags: python pandas dataframe

I have a DataFrame built by the following code:

import pandas as pd

columns = ['type', 'value', 'weight']
fizz_or_bang = ['fizz', 'bang', 'fizz', 'bang', 'bang', 'fizz', 'bang', 'bang', 'fizz', 'bang', 'bang', 'fizz', 'bang', 'bang', 'fizz', 'bang']
values = [5, 4, 5, 7, 9, 4, 6, 12, 8, 12, 13, 2, 3, 4, 8, 6]
weight = [1, 1, 1, .5, .5, 1, .5, .5, 1, .5, .5, 1, .5, .5, 1, 1]

data = {'type': fizz_or_bang, 'value': values, 'weight': weight}

df = pd.DataFrame(data)

It looks like this:

    type    value   weight
0   fizz    5   1.0
1   bang    4   1.0
2   fizz    5   1.0
3   bang    7   0.5
4   bang    9   0.5
5   fizz    4   1.0
6   bang    6   0.5
7   bang    12  0.5
8   fizz    8   1.0
9   bang    12  0.5
10  bang    13  0.5
11  fizz    2   1.0
12  bang    3   0.5
13  bang    4   0.5
14  fizz    8   1.0
15  bang    6   1.0

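I want to get the percent change for each fizz bang pair, and the average percent change for each fizz bang bang trio. I am a bit stuck on how to do this with pandas functions; I would like to store the percent change in a fourth column. I know I can use df['my_column'].pct_change() to get the percent change from one row to the next, but I am not sure how to apply it under the conditions described above.

Is it possible to do this with a for loop without destroying the DataFrame's structure?

Here is the expected output: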

    type    value   weight  pct_change
0   fizz    5   1.0 NaN
1   bang    4   1.0 -0.200
2   fizz    5   1.0 NaN
3   bang    7   0.5 NaN
4   bang    9   0.5 0.6
5   fizz    4   1.0 NaN
6   bang    6   0.5 NaN
7   bang    12  0.5 1.25
8   fizz    8   1.0 NaN
9   bang    12  0.5 NaN
10  bang    13  0.5 0.5625
11  fizz    2   1.0 NaN
12  bang    3   0.5 NaN
13  bang    4   0.5 0.75
14  fizz    8   1.0 NaN
15  bang    6   1.0 -0.25

Fizz bang pair calculation: (bang - fizz) / fizz

Fizz bang bang trio calculation: (((bang1 + bang2) / 2) - fizz) / fizz
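As a quick sanity check of the two formulas, the sketch below evaluates them by hand for rows 0-1 (a pair) and rows 2-4 (a trio) of the data above. The helper names pair_change and trio_change are illustrative only and do not come from pandas.

```python
# Worked check of the two formulas on values taken from the table above.
# pair_change and trio_change are hypothetical helper names for illustration.

def pair_change(fizz, bang):
    """Percent change from a fizz value to the single bang that follows it."""
    return (bang - fizz) / fizz


def trio_change(fizz, bang1, bang2):
    """Percent change from a fizz value to the average of the two bangs after it."""
    return ((bang1 + bang2) / 2 - fizz) / fizz


pair_result = pair_change(5, 4)      # rows 0-1: fizz=5, bang=4
trio_result = trio_change(5, 7, 9)   # rows 2-4: fizz=5, bangs=7 and 9
print(pair_result, trio_result)
```

These two results correspond to the values expected on rows 1 and 4 of the output table.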

1 Answer:

Answer 0: (score: 0)

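This needs cumsum to create a group key. Within each group you only compare the first row (the fizz) with the bang rows that follow it, and the result goes on the group's last row, so pct_change cannot be used directly.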
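To make the grouping idea concrete, here is a minimal sketch on a shortened copy of the type column. The demo frame and the group and is_last names are illustrative assumptions, not part of the original answer; the pandas calls (eq, cumsum, duplicated) are the same ones the answer relies on.

```python
import pandas as pd

# Shortened copy of the 'type' column: one pair followed by two trios.
types = ['fizz', 'bang', 'fizz', 'bang', 'bang', 'fizz', 'bang', 'bang']
demo = pd.DataFrame({'type': types})

# The running count of fizz rows increases by one at every fizz,
# so each fizz and the bangs after it share the same key.
demo['group'] = demo['type'].eq('fizz').cumsum()

# duplicated(keep='last') is False only for the last row of each key,
# so negating it marks the row where each result should be written.
is_last = ~demo['group'].duplicated(keep='last')

print(demo.assign(is_last=is_last))
```

The group column comes out as 1, 1, 2, 2, 2, 3, 3, 3, and is_last is True only on rows 1, 4 and 7.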

# Group key: increases by one at every fizz row
s = df.type.eq('fizz').cumsum()
# Weighted sum of the bang rows in each group: weights 0.5 + 0.5 average a trio's bangs,
# and weight 1 keeps a pair's single bang unchanged
s1 = df.value.mul(df.weight)[df.type.ne('fizz')].groupby(s).sum().values
# The fizz value that each group is compared against
s2 = df.loc[df.type.eq('fizz'), 'value']
# Assign the result only to the last row of each group, found with duplicated(keep='last')
df.loc[~s.duplicated(keep='last'), 'New'] = ((s1 - s2) / s2).values
df
    type  value  weight     New
0   fizz      5     1.0     NaN
1   bang      4     1.0 -0.2000
2   fizz      5     1.0     NaN
3   bang      7     0.5     NaN
4   bang      9     0.5  0.6000
5   fizz      4     1.0     NaN
6   bang      6     0.5     NaN
7   bang     12     0.5  1.2500
8   fizz      8     1.0     NaN
9   bang     12     0.5     NaN
10  bang     13     0.5  0.5625
11  fizz      2     1.0     NaN
12  bang      3     0.5     NaN
13  bang      4     0.5  0.7500
14  fizz      8     1.0     NaN
15  bang      6     1.0 -0.2500
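On the question of whether a for loop can do this without disturbing the DataFrame: a minimal sketch follows, assuming the data always starts with a fizz row and that every fizz is followed by at least one bang. Unlike the answer above, it ignores the weight column and applies the two formulas from the question directly by taking the plain mean of the bangs. The names group_key, fizz and bangs are illustrative.

```python
import numpy as np
import pandas as pd

fizz_or_bang = ['fizz', 'bang', 'fizz', 'bang', 'bang', 'fizz', 'bang', 'bang',
                'fizz', 'bang', 'bang', 'fizz', 'bang', 'bang', 'fizz', 'bang']
values = [5, 4, 5, 7, 9, 4, 6, 12, 8, 12, 13, 2, 3, 4, 8, 6]
df = pd.DataFrame({'type': fizz_or_bang, 'value': values})

# Same group key as in the answer: one group per fizz and the bangs after it.
group_key = df['type'].eq('fizz').cumsum()
df['pct_change'] = np.nan

for _, group in df.groupby(group_key):
    fizz = group['value'].iloc[0]    # assumption: first row of each group is the fizz
    bangs = group['value'].iloc[1:]  # the one or two bang rows that follow
    if bangs.empty:
        continue                     # a fizz with no bang gets no result
    # The mean of one bang is the bang itself, so this covers pairs and trios.
    df.loc[group.index[-1], 'pct_change'] = (bangs.mean() - fizz) / fizz

print(df)
```

The loop only writes into the new pct_change column, so the original rows and columns stay as they were. For larger frames the vectorized version in the answer is likely to be faster.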