How do I apply a custom percentage change to a pandas DataFrame?

Asked: 2018-02-27 18:23:39

Tags: python pandas dataframe apply

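I want to apply a custom percentage change to the rows of my DataFrame, where the last row for each company ID should always be zero. I tried using df.apply, but I could not pass multiple arguments to it. I would appreciate it if you could show me the different ways this can be solved. Thanks in advance for your time and effort!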

Sample data and the function I have so far:

import pandas as pd

df = pd.DataFrame({'CompanyId': ['A', 'A', 'A', 'B', 'B'],
                   'stand_alone': [10, 12, -5, 20, 1]})

def get_change(current, previous):
    if current == previous:
        return 0
    # Fallback for combinations not covered below
    # (e.g. current < 0 and previous == 0); avoids an unbound variable.
    chg = 0.0
    if current >= 0 and previous < 0:
        chg = 1.0
    if current >= 0 and previous == 0:
        chg = 1.0
    if current < 0 and previous > 0:
        chg = -1.0
    if current > 0 and previous > 0:
        chg = abs(current) / abs(previous) - 1
    if current < 0 and previous < 0:
        chg = abs(current) / abs(previous) - 1
        chg = -chg
    return round(chg * 100, 2)

2 Answers:

Answer 0: (score: 1)

OK, here is one way to do it using your current logic.

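import numpy as np

def get_change(x):
    # Reverse the group so that shift(1) yields the next row in original order.
    x = x.sort_index(ascending=False)
    prev = x.shift(1)
    ratio = (x / prev).abs() - 1
    conditions = [
        x == prev,               # unchanged
        (x < 0) & (prev > 0),    # positive -> negative (parentheses are required around each comparison)
        (x > 0) & (prev > 0),    # both positive
        (x < 0) & (prev < 0),    # both negative: sign flipped, as in the question's function
        (x >= 0) & (prev <= 0),  # non-positive -> non-negative
    ]
    choices = [0, -1, ratio, -ratio, 1]
    # Rows matching no condition (the last row of each group, where prev is NaN) default to 0.
    result = np.select(conditions, choices) * 100
    return result[::-1]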

df['change'] = df.groupby('CompanyId')['stand_alone'].transform(get_change).round(2)
print(df)

Output:

  CompanyId  stand_alone   change
0         A           10   -16.67
1         A           12   100.00
2         A           -5     0.00
3         B           20  1900.00
4         B            1     0.00

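I think the key pieces you need for this approach are np.select, which gives you if/elif/else logic over whole arrays, and groupby combined with transform, which applies the function to each company separately.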
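As a small illustration of the if/elif/else behaviour of np.select, here is a minimal sketch on made-up values (the array, the conditions, and the integer codes are only for demonstration and are not part of the answer above). Conditions are checked in order, and the first one that is True decides the result for that element.

```python
import numpy as np

values = np.array([-3, 0, 4, 10])

# Evaluated in order: an element takes the choice of the first True condition.
conditions = [values < 0, values == 0, values < 5]
choices = [-1, 0, 1]

# Elements that match no condition get the default.
codes = np.select(conditions, choices, default=2)
print(codes.tolist())
```

Note that -3 and 0 also satisfy `values < 5`, but they are already claimed by the earlier conditions, which is why the order of the conditions matters.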

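Answer 1: (score: 1)

A simple and direct approach is to add a column that holds the previous value, obtained by shifting the current column. Using apply over rows should be your last resort, because its performance is very poor (only slightly better than looping over the rows).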

import numpy as np

# "previous" holds the next row's value within each company (NaN on the last row).
df = df.assign(previous=df.groupby('CompanyId').stand_alone.shift(-1)
              ).assign(chg=np.nan)

df.loc[(df.stand_alone >= 0) & (df.previous <= 0), 'chg'] = 1.
df.loc[(df.stand_alone < 0) & (df.previous > 0), 'chg'] = -1.
mask = (df.stand_alone > 0) & (df.previous > 0)
df.loc[mask, 'chg'] = np.abs(df[mask].stand_alone / df[mask].previous) - 1
mask = (df.stand_alone < 0) & (df.previous < 0)
df.loc[mask, 'chg'] = -np.abs(df[mask].stand_alone / df[mask].previous) + 1
# The equality check goes last so that it takes precedence, as in the original function.
df.loc[(df.stand_alone - df.previous).abs() < 1e-5, 'chg'] = 0  # equal, with float tolerance
df['chg'] = np.round(df.chg.fillna(0) * 100, 2)  # last row of each company becomes 0
df.drop(columns=['previous'], inplace=True)
df

Output:

  CompanyId  stand_alone      chg
0         A           10   -16.67
1         A           12   100.00
2         A           -5     0.00
3         B           20  1900.00
4         B            1     0.00
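To see what the helper column looks like before it is dropped, here is a minimal sketch using the sample data from the question (the name df_demo is only for illustration). Shifting by -1 within each group pulls the next row's value up, so the last row of every company has no value to compare against and ends up as NaN.

```python
import pandas as pd

df_demo = pd.DataFrame({'CompanyId': ['A', 'A', 'A', 'B', 'B'],
                        'stand_alone': [10, 12, -5, 20, 1]})

# The shift is applied per company, so values never leak from B into A.
df_demo['previous'] = df_demo.groupby('CompanyId')['stand_alone'].shift(-1)
print(df_demo)
```

Those NaN rows are the ones that fillna(0) turns into a change of zero, which is how the last row of each company ends up at 0.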

That said, you can also get there with a small change to your own code:

def get_change(x):
    current = x['stand_alone']
    previous = x['previous']
    chg=0
    if current==previous:
        return 0
    if current>=0 and previous<0:
        chg=1.0
    if current>=0 and previous==0:
        chg=1.0
    if current<0 and previous>0:
        chg=-1.0
    if current>0 and previous>0:
        chg=abs(current)/abs(previous)-1
    if current<0 and previous<0:
        chg=abs(current)/abs(previous)-1
        chg=-chg
    return round(chg*100,2)

df['chg'] = df.assign(previous =  df.groupby('CompanyId').stand_alone.shift(-1)).apply(get_change,axis=1)