How do I rearrange the rows of a dataframe and add a new column holding the percentage difference of 2 other columns in pandas?

Time: 2017-02-27 11:55:34

Tags: python pandas dataframe

I have a dataframe that looks like this:

Case    Peak 'A'    Peak 'B'    Volume 'C'  Volume 'D'
 1       5.00       4.00         0.34         0.32
 2       5.70       6.00         0.14         0.15
 3       11.00      20.00        0.42         0.50
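For anyone who wants to follow along, the table above can be rebuilt as a DataFrame. This is a minimal sketch: the column names are copied verbatim from the table, and the variable name `df` is simply the one the answers below use.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Case": [1, 2, 3],
    "Peak 'A'": [5.00, 5.70, 11.00],
    "Peak 'B'": [4.00, 6.00, 20.00],
    "Volume 'C'": [0.34, 0.14, 0.42],
    "Volume 'D'": [0.32, 0.15, 0.50],
})
print(df)
```

Note that the quotes are part of the column names, so a column has to be selected as `df["Peak 'A'"]`, not `df.A`.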

The expected output looks like this:

[image: expected output]

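where:

A 'Diff Peak' column needs to be added holding the percentage difference, i.e. [(B - A) / B] * 100.

A 'Diff Vol' column will hold [(D - C) / D] * 100, which is again the percentage difference.

A 'Within Range' column needs to be added for Peak: if 'Diff Peak' is within the range -15% to 25%, this column must be filled with Yes; if not, with No, as shown in the image.

Similarly, a 'Within Range' column is filled in for Volume, depending on whether 'Diff Vol' is within the range -10% to 20%.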
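To make the rules concrete, here is a sketch in plain Python applied to Case 1. The helper names `pct_diff` and `within` are hypothetical, and treating the range endpoints as inclusive is an assumption, since the question does not say either way.

```python
def pct_diff(first, second):
    """Percentage difference relative to the second value: (second - first) / second * 100."""
    return (second - first) / second * 100


def within(value, low, high):
    """'Yes' if low <= value <= high (endpoints assumed inclusive), else 'No'."""
    return "Yes" if low <= value <= high else "No"


# Case 1 from the table: Peak A=5.00, B=4.00; Volume C=0.34, D=0.32
peak = pct_diff(5.00, 4.00)
vol = pct_diff(0.34, 0.32)
print(peak, within(peak, -15, 25))
print(round(vol, 2), within(vol, -10, 20))
```

For Case 1 the peak difference is -25%, which falls outside -15% to 25%, while the volume difference of about -6.25% falls inside -10% to 20%.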

How can I do this?

2 answers:

Answer 0: (score: 1)

Just create the new columns:

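import numpy as np
# use the column names exactly as they appear in the question's dataframe
df['Diff Peak'] = (df["Peak 'B'"] - df["Peak 'A'"]) / df["Peak 'B'"] * 100
df['Diff Vol'] = (df["Volume 'D'"] - df["Volume 'C'"]) / df["Volume 'D'"] * 100
# boolean columns: True when the difference lies inside the allowed range
df['Within Range Peak'] = np.logical_and(df['Diff Peak'] >= -15.0, df['Diff Peak'] <= 25.0)
df['Within Range Vol'] = np.logical_and(df['Diff Vol'] >= -10.0, df['Diff Vol'] <= 20.0)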
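The range columns in this answer come out as True/False, whereas the question asks for Yes/No. One possible way to bridge that, sketched here on a small stand-in frame holding the three 'Diff Peak' values that follow from the question's data, is to pass the boolean mask to `np.where`:

```python
import numpy as np
import pandas as pd

# Stand-in frame: the 'Diff Peak' values computed from the question's Peak columns.
df = pd.DataFrame({"Diff Peak": [-25.0, 5.0, 45.0]})

# Boolean mask, built the same way as in the answer above.
in_range = np.logical_and(df["Diff Peak"] >= -15.0, df["Diff Peak"] <= 25.0)

# Map True/False to the labels requested in the question.
df["Within Range Peak"] = np.where(in_range, "Yes", "No")
print(df)
```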

Answer 1: (score: 1)

If you don't need a MultiIndex in the columns, you can use:

import numpy as np

# apply the formulas
df['Diff Peak'] = df["Peak 'B'"].sub(df["Peak 'A'"]).div(df["Peak 'B'"]).mul(100)
df['Diff Vol'] = df["Volume 'D'"].sub(df["Volume 'C'"]).div(df["Volume 'D'"]).mul(100)
# check the range, then fill in Yes or No
df['Peak Within Range'] = np.where(df['Diff Peak'].between(-15, 25), 'Yes', 'No')
df['Volume Within Range'] = np.where(df['Diff Vol'].between(-10, 20), 'Yes', 'No')
# round (if necessary), convert to string and append %
df['Diff Peak'] = df['Diff Peak'].round(2).astype(str) + '%'
df['Diff Vol'] = df['Diff Vol'].round(2).astype(str) + '%'
print(df)
   Case  Peak 'A'  Peak 'B'  Volume 'C'  Volume 'D' Diff Peak Diff Vol  \
0     1       5.0       4.0        0.34        0.32    -25.0%   -6.25%   
1     2       5.7       6.0        0.14        0.15      5.0%    6.67%   
2     3      11.0      20.0        0.42        0.50     45.0%    16.0%   

  Peak Within Range Volume Within Range  
0                No                 Yes  
1               Yes                 Yes  
2                No                 Yes  
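One side effect of the last step is that 'Diff Peak' and 'Diff Vol' become strings, so they can no longer be compared or aggregated numerically. If that matters, a possible variation is to keep the numeric column and write the formatted text to a separate column. The sketch below assumes a hypothetical column name 'Diff Vol (display)' and uses rounded stand-in values:

```python
import pandas as pd

# Stand-in values for 'Diff Vol' (rounded from the question's data).
df = pd.DataFrame({"Diff Vol": [-6.25, 6.666667, 16.0]})

# Format into a separate text column; 'Diff Vol' itself stays float.
df["Diff Vol (display)"] = df["Diff Vol"].map("{:.2f}%".format)
print(df)
```

Unlike `round(2).astype(str)`, the format string always prints two decimals, so 16.0 is shown as 16.00%.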

But if you need a MultiIndex in the columns:

df = df.set_index('Case')
df['Peak Diff peak'] = df["Peak 'B'"].sub(df["Peak 'A'"]).div(df["Peak 'B'"]).mul(100)
df['Volume Diff Vol'] = df["Volume 'D'"].sub(df["Volume 'C'"]).div(df["Volume 'D'"]).mul(100)
df['Peak Within Range'] = np.where(df['Peak Diff peak'].between(-15, 25), 'Yes', 'No')
df['Volume Within Range'] = np.where(df['Volume Diff Vol'].between(-10, 20), 'Yes', 'No')
df['Peak Diff peak'] = df['Peak Diff peak'].round(2).astype(str) + '%'
df['Volume Diff Vol'] = df['Volume Diff Vol'].round(2).astype(str) + '%'

# keep only the columns whose names start with Peak
df1 = df.filter(regex='^Peak')
# insert the range label and a '_' marker after the Peak prefix
df1.columns = df1.columns.str.replace('Peak', 'Peak (+25% to -15%)_')
# split on the marker to create the MultiIndex
df1.columns = df1.columns.str.split('_ ', expand=True)
print(df1)
     Peak (+25% to -15%)                             
                     'A'   'B' Diff peak Within Range
Case                                                 
1                    5.0   4.0    -25.0%           No
2                    5.7   6.0      5.0%          Yes
3                   11.0  20.0     45.0%           No

# same as df1, but for the Volume columns
df2 = df.filter(regex='^Volume')
df2.columns = df2.columns.str.replace('Volume', 'Volume (+20% to -10%)_')
df2.columns = df2.columns.str.split('_ ', expand=True)
print (df2)
     Volume (+20% to -10%)                            
                       'C'   'D' Diff Vol Within Range
Case                                                  
1                     0.34  0.32   -6.25%          Yes
2                     0.14  0.15    6.67%          Yes
3                     0.42  0.50    16.0%          Yes

# concatenate both dataframes into one (requires import pandas as pd)
df3 = pd.concat([df1, df2], axis=1).reset_index()
print (df3)
  Case Peak (+25% to -15%)                              Volume (+20% to -10%)  \
                       'A'   'B' Diff peak Within Range                   'C'   
0    1                 5.0   4.0    -25.0%           No                  0.34   
1    2                 5.7   6.0      5.0%          Yes                  0.14   
2    3                11.0  20.0     45.0%           No                  0.42   


    'D' Diff Vol Within Range  
0  0.32   -6.25%          Yes  
1  0.15    6.67%          Yes  
2  0.50    16.0%          Yes
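As a possible alternative to renaming and splitting the column labels, the outer level can also be attached through the `keys` argument of `pd.concat`. This is a sketch rather than the answer's method: the frames `peak` and `volume` are hypothetical stand-ins that already carry only the inner labels, and the Diff and Within Range columns are left out for brevity.

```python
import pandas as pd

case = pd.Index([1, 2, 3], name="Case")

# Hypothetical per-group frames holding only the inner column labels.
peak = pd.DataFrame({"'A'": [5.0, 5.7, 11.0], "'B'": [4.0, 6.0, 20.0]}, index=case)
volume = pd.DataFrame({"'C'": [0.34, 0.14, 0.42], "'D'": [0.32, 0.15, 0.50]}, index=case)

# keys= adds an outer column level, one entry per concatenated frame.
out = pd.concat(
    [peak, volume],
    axis=1,
    keys=["Peak (+25% to -15%)", "Volume (+20% to -10%)"],
)
print(out)
```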