How can I use pandas to find the largest values across columns based on a condition and arrange them in order?

Time: 2019-05-02 11:37:22

Tags: pandas

I have the following DataFrame:

import pandas as pd
import numpy as np
d = {

    'ID':[1,2,3,4,5,6],
    'Price1':[5,9,4,3,9,np.nan],
    'Price2':[9,10,13,14,18,np.nan],
    'Price5':[5,9,4,3,9,np.nan],
    'Price6':[np.nan,10,13,14,18,np.nan],
    'Price10':[9,10,13,14,18,np.nan],
    'Price3':[5,9,4,3,9,np.nan],
    'Price4':[9,10,13,14,18,np.nan],
    'Price7':[np.nan,9,4,3,9,np.nan],
    'Price8':[np.nan,10,13,14,18,np.nan],
    'Price9':[5,9,4,3,9,np.nan],
    'Type':['A','A','B','C','D','D'],


}
df = pd.DataFrame(data = d)
df

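How can I use pandas to find the largest values within a group of columns and arrange them in order?

For each row, take the values of Price1, Price2, Price5, Price6, and Price10, sort them from largest to smallest, and add them as new columns maxA1 to maxA5.

For each row, take the values of Price3, Price4, Price7, Price8, and Price9, sort them from largest to smallest, and add them as new columns maxB1 to maxB5.

Expected output: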

import pandas as pd
import numpy as np
d = {

    'ID':[1,2,3,4,5,6],
    'Price1':[5,9,4,3,9,np.nan],
    'Price2':[9,10,13,14,18,np.nan],
    'Price3':[5,9,4,3,9,np.nan],
    'Price4':[9,10,13,14,18,np.nan],
    'Price5':[5,9,4,3,9,np.nan],
    'Price6':[np.nan,10,13,14,18,np.nan],
    'Price7':[np.nan,9,4,3,9,np.nan],
    'Price8':[np.nan,10,13,14,18,np.nan],
    'Price9':[5,9,4,3,9,np.nan],
    'Price10':[9,10,13,14,18,np.nan],
    'Type':['A','A','B','C','D','D'],
    'maxA1':[9,10,13,14,18,np.nan],
    'maxA2':[9,10,13,14,18,np.nan],
    'maxA3':[5,10,13,14,18,np.nan],
    'maxA4':[5,9,4,3,9,np.nan],
    'maxA5':[np.nan,9,4,3,9,np.nan],
    'maxB1':[9,10,13,14,18,np.nan],
    'maxB2':[5,10,13,14,18,np.nan],
    'maxB3':[5,9,4,3,9,np.nan],
    'maxB4':[np.nan,9,4,3,9,np.nan],
    'maxB5':[np.nan,9,4,3,9,np.nan],


}
df = pd.DataFrame(data = d)
pd.set_option('display.max_columns', 25)
df

1 Answer:

Answer 0 (score: 2)

Use:

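c1 = ['Price1', 'Price2', 'Price5', 'Price6', 'Price10']
col1 = [f"maxA{i+1}" for i in range(len(c1))]
# ['maxA1', 'maxA2', 'maxA3', 'maxA4', 'maxA5']
c2 = ['Price3', 'Price4', 'Price7', 'Price8', 'Price9']
col2 = [f"maxB{i+1}" for i in range(len(c2))]
# ['maxB1', 'maxB2', 'maxB3', 'maxB4', 'maxB5']

# np.sort puts NaN last, so sorting the negated values in ascending order
# and negating back yields each row in descending order with NaN at the end.
# index=df.index keeps the new columns aligned with the original rows.
a = pd.DataFrame(-np.sort(-df[c1].to_numpy(), axis=1), columns=col1, index=df.index)
b = pd.DataFrame(-np.sort(-df[c2].to_numpy(), axis=1), columns=col2, index=df.index)

df_new = pd.concat([df, a, b], axis=1)
print(df_new)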

   ID  Price1  Price2  Price5  Price6  Price10  Price3  Price4  Price7  \
0   1     5.0     9.0     5.0     NaN      9.0     5.0     9.0     NaN   
1   2     9.0    10.0     9.0    10.0     10.0     9.0    10.0     9.0   
2   3     4.0    13.0     4.0    13.0     13.0     4.0    13.0     4.0   
3   4     3.0    14.0     3.0    14.0     14.0     3.0    14.0     3.0   
4   5     9.0    18.0     9.0    18.0     18.0     9.0    18.0     9.0   
5   6     NaN     NaN     NaN     NaN      NaN     NaN     NaN     NaN   

   Price8  Price9 Type  maxA1  maxA2  maxA3  maxA4  maxA5  maxB1  maxB2  \
0     NaN     5.0    A    9.0    9.0    5.0    5.0    NaN    9.0    5.0   
1    10.0     9.0    A   10.0   10.0   10.0    9.0    9.0   10.0   10.0   
2    13.0     4.0    B   13.0   13.0   13.0    4.0    4.0   13.0   13.0   
3    14.0     3.0    C   14.0   14.0   14.0    3.0    3.0   14.0   14.0   
4    18.0     9.0    D   18.0   18.0   18.0    9.0    9.0   18.0   18.0   
5     NaN     NaN    D    NaN    NaN    NaN    NaN    NaN    NaN    NaN   

   maxB3  maxB4  maxB5  
0    5.0    NaN    NaN  
1    9.0    9.0    9.0  
2    4.0    4.0    4.0  
3    3.0    3.0    3.0  
4    9.0    9.0    9.0  
5    NaN    NaN    NaN
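
If the same row-wise descending sort is needed for several column groups, the idea in this answer can be wrapped in a small helper. The following is only a minimal sketch and not part of the original answer: the function name `add_sorted_desc` and the demo frame are hypothetical. It relies on `np.sort` placing NaN at the end, and it negates the sorted result back, so it makes no assumption about the sign of the values.

```python
import numpy as np
import pandas as pd


def add_sorted_desc(df, cols, prefix):
    """Return a copy of df with the row-wise values of `cols` sorted in
    descending order and stored in new columns prefix1..prefixN.

    Hypothetical helper for illustration only.
    """
    values = df[cols].to_numpy(dtype=float)
    # np.sort places NaN last, so sort the negated values in ascending
    # order and negate back: descending order with NaN at the end.
    sorted_desc = -np.sort(-values, axis=1)
    new_cols = [f"{prefix}{i + 1}" for i in range(len(cols))]
    # Reuse df.index so the new columns align with the original rows.
    result = pd.DataFrame(sorted_desc, columns=new_cols, index=df.index)
    return pd.concat([df, result], axis=1)


# Small made-up frame with a non-default index (illustration only).
demo = pd.DataFrame(
    {"Price1": [5, np.nan], "Price2": [9, 2.0], "Price5": [np.nan, 7]},
    index=["x", "y"],
)
out = add_sorted_desc(demo, ["Price1", "Price2", "Price5"], "maxA")
print(out)
```

With the question's data, the equivalent calls would be `add_sorted_desc(df, c1, "maxA")` followed by `add_sorted_desc(..., c2, "maxB")` on the result.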