Question

I have the following DataFrame.

import pandas as pd
import numpy as np
d = {

    'ID':[1,2,3,4,5],
    'Price1':[5,9,4,3,9],
    'Price2':[9,10,13,14,18],
    'Price3':[5,9,4,3,9],
    'Price4':[9,10,13,14,18],
    'Price5':[5,9,4,3,9],
    'Price6':[np.nan,10,13,14,18],
    'Price7':[np.nan,9,4,3,9],
    'Price8':[np.nan,10,13,14,18],
    'Price9':[5,9,4,3,9],
    'Price10':[9,10,13,14,18],
     'Type':['A','A','B','C','D'],


}
df = pd.DataFrame(data = d)
df

How can I compare the Price1 through Price10 columns and add the second-highest value in each row as a new column?

Expected output:

import pandas as pd
import numpy as np
d = {

    'ID':[1,2,3,4,5],
    'Price1':[5,9,4,3,9],
    'Price2':[9,10,13,14,18],
    'Price3':[5,9,4,3,9],
    'Price4':[9,10,13,14,18],
    'Price5':[5,9,4,3,9],
    'Price6':[np.nan,10,13,14,18],
    'Price7':[np.nan,9,4,3,9],
    'Price8':[np.nan,10,13,14,18],
    'Price9':[5,9,4,3,9],
    'Price10':[9,10,13,14,18],
     'Type':['A','A','B','C','D'],
    'Second_Max':[5,9,4,3,9]


}
df = pd.DataFrame(data = d)
df


Answer 1

One approach:

df['Second_Max'] = df.drop(['ID','Type'], axis=1).fillna(0).apply(lambda x: (sorted(list(set(x)), reverse=True))[1], axis=1)

Or:

df['Second_Max'] =  df.filter(like='Price').fillna(0).apply(lambda x: (sorted(list(set(x)), reverse=True))[1], axis=1)

Output:

   ID  Price1  Price2  Price3  Price4  Price5  Price6  Price7  Price8  Price9  \
0   1       5       9       5       9       5     NaN     NaN     NaN       5   
1   2       9      10       9      10       9    10.0     9.0    10.0       9   
2   3       4      13       4      13       4    13.0     4.0    13.0       4   
3   4       3      14       3      14       3    14.0     3.0    14.0       3   
4   5       9      18       9      18       9    18.0     9.0    18.0       9   

   Price10 Type  Second_Max  
0        9    A         5.0  
1       10    A         9.0  
2       13    B         4.0  
3       14    C         3.0  
4       18    D         9.0
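
To make the one-liner above easier to follow, here is a rough sketch of what the lambda does to a single row. The list below is the first row of the sample data after fillna(0); the variable names are only illustrative.

```python
# First row of the sample data after fillna(0): the three NaN prices become 0.
row = [5, 9, 5, 9, 5, 0, 0, 0, 5, 9]

distinct = set(row)                       # duplicates collapse: {0, 5, 9}
ranked = sorted(distinct, reverse=True)   # largest first: [9, 5, 0]
second_max = ranked[1]                    # the second distinct value
print(second_max)  # 5
```

Two things to keep in mind: filling NaN with 0 assumes every real price is positive, and a row with fewer than two distinct values would either report the filler 0 or raise an IndexError.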

Or, more efficiently, use heapq:

Find the 2nd highest element
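
The answer only points to heapq without showing code, so the following is a minimal sketch of how it might look. The helper name second_largest is hypothetical, and returning NaN for rows with fewer than two distinct values is an assumption, not something the original answer specifies.

```python
import heapq
import math


def second_largest(values):
    """Return the second-largest distinct value, ignoring NaN.

    Returns NaN when there are fewer than two distinct values.
    """
    distinct = {v for v in values if not math.isnan(v)}
    top_two = heapq.nlargest(2, distinct)  # at most two items, largest first
    return top_two[1] if len(top_two) == 2 else math.nan


# First row of the sample data, with the missing prices as NaN.
row = [5, 9, 5, 9, 5, float('nan'), float('nan'), float('nan'), 5, 9]
print(second_largest(row))  # 5

# With pandas, the same helper could be applied row by row, for example:
# df['Second_Max'] = df.filter(like='Price').apply(second_largest, axis=1)
```

heapq.nlargest(2, ...) only tracks the two largest items, so it avoids fully sorting each row.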

Answer 2

Apply a lambda function along axis=1, then use nlargest to get the top 2 elements.

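# Select only the Price columns (so ID is not compared) and take the second item by position.
df['Second_Max'] = df.filter(like='Price').apply(lambda x: x.drop_duplicates().nlargest(2).iloc[1], axis=1)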
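
As an illustration of what happens inside the lambda, here is a small sketch that runs the same steps on one row. The Series below reproduces the first row of the sample data; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# First row of the sample data as a Series indexed by column name.
row = pd.Series(
    [5, 9, 5, 9, 5, np.nan, np.nan, np.nan, 5, 9],
    index=[f'Price{i}' for i in range(1, 11)],
)

distinct = row.drop_duplicates()  # keeps the first 5.0, the first 9.0 and one NaN
top_two = distinct.nlargest(2)    # NaN is left out; values come back largest first
second_max = top_two.iloc[1]      # positional lookup, since the index holds column names
print(second_max)  # 5.0
```

Because the result of nlargest is indexed by column name, taking the second item by position with .iloc[1] is the safer choice than a bare [1].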

Answer 3

Adding another way, using np.sort():

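m = df.filter(like='Price')
# Drop repeated values within each row; dropped positions become NaN.
deduped = m.apply(lambda x: x.drop_duplicates(), axis=1)
# Sort the negated values ascending (NaN goes last), then negate back,
# so column 1 holds the second-highest price even if values are negative.
df['second_highest'] = -np.sort(-deduped, axis=1)[:, 1]
print(df)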

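Another way, without apply(), could be the following. Note that it sorts whole columns rather than individual rows, so it only yields the per-row second maximum when the price columns rank in the same order in every row, as they do in this sample data: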

m=df.filter(like='Price')
df['second_highest']=(m.T.sort_values(m.index.tolist(),ascending=False).
                          drop_duplicates().iloc[1])
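
The transposed sort above ranks entire columns, so for data where the ordering differs from row to row, a row-wise version without apply() could be sketched with NumPy masking instead. This is a different technique from the snippet above, and the small DataFrame here is invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the largest price sits in a different column per row.
df = pd.DataFrame({
    'Price1': [5, 20],
    'Price2': [9, 10],
    'Price3': [np.nan, 15],
})

a = df.filter(like='Price').to_numpy(dtype=float)
row_max = np.nanmax(a, axis=1, keepdims=True)   # largest price per row, ignoring NaN
below_max = np.where(a == row_max, np.nan, a)   # blank out every copy of the maximum
df['second_highest'] = np.nanmax(below_max, axis=1)
print(df)
```

If a row has only one distinct price, every entry is masked and np.nanmax returns NaN for that row along with a RuntimeWarning.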

Answer 4

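This can be done with np.unique, which returns the distinct values already sorted in ascending order, so a separate np.sort is not needed:

df['Second_Max'] = df.filter(like='Price').apply(lambda x: np.unique(x.dropna())[-2], axis=1)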

   ID  Price1  Price2  Price3  Price4  Price5  Price6  Price7  Price8  Price9  Price10 Type  Second_Max
0   1       5       9       5       9       5     NaN     NaN     NaN       5        9    A         5.0
1   2       9      10       9      10       9    10.0     9.0    10.0       9       10    A         9.0
2   3       4      13       4      13       4    13.0     4.0    13.0       4       13    B         4.0
3   4       3      14       3      14       3    14.0     3.0    14.0       3       14    C         3.0
4   5       9      18       9      18       9    18.0     9.0    18.0       9       18    D         9.0
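
One caveat with indexing [-2] is that it raises an IndexError when a row has fewer than two distinct prices. A possible guarded variant is sketched below; the helper name second_max and the choice to return NaN in that case are assumptions, and the tiny DataFrame is made up for the example.

```python
import numpy as np
import pandas as pd


def second_max(row):
    """Second-largest distinct non-NaN value in a row, or NaN if there is none."""
    values = np.unique(row.dropna())  # sorted ascending, duplicates removed
    return values[-2] if values.size >= 2 else np.nan


# Hypothetical data: the second row contains a single distinct price.
df = pd.DataFrame({
    'Price1': [5, 7],
    'Price2': [9, 7],
    'Price3': [np.nan, 7],
})
df['Second_Max'] = df.filter(like='Price').apply(second_max, axis=1)
print(df)
```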

How do I make the second-highest value a new column in pandas?

4 answers: