Question

My dataframe looks like this:

Auction_id  bid_price  min_bid  rank
123         5          3        1
123         4          3        2
124         3          2        1
124         1          2        2

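I want to create another column that returns MAX(rank 1 min_bid, rank 2 bid_price). I don't care what value appears in the new column for the rank 2 rows. I want the result to look like this: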

Auction_id  bid_price  min_bid  rank  custom_column
123         5          3        1     4
123         4          3        2     NaN/Don't care
124         3          2        1     2
124         1          2        2     NaN/Don't care

Should I iterate over the grouped auction_ids? Can someone point me to the topics I need to be familiar with to solve this kind of problem?

Answer 1

Here's a crude approach.

Create a maxminbid() function that computes val = MAX(rank 1 min_bid, rank 2 bid_price), assigns it to grp['custom_column'], and stores NaN for the rank==2 rows.

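import numpy as np
import pandas as pd

def maxminbid(grp):
    # Larger of the rank 1 min_bid and the rank 2 bid_price in this auction
    val = max(grp.loc[grp['rank']==1, 'min_bid'].values[0],
              grp.loc[grp['rank']==2, 'bid_price'].values[0])
    # Store as float so the column can also hold NaN
    grp['custom_column'] = float(val)
    grp.loc[grp['rank']==2, 'custom_column'] = np.nan
    return grp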

Then apply the maxminbid function to the object grouped by Auction_id

df.groupby('Auction_id', group_keys=False).apply(maxminbid)


   Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2            NaN
2         124          3        2     1              2
3         124          1        2     2            NaN

However, I suspect there must be a more elegant solution than this.
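One possibly more compact variant avoids the per-group function altogether. This is only a sketch, and it assumes each auction has at most one rank 2 row; the names second_bid and candidate are made up for illustration. It looks up the rank 2 bid_price for each Auction_id with map, takes the element-wise maximum against min_bid, and keeps the result only on the rank 1 rows.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

# Rank 2 bid_price per auction, indexed by Auction_id.
# Assumption: at most one rank 2 row per auction.
second_bid = df.loc[df["rank"] == 2].set_index("Auction_id")["bid_price"]

# Attach the rank 2 bid to every row of its auction, then take the
# element-wise maximum against min_bid.
candidate = np.maximum(df["min_bid"], df["Auction_id"].map(second_bid))

# Keep the value on rank 1 rows only; all other rows become NaN.
df["custom_column"] = candidate.where(df["rank"] == 1)
print(df)
```

If an auction has no rank 2 row, map returns NaN for it and the maximum is NaN as well, so that case would need separate handling.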

Answer 2

First, set the index equal to the Auction_id. Then you can use loc to select the appropriate values for each Auction_id and use max on their values. Finally, reset your index to return to your initial state.

df.set_index('Auction_id', inplace=True)
df['custom_column'] = pd.concat([df.loc[df['rank'] == 1, 'min_bid'],
                                 df.loc[df['rank'] == 2, 'bid_price']],
                                axis=1).max(axis=1)
df.reset_index(inplace=True)
>>> df
   Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2              4
2         124          3        2     1              2
3         124          1        2     2              2
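For anyone who wants to run this end to end, here is a self-contained sketch of the same idea on the sample frame. It differs from the code above in two ways that are my own choices, not part of the original steps: it leaves df's index untouched by working on a copy (indexed), and it adds an optional last step that blanks the rank 2 rows, since the question only needs the value on rank 1.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

# Work on an indexed copy so df keeps its original index.
indexed = df.set_index("Auction_id")

# Line up rank 1 min_bid and rank 2 bid_price side by side per auction,
# then take the row-wise maximum.
per_auction_max = pd.concat(
    [indexed.loc[indexed["rank"] == 1, "min_bid"],
     indexed.loc[indexed["rank"] == 2, "bid_price"]],
    axis=1,
).max(axis=1)

# Broadcast the per-auction value back onto every row of df.
broadcast = df["Auction_id"].map(per_auction_max)

# Optional: keep the value on rank 1 rows only, NaN elsewhere.
df["custom_column"] = broadcast.where(df["rank"] == 1)
print(df)
```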

Answer 3

Here's an approach that does some reshaping with pivot(). Starting from your frame:

   Auction_id  bid_price  min_bid  rank  
0         123          5        3     1              
1         123          4        3     2           
2         124          3        2     1            
3         124          1        2     2

Then reshape your frame (df)

pv = df.pivot(index="Auction_id", columns="rank")
pv
                   bid_price    min_bid   
rank               1    2       1  2
Auction_id                        
123                5    4       3  3
124                3    1       2  2

Add a column to pv that contains the max. I'm using iloc to select the two relevant columns of pv: (bid_price, 2) and (min_bid, 1).

pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1)
pv

                 bid_price    min_bid    custom_column
rank               1  2       1  2              
Auction_id                                      
123                5  4       3  3             4
124                3  1       2  2             2

Then add the max to the original frame (df) by mapping each Auction_id to its value in the pv frame, filling only the rank 1 rows

df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df

   Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2            NaN
2         124          3        2     1              2
3         124          1        2     2            NaN

All the steps combined:

pv = df.pivot(index="Auction_id", columns="rank")
pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1)
df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df

   Auction_id  bid_price  min_bid  rank  custom_column
0         123          5        3     1              4
1         123          4        3     2            NaN
2         124          3        2     1              2
3         124          1        2     2            NaN
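Selecting with iloc[:,[1,2]] depends on the column order that pivot happens to produce. A variant that selects the same two columns by label might be easier to read; the sketch below assumes the sample frame from the question and uses a helper name (per_auction_max) that is only illustrative.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

# One row per auction; columns become (value, rank) pairs.
pv = df.pivot(index="Auction_id", columns="rank")

# Pick the two columns by label instead of by position:
# min_bid of rank 1 and bid_price of rank 2.
per_auction_max = pv[[("min_bid", 1), ("bid_price", 2)]].max(axis=1)

# Write the per-auction max onto the rank 1 rows only;
# the rank 2 rows are left as NaN.
df.loc[df["rank"] == 1, "custom_column"] = df["Auction_id"].map(per_auction_max)
print(df)
```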

Pandas MAX formula across different grouped rows
