My DataFrame looks like this:
Auction_id bid_price min_bid rank
123 5 3 1
123 4 3 2
124 3 2 1
124 1 2 2
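I want to create another column that returns MAX(rank 1 min_bid, rank 2 bid_price). I don't care what is shown in that column for the rank 2 rows. I'd like the result to look like this: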
Auction_id bid_price min_bid rank custom_column
123 5 3 1 4
123 4 3 2 NaN/Don't care
124 3 2 1 2
124 1 2 2 NaN/Don't care
Should I iterate over the grouped auction_ids? Can someone point me to the topics I need to be familiar with to solve this kind of problem?
Answer 0 (score: 2)
Here's a brute-force way to do it. Create a maxminbid() function that computes val = MAX(rank 1 min_bid, rank 2 bid_price), assigns it to grp['custom_column'], and stores NaN for the rows where rank == 2.
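import numpy as np

def maxminbid(grp):
    # Larger of the rank-1 min_bid and the rank-2 bid_price in this auction;
    # cast to float so the column can also hold NaN
    val = float(max(grp.loc[grp['rank']==1, 'min_bid'].values,
                    grp.loc[grp['rank']==2, 'bid_price'].values)[0])
    grp['custom_column'] = val
    # Rank-2 rows are left as NaN
    grp.loc[grp['rank']==2, 'custom_column'] = np.nan
    return grp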
Then apply the maxminbid function to the object grouped by Auction_id:
df.groupby('Auction_id', group_keys=False).apply(maxminbid)
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 NaN
2 124 3 2 1 2
3 124 1 2 2 NaN
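As a rough sketch of the "iterate over the grouped auction_ids" idea from the question, the same per-group logic can also be written as an explicit loop. This avoids relying on how `groupby(...).apply` assembles its result, which has varied between pandas versions. The helper name `max_min_bid` and the mask name `is_rank1` are made up for illustration, and the sketch assumes every auction has exactly one rank-1 row and one rank-2 row.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

def max_min_bid(grp):
    """Return max(rank-1 min_bid, rank-2 bid_price) for one auction."""
    rank1_min_bid = grp.loc[grp["rank"] == 1, "min_bid"].iloc[0]
    rank2_bid_price = grp.loc[grp["rank"] == 2, "bid_price"].iloc[0]
    return max(rank1_min_bid, rank2_bid_price)

# Start with NaN everywhere, then fill in only the rank-1 row of each auction
df["custom_column"] = np.nan
for auction_id, grp in df.groupby("Auction_id"):
    is_rank1 = (df["Auction_id"] == auction_id) & (df["rank"] == 1)
    df.loc[is_rank1, "custom_column"] = max_min_bid(grp)

print(df)
```

A Python-level loop like this is easy to follow but will be slow on frames with many auctions, which is where the vectorized approaches become more attractive.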
However, I suspect there must be a more elegant solution than this.
Answer 1 (score: 2)
First, set the index equal to the Auction_id. Then you can use loc to select the appropriate values for each Auction_id and take the max of those values. Finally, reset your index to return to your initial state.
df.set_index('Auction_id', inplace=True)
df['custom_column'] = pd.concat([df.loc[df['rank'] == 1, 'min_bid'],
                                 df.loc[df['rank'] == 2, 'bid_price']],
                                axis=1).max(axis=1)
df.reset_index(inplace=True)
>>> df
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 4
2 124 3 2 1 2
3 124 1 2 2 2
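The output above fills the rank 2 rows as well, which the question says is acceptable. If NaN is preferred there, a minimal self-contained sketch of the same idea could look like the following. It works on a copy instead of using inplace=True, and the names `indexed`, `per_auction_max`, and `result` are illustrative only. It assumes each Auction_id has at most one rank-1 row and one rank-2 row.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

# Index by auction so the two selections line up on Auction_id
indexed = df.set_index("Auction_id")
rank1_min_bid = indexed.loc[indexed["rank"] == 1, "min_bid"]
rank2_bid_price = indexed.loc[indexed["rank"] == 2, "bid_price"]

# One row per auction, two columns; take the row-wise maximum
per_auction_max = pd.concat([rank1_min_bid, rank2_bid_price], axis=1).max(axis=1)

# Map the per-auction maximum back onto the rows, keeping it only where rank == 1
result = df.copy()
result["custom_column"] = (
    result["Auction_id"].map(per_auction_max).where(result["rank"] == 1)
)

print(result)
```

Because the rank 2 rows become NaN, custom_column ends up as a float column.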
Answer 2 (score: 2)
Here's an approach that does some reshaping with pivot(). Start with your original frame (df):
Auction_id bid_price min_bid rank
0 123 5 3 1
1 123 4 3 2
2 124 3 2 1
3 124 1 2 2
Then reshape your frame (df):
pv = df.pivot(index="Auction_id", columns="rank")
pv
bid_price min_bid
rank 1 2 1 2
Auction_id
123 5 4 3 3
124 3 1 2 2
Add a column to pv that contains the max. I'm using iloc to select the rank-2 bid_price and rank-1 min_bid columns of the pv DataFrame.
pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1)
pv
bid_price min_bid custom_column
rank 1 2 1 2
Auction_id
123 5 4 3 3 4
124 3 1 2 2 2
Then add the max to the rank 1 rows of the original frame (df) by mapping Auction_id to the pv frame:
df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 NaN
2 124 3 2 1 2
3 124 1 2 2 NaN
All the steps combined:
pv = df.pivot(index="Auction_id", columns="rank")
pv["custom_column"] = pv.iloc[:,[1,2]].max(axis=1)
df.loc[df["rank"] == 1,"custom_column"] = df["Auction_id"].map(pv["custom_column"])
df
Auction_id bid_price min_bid rank custom_column
0 123 5 3 1 4
1 123 4 3 2 NaN
2 124 3 2 1 2
3 124 1 2 2 NaN
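The positional selection `iloc[:, [1, 2]]` depends on the column order that pivot() happens to produce. As a hedged variant, the two columns can instead be picked by their (column, rank) labels, which should keep working if other columns are added to the frame. This is only a sketch: the names `rank1_min_bid`, `rank2_bid_price`, and `per_auction_max` are illustrative, and it assumes each (Auction_id, rank) pair appears once, which pivot() requires anyway.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "Auction_id": [123, 123, 124, 124],
    "bid_price": [5, 4, 3, 1],
    "min_bid": [3, 3, 2, 2],
    "rank": [1, 2, 1, 2],
})

# Keyword arguments: recent pandas releases no longer accept positional ones here
pv = df.pivot(index="Auction_id", columns="rank")

# Select the two columns of interest by (column, rank) label
rank1_min_bid = pv[("min_bid", 1)].rename("rank1_min_bid")
rank2_bid_price = pv[("bid_price", 2)].rename("rank2_bid_price")
per_auction_max = pd.concat([rank1_min_bid, rank2_bid_price], axis=1).max(axis=1)

# Map back to the original rows and keep the value only on rank-1 rows
df["custom_column"] = df["Auction_id"].map(per_auction_max).where(df["rank"] == 1)

print(df)
```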