I have a DataFrame:
bq_selection_id bq_balance bq_market_id bq_back_price
0 45094462 185.04 155 1.87
1 45094462 185.04 155 1.97
2 45094463 185.04 155 3.05
3 45094463 185.04 156 3.05
4 45094464 185.04 156 5.80
5 45094464 185.04 156 5.80
6 45094466 185.04 157 200.00
7 45094466 185.04 157 200.00
8 45094465 185.04 157 NaN
9 45094465 185.04 157 NaN
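I want to add two extra columns, second_lowest and none_values, computed per group when grouping by market id. For market id 155, the second-lowest price is 1.97 and there are no NaN values, so none_values is False. I would like to get something like this: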
bq_selection_id bq_balance bq_market_id bq_back_price second_lowest none_val
0 45094462 185.04 155 1.87 1.97 False
1 45094462 185.04 155 1.97 1.97 False
2 45094463 185.04 155 3.05 1.97 False
3 45094463 185.04 156 3.05 5.80 False
4 45094464 185.04 156 5.80 5.80 False
5 45094464 185.04 156 6.40 5.80 False
6 45094466 185.04 157 1.00 1.70 True
7 45094466 185.04 157 1.70 1.70 True
8 45094465 185.04 157 NaN 1.70 True
9 45094465 185.04 157 NaN 1.70 True
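Could you please help me?
Answer 0: (score: 4)
Combining the hints used in your previous questions (1, 2),
you can use groupby/transform
to assign a new value to every row of the DataFrame: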
import numpy as np
import pandas as pd
pd.options.display.width = 1000
df = pd.DataFrame(
{'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8, 200.0, 200.0, np.nan, np.nan],
'bq_balance': [1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4, 1850.4,
1850.4, 1850.4, 1850.4],
'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157],
'bq_selection_id': [45094462, 45094462, 45094463, 45094463, 45094464,
45094464, 45094466, 45094466, 45094465, 45094465]})
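# Group the prices by market; transform broadcasts each group's result back to its rows.
grouped = df.groupby('bq_market_id')['bq_back_price']
# nsmallest(2) skips NaN, so the max of the two smallest is the second-lowest price.
df['second_lowest'] = grouped.transform(lambda x: x.nsmallest(2).max())
# True for every row of a market that contains at least one NaN price.
df['has_null'] = grouped.transform(lambda x: pd.isnull(x).any()).astype(bool)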
print(df)
which yields
bq_back_price bq_balance bq_market_id bq_selection_id second_lowest has_null
0 1.87 1850.4 155 45094462 1.97 False
1 1.97 1850.4 155 45094462 1.97 False
2 3.05 1850.4 155 45094463 1.97 False
3 3.05 1850.4 156 45094463 5.80 False
4 5.80 1850.4 156 45094464 5.80 False
5 5.80 1850.4 156 45094464 5.80 False
6 200.00 1850.4 157 45094466 200.00 True
7 200.00 1850.4 157 45094466 200.00 True
8 NaN 1850.4 157 45094465 200.00 True
9 NaN 1850.4 157 45094465 200.00 True
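To make the behaviour of nsmallest(2).max() in the output above more concrete, here is a minimal sketch on two small Series. The names with_nan, with_ties, second_lowest and second_lowest_distinct are hypothetical and not part of the answer. The distinct variant is only an assumption about what one might want when tied prices should count once; the question does not say which reading is intended.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-series: one mirrors market 157 above, one has a tie at the minimum.
with_nan = pd.Series([200.0, 200.0, np.nan, np.nan])
with_ties = pd.Series([1.87, 1.87, 3.05])


def second_lowest(prices):
    # Same expression as in the answer: NaN entries are skipped,
    # and duplicated prices count as separate entries.
    return prices.nsmallest(2).max()


def second_lowest_distinct(prices):
    # Assumed variant: collapse tied prices first, then take the second smallest.
    return prices.drop_duplicates().nsmallest(2).max()


nan_result = second_lowest(with_nan)                  # 200.0, as for market 157
ties_result = second_lowest(with_ties)                # 1.87, the tie counts twice
distinct_result = second_lowest_distinct(with_ties)   # 3.05, the tie counts once
```

With the data in the question, market 156 is the group where this matters: 5.80 appears twice, so both versions return 5.80 there, but a tie at the minimum would make them differ.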
Answer 1: (score: 0)
How about:
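gb = df.groupby('bq_market_id')
# Sort ascending so that position 1 is the second-lowest price (NaN sorts last);
# .to_numpy() drops the market-id index so the values are assigned by row position.
df['second_lowest'] = gb.bq_back_price.apply(lambda x: x.sort_values().iloc[1]).loc[df.bq_market_id].to_numpy()
df['none_val'] = gb.bq_back_price.apply(lambda x: x.isnull().values.any()).loc[df.bq_market_id].to_numpy()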
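The same idea of computing one value per market and then looking it up for each row can also be sketched with Series.map, which avoids having to think about index alignment. This is an illustrative variant, not the answerer's code. The names per_market_second and per_market_null are hypothetical, and the frame is reduced to the two columns that matter.

```python
import numpy as np
import pandas as pd

# Reduced copy of the question's data: only the grouping key and the price.
df = pd.DataFrame({
    'bq_market_id': [155, 155, 155, 156, 156, 156, 157, 157, 157, 157],
    'bq_back_price': [1.87, 1.97, 3.05, 3.05, 5.8, 5.8,
                      200.0, 200.0, np.nan, np.nan],
})

prices = df.groupby('bq_market_id')['bq_back_price']

# One scalar per market, indexed by bq_market_id.
per_market_second = prices.apply(lambda x: x.nsmallest(2).max())
per_market_null = prices.apply(lambda x: x.isnull().any())

# map looks up each row's market id in the per-market Series.
df['second_lowest'] = df['bq_market_id'].map(per_market_second)
df['none_val'] = df['bq_market_id'].map(per_market_null).astype(bool)

print(df)
```

Compared with groupby/transform, this keeps the per-market results around as their own small Series, which can be convenient if they are needed again elsewhere.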