Given the following pandas DataFrame:
>>> import pandas
>>> df1 = pandas.DataFrame({"dish" : ["fish", "chicken", "fish", "chicken", "chicken", "veg","veg"],
... "location" : ["central", "central", "north", "north", "south", "central", "north"],
... "sales" : [1,3,5,2,4,2,2]})
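>>> # Total sales per dish, as a Series indexed by dish
>>> total_sales = df1.groupby("dish")["sales"].sum()
>>> # Share of each dish's total sales that comes from this row's location
>>> df1["proportion_sales"] = df1.apply(lambda row: row["sales"] / total_sales.loc[row["dish"]], axis=1)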
>>> df1
      dish location  sales  proportion_sales
0     fish  central      1          0.166667
1  chicken  central      3          0.333333
2     fish    north      5          0.833333
3  chicken    north      2          0.222222
4  chicken    south      4          0.444444
5      veg  central      2          0.500000
6      veg    north      2          0.500000
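I want to find which dish is ranked 1, which is ranked 2, and so on within each location, ranking by proportion_sales. For example, in central, chicken is ranked 2 and fish is ranked 3.
How do I update df1 so that it has a dish_rank_in_location column like the one below? This is what I have so far: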
      dish location  sales  proportion_sales  rank
0     fish  central      1          0.166667     1
1  chicken  central      3          0.333333     1
2     fish    north      5          0.833333     1
3  chicken    north      2          0.222222     1
4  chicken    south      4          0.444444     1
5      veg  central      2          0.500000     1
6      veg    north      2          0.500000     1
Expected output:
      dish location  sales  proportion_sales  dish_rank_in_location
0     fish  central      1          0.166667                      3
1  chicken  central      3          0.333333                      2
2     fish    north      5          0.833333                      1
3  chicken    north      2          0.222222                      3
4  chicken    south      4          0.444444                      1
5      veg  central      2          0.500000                      1
6      veg    north      2          0.500000                      2
Answer 0 (score: 2)
Use rank on the groupby with ascending=False, so that the largest proportion_sales in each location gets rank 1.
df1['dish_rank_in_location'] = df1.groupby('location')\
.proportion_sales.rank(method='dense', ascending=False)
df1
      dish location  sales  proportion_sales  dish_rank_in_location
0     fish  central      1          0.166667                    3.0
1  chicken  central      3          0.333333                    2.0
2     fish    north      5          0.833333                    1.0
3  chicken    north      2          0.222222                    3.0
4  chicken    south      4          0.444444                    1.0
5      veg  central      2          0.500000                    1.0
6      veg    north      2          0.500000                    2.0
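The method argument only changes the result when two rows in the same group tie, which does not happen in this data. As a rough illustration, here is a minimal sketch on made-up data (the scores frame and its value column are hypothetical, not part of the question) comparing three of the tie-breaking methods that rank accepts:

```python
import pandas as pd

# Hypothetical data: two rows in "central" share the same value.
scores = pd.DataFrame({
    "location": ["central", "central", "central", "north"],
    "value": [0.5, 0.5, 0.2, 0.9],
})

grouped = scores.groupby("location")["value"]

# dense: tied rows share a rank and the next rank is not skipped
scores["dense"] = grouped.rank(method="dense", ascending=False)
# min: tied rows share the lowest rank and the following ranks are skipped
scores["min"] = grouped.rank(method="min", ascending=False)
# first: ties are broken by order of appearance
scores["first"] = grouped.rank(method="first", ascending=False)

print(scores)
```

With dense, the 0.2 row in central is ranked 2; with min and first it is ranked 3.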
If you need the ranks as integers, you can always cast them:
df1['dish_rank_in_location'].astype(int)
0    3
1    2
2    1
3    3
4    1
5    1
6    2
Name: dish_rank_in_location, dtype: int64
Assign the result back to the column, since astype returns a new Series and does not modify df1 in place.
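Putting the pieces together, one possible end-to-end sketch looks like this. It computes proportion_sales with groupby(...).transform("sum"), which is a swapped-in alternative to the row-wise apply in the question, and it chains the integer cast onto the rank so the assignment happens in one step. It assumes there are no missing values in proportion_sales; astype(int) would raise if the ranks contained NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    "dish": ["fish", "chicken", "fish", "chicken", "chicken", "veg", "veg"],
    "location": ["central", "central", "north", "north", "south", "central", "north"],
    "sales": [1, 3, 5, 2, 4, 2, 2],
})

# Each row's sales divided by the total sales of its dish
# (transform keeps the original row alignment).
df1["proportion_sales"] = df1["sales"] / df1.groupby("dish")["sales"].transform("sum")

# Rank within each location, highest proportion first, then cast to int
# and assign the result back to the DataFrame.
df1["dish_rank_in_location"] = (
    df1.groupby("location")["proportion_sales"]
    .rank(method="dense", ascending=False)
    .astype(int)
)

print(df1)
```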