Question

Given the following pandas DataFrame:

>>> import pandas
>>> df1 = pandas.DataFrame({"dish"     : ["fish", "chicken", "fish", "chicken", "chicken", "veg","veg"],
...                         "location" : ["central", "central", "north", "north", "south", "central", "north"],
...                         "sales" : [1,3,5,2,4,2,2]})
>>> total_sales = df1.groupby(by="dish")["sales"].sum()  # Series of total sales, indexed by dish
>>> df1["proportion_sales"] = df1.apply((lambda row: row["sales"]/total_sales.loc[row["dish"]]), axis=1)
>>> df1
      dish location  sales  proportion_sales
0     fish  central      1          0.166667
1  chicken  central      3          0.333333
2     fish    north      5          0.833333
3  chicken    north      2          0.222222
4  chicken    south      4          0.444444
5      veg  central      2          0.500000
6      veg    north      2          0.500000

I want to find the rank 1 and rank 2 dish in each location. For example, in central, chicken is ranked 1 and fish is ranked 3.

How can I update df so that it has a dish_rank_in_location column like the one below? This is what I have so far:

      dish location  sales  proportion_sales  rank
0     fish  central      1          0.166667     1
1  chicken  central      3          0.333333     1
2     fish    north      5          0.833333     1
3  chicken    north      2          0.222222     1
4  chicken    south      4          0.444444     1
5      veg  central      2          0.500000     1
6      veg    north      2          0.500000     1

Expected output:

      dish location  sales  proportion_sales  dish_rank_in_location
0     fish  central      1          0.166667     3
1  chicken  central      3          0.333333     2
2     fish    north      5          0.833333     1
3  chicken    north      2          0.222222     3
4  chicken    south      4          0.444444     1
5      veg  central      2          0.500000     1
6      veg    north      2          0.500000     2

Answer 1

Use rank on the groupby with ascending=False, so that the highest proportion_sales in each location gets rank 1.

df1['dish_rank_in_location'] = df1.groupby('location')\
               .proportion_sales.rank(method='dense', ascending=False)

df1

      dish location  sales  proportion_sales  dish_rank_in_location
0     fish  central      1          0.166667                    3.0
1  chicken  central      3          0.333333                    2.0
2     fish    north      5          0.833333                    1.0
3  chicken    north      2          0.222222                    3.0
4  chicken    south      4          0.444444                    1.0
5      veg  central      2          0.500000                    1.0
6      veg    north      2          0.500000                    2.0
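
To see what method='dense' changes when two dishes tie within a location, here is a small sketch on made-up data. The `demo` frame and its column names are hypothetical and not part of the question; the three methods shown are standard options of pandas' rank.

```python
import pandas as pd

# Hypothetical data: dishes "a" and "b" tie for first place in one location.
demo = pd.DataFrame({
    "location": ["central"] * 4,
    "dish": ["a", "b", "c", "d"],
    "proportion_sales": [0.5, 0.5, 0.3, 0.1],
})

grouped = demo.groupby("location")["proportion_sales"]

# 'dense': tied rows share a rank and the next rank follows without a gap.
demo["rank_dense"] = grouped.rank(method="dense", ascending=False)
# 'min': tied rows share the lowest rank and the following ranks are skipped.
demo["rank_min"] = grouped.rank(method="min", ascending=False)
# 'first': every row gets a distinct rank, so ties are broken rather than shared.
demo["rank_first"] = grouped.rank(method="first", ascending=False)

print(demo)
```

With 'dense', a location never skips a rank number, which is usually what you want when you later filter on "rank 1 and rank 2".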

If you need the ranks as integers, you can always cast them:

df1['dish_rank_in_location'].astype(int)

0    3
1    2
2    1
3    3
4    1
5    1
6    2
Name: dish_rank_in_location, dtype: int64

astype returns a new Series, so assign the result back to the column to keep it.
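
Putting the pieces together, here is a minimal sketch that stores the ranks as integers and then keeps only the rank 1 and rank 2 dishes per location, which is what the question ultimately asks for. The `top2` name is my own, and proportion_sales is recomputed with transform only so that the snippet runs on its own; it yields the same values as in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "dish": ["fish", "chicken", "fish", "chicken", "chicken", "veg", "veg"],
    "location": ["central", "central", "north", "north", "south", "central", "north"],
    "sales": [1, 3, 5, 2, 4, 2, 2],
})

# Each row's share of its dish's total sales (same numbers as in the question).
df1["proportion_sales"] = df1["sales"] / df1.groupby("dish")["sales"].transform("sum")

# Rank within each location and assign the integer result back to the frame.
df1["dish_rank_in_location"] = (
    df1.groupby("location")["proportion_sales"]
       .rank(method="dense", ascending=False)
       .astype(int)
)

# Keep only the two best-ranked dishes in each location.
top2 = (
    df1[df1["dish_rank_in_location"] <= 2]
    .sort_values(["location", "dish_rank_in_location"])
)
print(top2[["location", "dish", "dish_rank_in_location"]])
```

Note that the cast to int assumes there are no missing values in proportion_sales; a NaN rank would make astype(int) fail.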
