Question

My DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({'UserId': [1,2,2,3,3,3,4,4,4,4], 'Value': [1,2,3,4,5,6,7,8,9,0]})

print(df)

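Now I want to sort/display the rows by how often each UserId value is repeated, most frequent first. For the DataFrame above, the order would be 4, 3, 2, 1. My expected output is: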

df = pd.DataFrame({'UserId': [4,4,4,4,3,3,3,2,2,1], 'Value': [7,8,9,0,4,5,6,2,3,1]})

print(df)

Here I did it by hand. I need code that works on a large DataFrame. Please advise. Thanks in advance.

Answer 1

You can first get the count for each UserId:

>>> counts = df.UserId.value_counts()
>>> counts
4    4
3    3
2    2
1    1
Name: UserId, dtype: int64

Then you can create a new column holding the UserId count for each row (this can also be done with a merge):

>>> df['UserIdCount'] = df['UserId'].apply(lambda x: counts.loc[x])
>>> df
   UserId  Value  UserIdCount
0       1      1            1
1       2      2            2
2       2      3            2
3       3      4            3
4       3      5            3
5       3      6            3
6       4      7            4
7       4      8            4
8       4      9            4
9       4      0            4
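The per-row `apply` works, but the same column can be built without a Python-level lambda. A minimal sketch of three equivalent ways, assuming the `df` from the question; the intermediate names (`via_map`, `via_transform`, `counts_df`, `via_merge`) are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'UserId': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
                   'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]})

# Option 1: look up each UserId in the value_counts Series (indexed by UserId)
counts = df['UserId'].value_counts()
via_map = df['UserId'].map(counts)

# Option 2: compute the size of each group and broadcast it back to the rows
via_transform = df.groupby('UserId')['UserId'].transform('size')

# Option 3: the merge mentioned above; turn the counts into a two-column
# frame (UserId, UserIdCount) and left-join it onto df
counts_df = counts.rename('UserIdCount').rename_axis('UserId').reset_index()
via_merge = df.merge(counts_df, on='UserId', how='left')
```

All three give the same counts per row; `map` and `transform` are usually the shortest to write, while the merge is handy when the counts already live in another table.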

Then you just sort by this column :)

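>>> # kind='mergesort' is a stable sort, so rows with equal counts keep their original order
>>> df = df.sort_values('UserIdCount', ascending=False, kind='mergesort')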
>>> df
   UserId  Value  UserIdCount
6       4      7            4
7       4      8            4
8       4      9            4
9       4      0            4
3       3      4            3
4       3      5            3
5       3      6            3
1       2      2            2
2       2      3            2
0       1      1            1
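For a large DataFrame, two details may matter: the default sort algorithm is not guaranteed to keep tied rows in their original order, and two different UserIds with the same count can end up interleaved. A minimal sketch that handles both with two stable sorts and drops the helper column afterwards; the function name `sort_by_frequency` and the temporary column `_freq` are hypothetical names chosen for this example, and it assumes the frame has no existing `_freq` column:

```python
import pandas as pd

def sort_by_frequency(frame, column):
    """Return rows ordered by how often their `column` value occurs,
    most frequent first, keeping rows of the same value together."""
    # Number of rows sharing each row's value in `column`
    freq = frame.groupby(column)[column].transform('size')
    return (
        frame.assign(_freq=freq)
        # First pass: group equal values together (tie-break: larger value first)
        .sort_values(column, ascending=False, kind='mergesort')
        # Second pass: stable sort by frequency keeps the first pass's grouping
        .sort_values('_freq', ascending=False, kind='mergesort')
        .drop(columns='_freq')
        .reset_index(drop=True)
    )

df = pd.DataFrame({'UserId': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
                   'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]})
result = sort_by_frequency(df, 'UserId')
print(result)
```

On the question's data this reproduces the expected output from the question (UserId 4, 4, 4, 4, 3, 3, 3, 2, 2, 1), with a fresh 0..9 index. Drop the `reset_index` call if the original row labels should be kept.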

Cheers!

Pandas DataFrame / Python: how to sort a DataFrame column by its highest repeated-value count?
