Question

I have the following dataframe, A:

Bucket  C           Count
PL14    XY23081063  706
PL14    XY23326234  15
PL14    XY23081062  1
PL14    XY23143628  1
FZ595   XY23157633  353
FZ595   XY23683174  107
XM274   XY23681818  139
XM274   XY23681819  108

Now I want to insert a new column "Bucket_Rank" that ranks "C" within each "Bucket", based on the descending value of "Count".

Desired output, B:

Bucket  C           Count  Bucket_Rank
PL14    XY23081063  706    1
PL14    XY23326234  15     2
PL14    XY23081062  1      3
PL14    XY23143628  1      4
FZ595   XY23157633  353    1
FZ595   XY23683174  107    2
XM274   XY23681818  139    1
XM274   XY23681819  108    2

I tried the solution given in the following link:

Ranking order per group in Pandas

Command: B["Bucket_Rank"] = A.groupby("Bucket")["Count"].rank("dense", ascending=False)

But it gives me the following error:

TypeError: rank() got multiple values for argument 'axis'

During handling of the above exception, another exception occurred:

ValueError

Any help is appreciated. TIA

Answer 1

Use groupby + argsort:

import numpy as np

# Negate Count so that argsort orders each bucket from largest to smallest
v = df.groupby('Bucket').Count\
         .transform(lambda x: np.argsort(-x) + 1)
v

0    1
1    2
2    3
3    4
4    1
5    2
6    1
7    2
Name: Count, dtype: int64

df['Bucket_Rank'] = v
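
One caveat worth spelling out: np.argsort returns the positions that would sort the values, and those coincide with ranks only when each bucket is already sorted by Count in descending order, as it is in the question's data. A minimal sketch of an order-independent variant is below. It applies argsort twice (the second call inverts the sort permutation), and it assumes ties should be ranked in order of appearance. The helper name descending_rank and the sample rows are hypothetical, chosen only to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, deliberately NOT sorted by Count within each bucket.
df = pd.DataFrame({
    "Bucket": ["PL14", "PL14", "PL14", "FZ595", "FZ595"],
    "Count": [15, 1, 706, 107, 353],
})

# A single argsort gives sort positions, which are not ranks here:
# 15 should be rank 2, 1 should be rank 3, 706 should be rank 1.
positions = np.argsort(-np.array([15, 1, 706])) + 1

def descending_rank(x):
    """Rank a group's values from largest (1) to smallest."""
    # Stable sort so that tied values keep their order of appearance.
    order = np.argsort(-x.to_numpy(), kind="stable")
    # Inverting the permutation turns sort positions into ranks.
    return np.argsort(order) + 1

df["Bucket_Rank"] = df.groupby("Bucket")["Count"].transform(descending_rank)
print(df)
```

On data that is already sorted within each bucket, this gives the same values as the single argsort above.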

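If you want to use rank, specify method='dense'. It is best to pass every argument explicitly by keyword to avoid confusion: in the pandas version that raised your error, the positional "dense" appears to have been bound to axis rather than method.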

df.groupby("Bucket")["Count"]\
      .rank(method="dense", ascending=False)

0    1.0
1    2.0
2    3.0
3    3.0
4    1.0
5    2.0
6    1.0
7    2.0
Name: Count, dtype: float64

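Note that this result is not exactly what you expected, because equal counts are assigned the same rank (rows 2 and 3 both get 3.0). If you can live with that, rank works fine too.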
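
If the tied rows need distinct ranks, as in the desired output, one option is rank with method="first", which breaks ties by order of appearance. The sketch below rebuilds the question's dataframe under the name df (an assumption, the question calls it A) and casts the float ranks to integers.

```python
import pandas as pd

# The question's data, rebuilt here so the example is self-contained.
df = pd.DataFrame({
    "Bucket": ["PL14"] * 4 + ["FZ595"] * 2 + ["XM274"] * 2,
    "C": ["XY23081063", "XY23326234", "XY23081062", "XY23143628",
          "XY23157633", "XY23683174", "XY23681818", "XY23681819"],
    "Count": [706, 15, 1, 1, 353, 107, 139, 108],
})

# method="first" gives tied counts consecutive ranks in row order;
# rank returns floats, so cast to int to match the desired output.
df["Bucket_Rank"] = (
    df.groupby("Bucket")["Count"]
      .rank(method="first", ascending=False)
      .astype(int)
)
print(df)
```

Unlike the argsort approach, this does not depend on the rows being pre-sorted within each bucket.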
