Question

I am trying to reformat a dataframe by adding a column that holds a counter based on the values in another column (see below).

The dataframe looks like this:

[*]: df

     sid    rid  ratio  
0  49493  19070  1.498  
1  49498  19074  1.618  
2  50599  19074  1.618  
3  51602  19019  1.394  
4  51602  19019  1.209  
5  51602  19099  1.294  
6  51602  19099  1.194

I want to group by sid and replace the entries in rid with a counter of unique occurrences, numbered in order of first appearance within each sid, as shown below:

     sid    rid  ratio  COUNT
0  49493  19070  1.498      1
1  49498  19074  1.618      1
2  50599  19074  1.618      1
3  51602  19019  1.394      1  <-- first unique value for sid == 51602
4  51602  19019  1.209      1
5  51602  19099  1.294      2  <-- second unique value for sid == 51602
6  51602  19099  1.194      2

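So far, I have this:

def ToCounter(grp):
  # Number each distinct value in the group by order of first appearance.
  uval = grp.unique()
  ret = grp.copy(deep=True)
  for i, u in enumerate(uval):
    # Compare against the original group so replaced values never collide.
    ret.loc[grp == u] = i + 1
  return ret

# transform keeps the original index, so the result aligns with df.
df['counter'] = df.groupby('sid')['rid'].transform(ToCounter)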

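The dataframe I am working with is large, and this groupby approach is very slow.

Is there a built-in function or a simple method that does the trick?

cumcount()

cumcount() numbers the rows within each group, so it returns an incremental counter over occurrences rather than over unique values.

I tried cumcount(), but it is not what I want: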

[*]: df['CUMCOUNT'] = df.groupby('sid').cumcount() + 1

     sid    rid  ratio  COUNT  CUMCOUNT
0  49493  19070  1.498      1         1
1  49498  19074  1.618      1         1
2  50599  19074  1.618      1         1
3  51602  19019  1.394      1         1
4  51602  19019  1.209      1         2
5  51602  19099  1.294      2         3
6  51602  19099  1.194      2         4
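
To make the difference concrete, here is a minimal sketch that rebuilds the sample dataframe from the tables above and reproduces the CUMCOUNT column. The variable name `row_counter` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sid": [49493, 49498, 50599, 51602, 51602, 51602, 51602],
    "rid": [19070, 19074, 19074, 19019, 19019, 19099, 19099],
    "ratio": [1.498, 1.618, 1.618, 1.394, 1.209, 1.294, 1.194],
})

# cumcount() numbers the rows inside each sid group starting from 0,
# regardless of the value in rid, so every row gets a new number.
row_counter = df.groupby("sid").cumcount() + 1
print(row_counter.tolist())  # [1, 1, 1, 1, 2, 3, 4]
```

Because the counter advances on every row, the four rows with sid == 51602 get 1 to 4 instead of 1, 1, 2, 2.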

Solution

how-to-use-groupby-and-cumcount-on-unique-names-in-a-pandas-column
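
Following the idea in that title (cumcount on unique values), one possible approach is to apply cumcount() to the distinct (sid, rid) pairs and then map the result back onto the full dataframe. The sketch below is not a confirmed answer from the thread; the names `pairs`, `result`, and `alt` are illustrative, and it assumes the sample data shown in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sid": [49493, 49498, 50599, 51602, 51602, 51602, 51602],
    "rid": [19070, 19074, 19074, 19019, 19019, 19099, 19099],
    "ratio": [1.498, 1.618, 1.618, 1.394, 1.209, 1.294, 1.194],
})

# Keep one row per distinct (sid, rid) pair, in order of first appearance.
pairs = df[["sid", "rid"]].drop_duplicates()

# Within each sid, number the distinct rid values 1, 2, 3, ...
pairs = pairs.assign(COUNT=pairs.groupby("sid").cumcount() + 1)

# A left merge keeps the row order of df; the result has a fresh 0..n-1 index.
result = df.merge(pairs, on=["sid", "rid"], how="left")
print(result["COUNT"].tolist())  # [1, 1, 1, 1, 1, 2, 2]

# Alternative: factorize rid inside each sid group. This still calls a
# Python function once per group, so it may be slower with many groups.
alt = df.groupby("sid")["rid"].transform(lambda s: pd.factorize(s)[0] + 1)
print(alt.tolist())  # [1, 1, 1, 1, 1, 2, 2]
```

The merge-based version avoids a Python-level function call per group, which is usually what makes a custom apply slow on large frames. Both versions number values by first appearance, so they also work when equal rid values are not adjacent within a sid.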
