I have a pandas DataFrame:
user, cat
---------
'a', 1
'a', 2
'b', 1.2
'b', 2.1
'a', 0.2
'a', 1.9
'b', 2.1
Keeping the row order unchanged, how can I rank user so that each consecutive run of user records is assigned a new rank?
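So, the output I'm looking for is:
user, cat, rank
---------------
'a', 1, 1
'a', 2, 1
'b', 1.2, 1
'b', 2.1, 1
'a', 0.2, 2
'a', 1.9, 2
'b', 2.1, 2
From the example above, you can see that the first occurrence (consecutive run) of user a is assigned rank 1, and the second occurrence is assigned rank 2.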
I have been trying the pandas rank function, but it didn't help.
Thanks.
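For illustration, here is a likely reason a plain rank call does not help. The exact attempt is not shown, so this assumes it looked something like ranking the user column: Series.rank compares the labels themselves, so every a row ties with every other a row regardless of which run it belongs to.

```python
import pandas as pd

# Sample data from the question (the quotes are read as marking strings)
df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "a", "a", "b"],
    "cat": [1, 2, 1.2, 2.1, 0.2, 1.9, 2.1],
})

# Dense-ranking the labels: all 'a' rows share one rank, all 'b' rows another,
# so the two separate runs of 'a' cannot be told apart
naive = df["user"].rank(method="dense").astype(int)
print(naive.tolist())  # [1, 1, 2, 2, 1, 1, 2]
```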
Answer 0 (score: 1)
This is essentially a gaps-and-islands problem.
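To make the term concrete: each consecutive run of the same user is an "island". A minimal sketch, using the question's sample data (quotes read as string markers), that flags where each island starts and gives every island its own global id via a running count of those starts:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "a", "a", "b"],
    "cat": [1, 2, 1.2, 2.1, 0.2, 1.9, 2.1],
})

# True on the first row of every consecutive run of the same user
# (the first row compares against NaN from shift(), so it is True too)
starts = df["user"] != df["user"].shift()

# A running count of run starts numbers the islands 1, 2, 3, ... in order
island_id = starts.cumsum().astype(int)
print(island_id.tolist())  # [1, 1, 2, 2, 3, 3, 4]
```

The per-user rank asked for in the question is then "how many islands of this user have started so far", which is what grouping by user before the cumulative sum achieves.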
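# Mark the first row of every consecutive run of the same user
df['change'] = df['user'] != df['user'].shift()
# Within each user, count how many runs have started so far
df['rank'] = df.groupby('user')['change'].cumsum().astype('int')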
Result:
user cat change rank
0 'a' 1.0 True 1
1 'a' 2.0 False 1
2 'b' 1.2 True 1
3 'b' 2.1 False 1
4 'a' 0.2 True 2
5 'a' 1.9 False 2
6 'b' 2.1 True 2
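A self-contained version of the answer's approach on the question's sample data, useful for checking the result above. The rank_alt column is a hypothetical variant, not part of the answer: it numbers every run globally and then dense-ranks those run ids within each user, which is one way to make Series.rank-style ranking fit this problem.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "a", "a", "b"],
    "cat": [1, 2, 1.2, 2.1, 0.2, 1.9, 2.1],
})

# The answer's approach: flag run starts, then count them per user
df["change"] = df["user"] != df["user"].shift()
df["rank"] = df.groupby("user")["change"].cumsum().astype("int")

# Hypothetical variant: a global id per run, then a dense rank of those ids
# inside each user group (runs of the same user get consecutive ranks 1, 2, ...)
run_id = df["change"].cumsum()
df["rank_alt"] = run_id.groupby(df["user"]).rank(method="dense").astype("int")

print(df[["user", "cat", "rank", "rank_alt"]])
```

Both columns should agree; the cumulative-sum version is simpler, while the dense-rank variant makes the "rank the runs, not the labels" idea explicit.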
Edit: if you want to group by multiple columns (e.g. user and city):
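import numpy as np
cols = ['user', 'city']
# A row starts a new run if any of the key columns differs from the previous row
df['change'] = np.any(df[cols] != df[cols].shift(), axis=1)
# Count run starts separately for each (user, city) combination
df['rank'] = df.groupby(cols)['change'].cumsum().astype('int')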
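A runnable sketch of the multi-column case. The city values below are hypothetical, made up only to show a run breaking when just one key column changes. It uses the DataFrame's own .any(axis=1), which does the same row-wise check as np.any(..., axis=1) on a DataFrame, so NumPy is not needed here.

```python
import pandas as pd

# Hypothetical data: the 'city' column is invented purely for illustration
df = pd.DataFrame({
    "user": ["a", "a", "a", "b", "a", "a"],
    "city": ["x", "x", "y", "y", "x", "x"],
    "cat": [1, 2, 3, 4, 5, 6],
})

cols = ["user", "city"]
# A row starts a new run when any key column differs from the row above
df["change"] = (df[cols] != df[cols].shift()).any(axis=1)
# Count run starts separately for each (user, city) combination
df["rank"] = df.groupby(cols)["change"].cumsum().astype("int")

print(df)
```

Here row 2 starts a new run because only city changes, and the (a, x) pair gets rank 2 when it reappears in rows 4 and 5.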