我有这个数据框:
dict_data = {'id' : [1,1,1,2,2,2,2,2],
'datetime' : np.array(['2016-01-03T16:05:52.000000000', '2016-01-03T16:05:52.000000000',
'2016-01-03T16:05:52.000000000', '2016-01-27T15:45:20.000000000',
'2016-01-27T15:45:20.000000000', '2016-11-27T15:08:04.000000000',
'2016-11-27T15:08:04.000000000', '2016-11-27T15:08:04.000000000'], dtype='datetime64[ns]')}
df_data=pd.DataFrame(dict_data)
数据看起来像这样
我想对客户ID和日期进行排名,我使用了这段代码
(df_data.assign(rn=df_data.sort_values(['datetime'], ascending=True)
....: .groupby(['datetime','id'])
....: .cumcount() + 1)
....: .sort_values(['datetime','rn'])
....: )
每个日期我按ID获得不同的排名:
我希望看到的是按ID排名,但是对于相同的日期时间,每个ID都会获得相同的排名。
答案 0 :(得分:0)
以下是按日期时间和ID排名的方式:
##### RANK BY datetime and id #####
In[]: df_data.rank(axis =0,ascending = 1, method = 'dense')
Out[]:
datetime id
0 1 1
1 1 1
2 1 1
3 2 2
4 2 2
5 3 2
6 3 2
7 3 2
##### GROUPBY id AND USE APPLY TO GET VALUE FOR FOR EACH GROUP #####
In[]: df_data.rank(axis =0,ascending = 1, method = 'dense').groupby('id').apply(lambda x: x)
Out[]:
datetime id
0 1 1
1 1 1
2 1 1
3 2 2
4 2 2
5 3 2
6 3 2
7 3 2
##### THEN RANK INSIDE EACH GROUP #####
In[]: df_data.assign(rank=df_data.rank(axis =0,ascending = 1, method = 'dense').groupby('id').apply(lambda x: x.rank(axis =0,ascending = 1, method = 'dense'))['datetime'])
Out[]:
datetime id rank
0 2016-01-03 16:05:52 1 1
1 2016-01-03 16:05:52 1 1
2 2016-01-03 16:05:52 1 1
3 2016-01-27 15:45:20 2 1
4 2016-01-27 15:45:20 2 1
5 2016-11-27 15:08:04 2 2
6 2016-11-27 15:08:04 2 2
7 2016-11-27 15:08:04 2 2
如果您想更改排名方法,我会从pandas documentation on ranking
获取更多关于排名的信息