I have a DataFrame with a user ID and two different time columns. time1 is the same for all rows of a given user, but time2 varies.
import pandas as pd

test = pd.DataFrame({
'id': [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2],
'time1': ['2018-11-01 21:19:32', '2018-11-01 21:19:32', '2018-11-01 21:19:32','2018-11-01 21:19:32','2018-11-01 21:19:32',
'2018-11-01 21:19:32', '2018-11-01 21:19:32', '2018-11-01 21:19:32','2018-11-01 21:19:32','2018-11-01 21:19:32',
'2018-11-02 11:20:12', '2018-11-02 11:20:12','2018-11-02 11:20:12','2018-11-02 11:20:12','2018-11-02 11:20:12'],
'time2': ['2018-11-01 10:19:32', '2018-11-01 22:19:32', '2018-11-01 12:19:32','2018-11-01 23:44:32','2018-11-01 14:19:32',
'2018-11-01 15:19:32', '2018-11-01 11:19:32', '2018-11-01 23:19:32','2018-11-01 13:22:32','2018-11-01 23:56:32',
'2018-11-02 11:57:12', '2018-11-02 10:20:12','2018-11-02 11:25:12','2018-11-02 11:32:12','2018-11-02 09:15:12']
})
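I want to create a row_num column that orders the rows by time2 within each user and numbers them relative to time1. Everything that happened before time1 is counted in reverse, with negative numbers: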
id time1 time2 row_num
0 1 2018-11-01 21:19:32 2018-11-01 10:19:32 -6
1 1 2018-11-01 21:19:32 2018-11-01 11:19:32 -5
2 1 2018-11-01 21:19:32 2018-11-01 12:19:32 -4
3 1 2018-11-01 21:19:32 2018-11-01 13:22:32 -3
4 1 2018-11-01 21:19:32 2018-11-01 14:19:32 -2
5 1 2018-11-01 21:19:32 2018-11-01 15:19:32 -1
6 1 2018-11-01 21:19:32 2018-11-01 22:19:32 1
7 1 2018-11-01 21:19:32 2018-11-01 23:19:32 2
8 1 2018-11-01 21:19:32 2018-11-01 23:44:32 3
9 1 2018-11-01 21:19:32 2018-11-01 23:56:32 4
10 2 2018-11-02 11:20:12 2018-11-02 09:15:12 -2
11 2 2018-11-02 11:20:12 2018-11-02 10:20:12 -1
12 2 2018-11-02 11:20:12 2018-11-02 11:25:12 1
13 2 2018-11-02 11:20:12 2018-11-02 11:32:12 2
14 2 2018-11-02 11:20:12 2018-11-02 11:57:12 3
Any help or suggestions would be appreciated!
Answer 0 (score: 2)
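Use cumcount with the parameter ascending=False for the rows where time2 is before time1, and a plain cumcount for the remaining rows: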
# a unique default RangeIndex is needed so the concatenated result aligns on assignment
test = test.reset_index(drop=True)
# convert both time columns to datetimes
test[['time1','time2']] = test[['time1','time2']].apply(pd.to_datetime)
# sort by id, then time1, then time2
test = test.sort_values(['id','time1','time2'])
# boolean mask: True where time2 falls before time1
m = test['time2'] < test['time1']
# count each part per id (reversed and negated before time1), then join them together
test['row_num'] = pd.concat([(test[m].groupby('id').cumcount(ascending=False) + 1) * -1,
                             test[~m].groupby('id').cumcount() + 1])
print(test)
id time1 time2 row_num
0 1 2018-11-01 21:19:32 2018-11-01 10:19:32 -6
6 1 2018-11-01 21:19:32 2018-11-01 11:19:32 -5
2 1 2018-11-01 21:19:32 2018-11-01 12:19:32 -4
8 1 2018-11-01 21:19:32 2018-11-01 13:22:32 -3
4 1 2018-11-01 21:19:32 2018-11-01 14:19:32 -2
5 1 2018-11-01 21:19:32 2018-11-01 15:19:32 -1
1 1 2018-11-01 21:19:32 2018-11-01 22:19:32 1
7 1 2018-11-01 21:19:32 2018-11-01 23:19:32 2
3 1 2018-11-01 21:19:32 2018-11-01 23:44:32 3
9 1 2018-11-01 21:19:32 2018-11-01 23:56:32 4
14 2 2018-11-02 11:20:12 2018-11-02 09:15:12 -2
11 2 2018-11-02 11:20:12 2018-11-02 10:20:12 -1
12 2 2018-11-02 11:20:12 2018-11-02 11:25:12 1
13 2 2018-11-02 11:20:12 2018-11-02 11:32:12 2
10 2 2018-11-02 11:20:12 2018-11-02 11:57:12 3
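To make the steps above easier to check, here is a minimal self-contained sketch that wraps the same logic in a function and runs it on a smaller frame. The helper name add_row_num and the sample data in small are assumptions made up for illustration; they are not part of the question or the answer above.

```python
import pandas as pd

def add_row_num(df):
    # Hypothetical helper (name chosen for illustration): returns a sorted copy with row_num.
    out = df.reset_index(drop=True).copy()
    out[['time1', 'time2']] = out[['time1', 'time2']].apply(pd.to_datetime)
    out = out.sort_values(['id', 'time1', 'time2'])
    # True where time2 falls strictly before time1
    before = out['time2'] < out['time1']
    # Rows before time1: count down towards time1 within each id, then negate
    neg = -(out[before].groupby('id').cumcount(ascending=False) + 1)
    # Remaining rows: count up from 1 within each id
    pos = out[~before].groupby('id').cumcount() + 1
    # The two parts have disjoint index labels, so assignment aligns them back by index
    out['row_num'] = pd.concat([neg, pos])
    return out

# Made-up sample data, smaller than the frame in the question
small = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'time1': ['2018-11-01 12:00:00'] * 3 + ['2018-11-02 12:00:00'] * 2,
    'time2': ['2018-11-01 13:00:00', '2018-11-01 10:00:00', '2018-11-01 11:00:00',
              '2018-11-02 12:30:00', '2018-11-02 09:00:00'],
})
result = add_row_num(small)
print(result)
```

Because the mask uses a strict <, a row where time2 equals time1 would land in the positive part and be numbered from 1. If such rows should be treated differently, the mask is the place to change it.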