I have a dataset like this:
user-id date-time msg
1 2016-12-09 10:25:00 1
2 2016-12-09 10:26:00 0
3 2016-12-09 10:26:00 1
2 2016-12-09 10:27:00 1
1 2016-12-09 10:28:00 2
2 2016-12-09 10:28:00 1
3 2016-12-09 10:29:00 2
2 2016-12-09 10:29:00 1
1 2016-12-09 10:30:00 3
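I want a new column that holds, for each record, the time difference between that record and the first record with the same msg value. Like this:
user-id date-time msg time-difference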
1 2016-12-09 10:25:00 1 00:00
2 2016-12-09 10:26:00 0 00:00
3 2016-12-09 10:26:00 1 01:00
2 2016-12-09 10:27:00 1 02:00
1 2016-12-09 10:28:00 2 00:00
2 2016-12-09 10:28:00 1 03:00
3 2016-12-09 10:29:00 2 01:00
2 2016-12-09 10:29:00 1 04:00
1 2016-12-09 10:30:00 3 00:00
The solutions I found either consider only the date-time column or rely on loc or iloc, and none of them work for me.
Answer 0 (score: 3)
Use groupby and iloc:
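# group_keys=False keeps the original row index, so the result aligns with df on assignment
df['time-difference'] = df.groupby('msg', group_keys=False)['date-time'].apply(lambda x: x - x.iloc[0])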
Output:
user-id date-time msg time-difference
0 1 2016-12-09 10:25:00 1 00:00:00
1 2 2016-12-09 10:26:00 0 00:00:00
2 3 2016-12-09 10:26:00 1 00:01:00
3 2 2016-12-09 10:27:00 1 00:02:00
4 1 2016-12-09 10:28:00 2 00:00:00
5 2 2016-12-09 10:28:00 1 00:03:00
6 3 2016-12-09 10:29:00 2 00:01:00
7 2 2016-12-09 10:29:00 1 00:04:00
8 1 2016-12-09 10:30:00 3 00:00:00
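For anyone who wants to reproduce this, here is a minimal sketch of the groupby/iloc approach on the sample rows from the question. It assumes the date-time column may have been loaded as strings (a guess at why earlier attempts failed, not something the question confirms), so it converts the column with pd.to_datetime first. Passing group_keys=False is also an assumption about the pandas version: newer releases otherwise add the group key to the result's index, which breaks the assignment.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "user-id": [1, 2, 3, 2, 1, 2, 3, 2, 1],
    "date-time": [
        "2016-12-09 10:25:00", "2016-12-09 10:26:00", "2016-12-09 10:26:00",
        "2016-12-09 10:27:00", "2016-12-09 10:28:00", "2016-12-09 10:28:00",
        "2016-12-09 10:29:00", "2016-12-09 10:29:00", "2016-12-09 10:30:00",
    ],
    "msg": [1, 0, 1, 1, 2, 1, 2, 1, 3],
})

# Assumption: the column starts out as strings; subtraction needs datetime64 values.
df["date-time"] = pd.to_datetime(df["date-time"])

# Within each msg group, subtract the timestamp of the group's first row.
# group_keys=False keeps the original index so the result aligns with df.
df["time-difference"] = (
    df.groupby("msg", group_keys=False)["date-time"]
      .apply(lambda s: s - s.iloc[0])
)

print(df)
```

Note that x.iloc[0] picks the first row of each group in its current order, so this only measures time since the earliest message if the rows are already sorted by date-time.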
Alternatively, use groupby with transform and either first or min:
df['time-difference'] = df['date-time'] - df.groupby('msg')['date-time'].transform('first')
Output:
user-id date-time msg time-difference
0 1 2016-12-09 10:25:00 1 00:00:00
1 2 2016-12-09 10:26:00 0 00:00:00
2 3 2016-12-09 10:26:00 1 00:01:00
3 2 2016-12-09 10:27:00 1 00:02:00
4 1 2016-12-09 10:28:00 2 00:00:00
5 2 2016-12-09 10:28:00 1 00:03:00
6 3 2016-12-09 10:29:00 2 00:01:00
7 2 2016-12-09 10:29:00 1 00:04:00
8 1 2016-12-09 10:30:00 3 00:00:00
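The choice between first and min only matters when the rows are not in chronological order. The sketch below uses a small, deliberately unsorted frame (hypothetical data, not from the question) to show the difference; the column names since-first-row and since-earliest are made up for illustration.

```python
import pandas as pd

# Hypothetical rows, deliberately out of chronological order within each msg group.
df = pd.DataFrame({
    "date-time": pd.to_datetime([
        "2016-12-09 10:27:00",  # first msg=1 row, but not the earliest
        "2016-12-09 10:25:00",
        "2016-12-09 10:29:00",  # first msg=2 row, but not the earliest
        "2016-12-09 10:28:00",
    ]),
    "msg": [1, 1, 2, 2],
})

by_msg = df.groupby("msg")["date-time"]

# 'first' uses whichever row appears first in each group (order-dependent).
df["since-first-row"] = df["date-time"] - by_msg.transform("first")

# 'min' uses the earliest timestamp in each group (order-independent).
df["since-earliest"] = df["date-time"] - by_msg.transform("min")

print(df)
```

With unsorted data, the first variant can produce negative differences, while min never does. On the question's data, which is already sorted by time, both give the same result.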