How to find the time difference based on another column

Time: 2018-04-01 04:25:20

Tags: python

I have a dataset like this:

user-id      date-time                   msg
  1          2016-12-09 10:25:00          1
  2          2016-12-09 10:26:00          0
  3          2016-12-09 10:26:00          1
  2          2016-12-09 10:27:00          1
  1          2016-12-09 10:28:00          2
  2          2016-12-09 10:28:00          1
  3          2016-12-09 10:29:00          2
  2          2016-12-09 10:29:00          1
  1          2016-12-09 10:30:00          3

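I want a new column that holds, for each record, the time difference between that record and the first record with the same msg value. Like this: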

 user-id      date-time                  msg        time-difference
  1          2016-12-09 10:25:00          1            00:00
  2          2016-12-09 10:26:00          0            00:00
  3          2016-12-09 10:26:00          1            01:00
  2          2016-12-09 10:27:00          1            02:00
  1          2016-12-09 10:28:00          2            00:00
  2          2016-12-09 10:28:00          1            03:00
  3          2016-12-09 10:29:00          2            01:00
  2          2016-12-09 10:29:00          1            04:00
  1          2016-12-09 10:30:00          3            00:00

The solutions I found either consider only the date-time column or use loc or iloc, and none of them worked for me.

1 Answer:

Answer 0 (score: 3)

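Option #1

Use groupby with iloc (group_keys=False keeps the original row index on recent pandas versions, so the result can be assigned back as a column):

df['time-difference'] = df.groupby('msg', group_keys=False)['date-time'].apply(lambda x: x - x.iloc[0])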

Output:

   user-id           date-time  msg time-difference
0        1 2016-12-09 10:25:00    1        00:00:00
1        2 2016-12-09 10:26:00    0        00:00:00
2        3 2016-12-09 10:26:00    1        00:01:00
3        2 2016-12-09 10:27:00    1        00:02:00
4        1 2016-12-09 10:28:00    2        00:00:00
5        2 2016-12-09 10:28:00    1        00:03:00
6        3 2016-12-09 10:29:00    2        00:01:00
7        2 2016-12-09 10:29:00    1        00:04:00
8        1 2016-12-09 10:30:00    3        00:00:00
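
To reproduce this, here is a minimal sketch that rebuilds the sample data from the question. It assumes the date-time column starts out as strings and has to be converted with pd.to_datetime first, since the subtraction needs a datetime dtype. The group_keys=False argument is there so the result stays aligned with the original rows on recent pandas versions.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user-id": [1, 2, 3, 2, 1, 2, 3, 2, 1],
    "date-time": [
        "2016-12-09 10:25:00", "2016-12-09 10:26:00", "2016-12-09 10:26:00",
        "2016-12-09 10:27:00", "2016-12-09 10:28:00", "2016-12-09 10:28:00",
        "2016-12-09 10:29:00", "2016-12-09 10:29:00", "2016-12-09 10:30:00",
    ],
    "msg": [1, 0, 1, 1, 2, 1, 2, 1, 3],
})

# Assumption: the column is read as text, so convert it to datetime64
df["date-time"] = pd.to_datetime(df["date-time"])

# Within each msg group, subtract the group's first timestamp from every row
df["time-difference"] = (
    df.groupby("msg", group_keys=False)["date-time"]
      .apply(lambda x: x - x.iloc[0])
)

print(df)
```

The assignment aligns on the row index, so the order in which the groups come back does not matter.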

Option #2

Use groupby with transform and first (or min):

df['time-difference'] = df['date-time'] - df.groupby('msg')['date-time'].transform('first')

Output:

   user-id           date-time  msg time-difference
0        1 2016-12-09 10:25:00    1        00:00:00
1        2 2016-12-09 10:26:00    0        00:00:00
2        3 2016-12-09 10:26:00    1        00:01:00
3        2 2016-12-09 10:27:00    1        00:02:00
4        1 2016-12-09 10:28:00    2        00:00:00
5        2 2016-12-09 10:28:00    1        00:03:00
6        3 2016-12-09 10:29:00    2        00:01:00
7        2 2016-12-09 10:29:00    1        00:04:00
8        1 2016-12-09 10:30:00    3        00:00:00
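
As a rough check of the transform approach, the sketch below runs both 'first' and 'min' on the question's sample data. The two only agree when the rows are already in chronological order within each msg group, which is the case here; with unsorted data, 'min' gives the difference from the earliest timestamp, while 'first' gives the difference from whichever row appears first. The last step, which renders the result as MM:SS strings like the question's desired column, is an optional addition and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "user-id": [1, 2, 3, 2, 1, 2, 3, 2, 1],
    "date-time": pd.to_datetime([
        "2016-12-09 10:25:00", "2016-12-09 10:26:00", "2016-12-09 10:26:00",
        "2016-12-09 10:27:00", "2016-12-09 10:28:00", "2016-12-09 10:28:00",
        "2016-12-09 10:29:00", "2016-12-09 10:29:00", "2016-12-09 10:30:00",
    ]),
    "msg": [1, 0, 1, 1, 2, 1, 2, 1, 3],
})

grouped = df.groupby("msg")["date-time"]

# transform broadcasts each group's aggregate back onto the original rows
diff_first = df["date-time"] - grouped.transform("first")
diff_min = df["date-time"] - grouped.transform("min")

# The sample is sorted by time, so both variants agree here
same = diff_first.equals(diff_min)
print(same)

# Optional: format the timedeltas as MM:SS strings
total_seconds = diff_first.dt.total_seconds().astype(int)
df["time-difference"] = (
    (total_seconds // 60).astype(str).str.zfill(2)
    + ":"
    + (total_seconds % 60).astype(str).str.zfill(2)
)

print(df)
```

Keep the timedelta column rather than the strings if further arithmetic on the differences is needed.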