Python: How to subtract timestamps within a column and create a new TimeElapsed column?

Time: 2019-04-17 13:48:44

Tags: python python-3.x pandas

I have a few columns in my dataframe that look like this:

ContextID   Time_ms
1           09:12:48.502
1           09:12:48.603
1           09:12:48.934
2           09:15:36.434
2           09:15:36.654
3           09:17:55.940
3           09:17:56.160
3           09:17:57.267

What I want to do is create a new column called TimeElapsed (preferably with millisecond values), and it must contain values like this:

ContextID   Time_ms        Time_Elapsed
1           09:12:48.502   0
1           09:12:48.603   09:12:48.603 - 09:12:48.502
1           09:12:48.934   09:12:48.934 - 09:12:48.502
2           09:15:36.434   0
2           09:15:36.654   09:15:36.654 - 09:15:36.434
3           09:17:55.940   0
3           09:17:56.160   09:17:56.160 - 09:17:55.940
3           09:17:57.267   09:17:57.267 - 09:17:55.940

For each ContextID, the first value of Time_ms must map to 0 seconds. Every later row must hold its own Time_ms minus the first Time_ms of that ContextID, and these differences must populate the Time_Elapsed column.

I would like to know how to do this with Pandas in Python.

Thanks

1 Answer:

Answer 0: (score: 3)

Subtract the result of groupby + transform:

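# Uncomment if Time_ms still holds strings; subtraction needs timedelta64 values
#df['Time_ms'] = pd.to_timedelta(df.Time_ms)
# Subtract each group's first Time_ms from every row of that group
df['Time_Elapsed'] = df.Time_ms - df.groupby('ContextID').Time_ms.transform('first')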

   ContextID         Time_ms    Time_Elapsed
0          1 09:12:48.502000        00:00:00
1          1 09:12:48.603000 00:00:00.101000
2          1 09:12:48.934000 00:00:00.432000
3          2 09:15:36.434000        00:00:00
4          2 09:15:36.654000 00:00:00.220000
5          3 09:17:55.940000        00:00:00
6          3 09:17:56.160000 00:00:00.220000
7          3 09:17:57.267000 00:00:01.327000
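
For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. It assumes Time_ms starts out as strings, as in the question, and it adds an integer-millisecond column because the question prefers millisecond values. The column name Elapsed_ms is a made-up name for illustration, not something from the question or the answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ContextID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Time_ms": ["09:12:48.502", "09:12:48.603", "09:12:48.934",
                "09:15:36.434", "09:15:36.654",
                "09:17:55.940", "09:17:56.160", "09:17:57.267"],
})

# Parse the strings into timedeltas so that subtraction is defined
df["Time_ms"] = pd.to_timedelta(df["Time_ms"])

# Elapsed time since the first row of each ContextID
first = df.groupby("ContextID")["Time_ms"].transform("first")
df["Time_Elapsed"] = df["Time_ms"] - first

# Optional: the same elapsed time as whole milliseconds
# (Elapsed_ms is a hypothetical column name)
df["Elapsed_ms"] = df["Time_Elapsed"] // pd.Timedelta(milliseconds=1)

print(df)
```

Floor division by a one-millisecond Timedelta keeps the result as integers; `df["Time_Elapsed"].dt.total_seconds() * 1000` would be an alternative if floats are acceptable.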

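transform is used to broadcast the groupby result back to the shape of the original DataFrame. In this case we need the first value of each group repeated on every row of that group, so that a single subtraction does the job: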

df.groupby('ContextID').Time_ms.transform('first')

#0   09:12:48.502000
#1   09:12:48.502000
#2   09:12:48.502000
#3   09:15:36.434000
#4   09:15:36.434000
#5   09:17:55.940000
#6   09:17:55.940000
#7   09:17:55.940000
#Name: Time_ms, dtype: timedelta64[ns]
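
To make the broadcasting point concrete, the sketch below contrasts the plain aggregation `first()` with `transform('first')` on the first five rows of the question's data. The variable names `per_group` and `per_row` are only illustrative.

```python
import pandas as pd

# First five rows of the question's data (ContextID 1 and 2)
df = pd.DataFrame({
    "ContextID": [1, 1, 1, 2, 2],
    "Time_ms": pd.to_timedelta(["09:12:48.502", "09:12:48.603", "09:12:48.934",
                                "09:15:36.434", "09:15:36.654"]),
})

grouped = df.groupby("ContextID")["Time_ms"]

# Aggregation: one value per group, indexed by ContextID
per_group = grouped.first()

# Transform: the same values, repeated so the result shares df's index
per_row = grouped.transform("first")

print(per_group)
print(per_row)
```

Because `per_row` is aligned with the original index, `df["Time_ms"] - per_row` works row by row, whereas subtracting `per_group` directly would try to align on ContextID labels instead of row labels.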