My dataframe has a few columns that look like this:
ContextID Time_ms
1 09:12:48.502
1 09:12:48.603
1 09:12:48.934
2 09:15:36.434
2 09:15:36.654
3 09:17:55.940
3 09:17:56.160
3 09:17:57.267
For each ContextID, I want to create a new column called Time_Elapsed (preferably with millisecond values) that contains values like these:
ContextID Time_ms Time_Elapsed
1 09:12:48.502 0
1 09:12:48.603 09:12:48.603 - 09:12:48.502
1 09:12:48.934 09:12:48.934 - 09:12:48.502
2 09:15:36.434 0
2 09:15:36.654 09:15:36.654 - 09:15:36.434
3 09:17:55.940 0
3 09:17:56.160 09:17:56.160 - 09:17:55.940
3 09:17:57.267 09:17:57.267 - 09:17:55.940
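The first Time_Elapsed value for each ContextID must be 0 seconds. After that, the first Time_ms value of that ContextID must be subtracted from the second Time_ms value, and so on, and the differences must fill the Time_Elapsed column.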
I would like to know how to do this with Pandas in Python.
Thanks
Answer 0 (score: 3)
Subtract the result of groupby + transform:
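# Uncomment if Time_ms holds strings rather than timedelta64 values
#df['Time_ms'] = pd.to_timedelta(df.Time_ms)
# Subtract each group's first Time_ms from every row of that group
df['Time_Elapsed'] = df.Time_ms - df.groupby('ContextID').Time_ms.transform('first')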
ContextID Time_ms Time_Elapsed
0 1 09:12:48.502000 00:00:00
1 1 09:12:48.603000 00:00:00.101000
2 1 09:12:48.934000 00:00:00.432000
3 2 09:15:36.434000 00:00:00
4 2 09:15:36.654000 00:00:00.220000
5 3 09:17:55.940000 00:00:00
6 3 09:17:56.160000 00:00:00.220000
7 3 09:17:57.267000 00:00:01.327000
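For anyone who wants to reproduce the output above from scratch, here is a minimal end-to-end sketch. It assumes Time_ms starts out as "HH:MM:SS.fff" strings, as in the question. The Elapsed_ms column is a hypothetical addition (not part of the original answer) showing one way to get the elapsed time as whole milliseconds.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ContextID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Time_ms": ["09:12:48.502", "09:12:48.603", "09:12:48.934",
                "09:15:36.434", "09:15:36.654",
                "09:17:55.940", "09:17:56.160", "09:17:57.267"],
})

# Parse the strings as timedeltas so they can be subtracted
df["Time_ms"] = pd.to_timedelta(df["Time_ms"])

# First Time_ms of each ContextID, repeated on every row of that group
first = df.groupby("ContextID")["Time_ms"].transform("first")
df["Time_Elapsed"] = df["Time_ms"] - first

# Hypothetical extra column: elapsed time as an integer number of milliseconds
df["Elapsed_ms"] = df["Time_Elapsed"] // pd.Timedelta(milliseconds=1)

print(df)
```

Floor-dividing by a one-millisecond Timedelta is only one option; `df["Time_Elapsed"].dt.total_seconds()` would give fractional seconds instead.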
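transform broadcasts the groupby result back to the shape of the original DataFrame. Here we need the first value of each group, so the subtraction can be done in a single step: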
df.groupby('ContextID').Time_ms.transform('first')
#0 09:12:48.502000
#1 09:12:48.502000
#2 09:12:48.502000
#3 09:15:36.434000
#4 09:15:36.434000
#5 09:17:55.940000
#6 09:17:55.940000
#7 09:17:55.940000
#Name: Time_ms, dtype: timedelta64[ns]
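To make the broadcasting point concrete, the sketch below contrasts a plain groupby aggregation with transform on a trimmed-down, three-row version of the data. The variable names per_group and per_row are only illustrative.

```python
import pandas as pd

# Three rows are enough to show the difference in shape
df = pd.DataFrame({
    "ContextID": [1, 1, 2],
    "Time_ms": pd.to_timedelta(["09:12:48.502", "09:12:48.603", "09:15:36.434"]),
})

# Aggregation: one value per group, indexed by ContextID
per_group = df.groupby("ContextID")["Time_ms"].first()

# transform: the same group value repeated for every original row,
# so the result shares df's index and can be subtracted directly
per_row = df.groupby("ContextID")["Time_ms"].transform("first")

print(len(per_group), len(per_row))
```

Because per_row is aligned with df's index, `df["Time_ms"] - per_row` works row by row, whereas per_group would first have to be mapped back onto the rows.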