I have a pandas DataFrame like this:
cEventID arrivalTime
1167533
1167541 2015-07-14 04:01:21
1167545 2015-07-14 04:03:20
1167549 2015-07-14 04:07:45
1167552 2015-07-14 04:10:21
1167553 2015-07-14 04:13:39
1167558 2015-07-14 04:15:58
1167561 2015-07-14 04:20:23
I need to compute the time elapsed between each event and the previous one, so that the result is:
EventID arrivalTime diff time
1167541 2015-07-14 04:01:21 0
1167545 2015-07-14 04:03:20 00:01:59
1167549 2015-07-14 04:07:45 00:04:25
1167552 2015-07-14 04:10:21 00:02:36
1167553 2015-07-14 04:13:39 00:03:18
1167558 2015-07-14 04:15:58 00:02:19
1167561 2015-07-14 04:20:23 00:04:25
I got this result with a for loop over the DataFrame, using for index, row in datos.iterrows():. My function is:
import datetime as dt
import pandas as pd

def llegadas(datos, estacion):
    datos = datos
    lista = []
    filas = []
    segundos = []
    for index, row in datos.iterrows():
        if index == 0:
            lista.append('00:00:00')
            filas.append(row)
        else:
            ii = filas[len(filas)-1][6]
            ff = datos['arrivalTime'][index]
            deltat = str( dt.datetime.strptime(ff, '%Y-%m-%d %H:%M:%S') - dt.datetime.strptime(ii, '%Y-%m-%d %H:%M:%S') )
            lista.append(deltat)
            filas.append(row)
    df1 = pd.DataFrame(lista)
    frames = [datos, df1]
    datos = pd.concat(frames, axis=1)
    datos.rename(columns={0:'dif_tiempo'}, inplace=True)
    return datos
Please suggest a more efficient way to write this function.
Thank you very much.
Answer 0 (score: 1)
In [4]:
df['diff time'] = df['arrivalTime'].diff().fillna(0)
df
Out[4]:
cEventID arrivalTime diff time
0 1167533 NaT 00:00:00
1 1167541 2015-07-14 04:01:21 00:00:00
2 1167545 2015-07-14 04:03:20 00:01:59
3 1167549 2015-07-14 04:07:45 00:04:25
4 1167552 2015-07-14 04:10:21 00:02:36
5 1167553 2015-07-14 04:13:39 00:03:18
6 1167558 2015-07-14 04:15:58 00:02:19
7 1167561 2015-07-14 04:20:23 00:04:25
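The one-liner above relies on arrivalTime already being a datetime column. The question's function parses strings with strptime, so the column may still hold text. Here is a minimal self-contained sketch of the same diff() approach under that assumption: the sample frame is a shortened copy of the question's data, the missing first arrival time is assumed to be None, and pd.Timedelta(0) is used as the fill value instead of 0 because some pandas versions may not accept an integer fill for timedelta columns.

```python
import pandas as pd

# Shortened copy of the question's data; the first event has no arrival time
# (assumed here to be None, which pandas turns into NaT).
df = pd.DataFrame({
    "cEventID": [1167533, 1167541, 1167545, 1167549],
    "arrivalTime": [None,
                    "2015-07-14 04:01:21",
                    "2015-07-14 04:03:20",
                    "2015-07-14 04:07:45"],
})

# Parse the strings into datetime64 values in one vectorized call.
df["arrivalTime"] = pd.to_datetime(df["arrivalTime"], format="%Y-%m-%d %H:%M:%S")

# diff() subtracts the previous row from each row; rows with no valid
# predecessor come out as NaT and are filled with a zero-length Timedelta.
df["diff time"] = df["arrivalTime"].diff().fillna(pd.Timedelta(0))

print(df)
```

If plain seconds are wanted instead of timedeltas, df["diff time"].dt.total_seconds() gives them as floats.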