Pandas: summing based on a column condition

Time: 2020-05-23 17:24:10

Tags: python pandas numpy

I have a data frame that records the driving speed of cars. 'id' is the car ID. The data frame looks like this:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                   'speed': [10, 0, 0, 20, 20, 15, 0, 0, 0, 10],
                   'time': ['2020-01-17 18:43:29',
                            '2020-01-17 18:43:48',
                            '2020-01-17 18:44:09',
                            '2020-01-17 18:44:28',
                            '2020-01-17 18:44:48',
                            '2020-01-17 18:46:05',
                            '2020-01-17 18:47:15',
                            '2020-01-17 18:47:24',
                            '2020-01-17 18:53:07',
                            '2020-01-17 18:58:36']})
df['time'] = pd.to_datetime(df['time'])

I want to estimate the stop time (the periods where speed = 0). So I first did this:

df['time_diff']=(df['time'].shift(-1)-df['time']).dt.seconds
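
For reference, here is a small sketch of what this step produces on the timestamps above. It is only an illustration: the names `gap` and `long_gap` are made up for this example. One thing to be aware of is that `.dt.seconds` returns only the seconds component of each timedelta, so gaps of a day or more would need `.dt.total_seconds()` instead. That does not matter for this data, where every gap is a few minutes at most.

```python
import pandas as pd

# Timestamps copied from the data frame in the question.
time = pd.to_datetime(pd.Series([
    '2020-01-17 18:43:29', '2020-01-17 18:43:48', '2020-01-17 18:44:09',
    '2020-01-17 18:44:28', '2020-01-17 18:44:48', '2020-01-17 18:46:05',
    '2020-01-17 18:47:15', '2020-01-17 18:47:24', '2020-01-17 18:53:07',
    '2020-01-17 18:58:36',
]))

# Time until the next record; the last row has no successor, so it is NaT.
gap = time.shift(-1) - time
time_diff = gap.dt.seconds  # last entry becomes NaN

# Hypothetical gap longer than one day, to show how the two accessors differ.
long_gap = pd.Series([pd.Timedelta(days=1, seconds=5)])
seconds_only = long_gap.dt.seconds.iloc[0]          # seconds component only
full_seconds = long_gap.dt.total_seconds().iloc[0]  # whole duration in seconds
```

On this data the first nine values of `time_diff` are 19, 21, 19, 20, 77, 70, 9, 343 and 329, and the last one is missing.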

Now I want to sum the 'time_diff' column over the rows where speed = 0. The result should look like this:

[0, 40, 40, 0, 0, 0, 681, 681, 681, 0]

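The key point of this problem is that the sum has to be taken over each run of consecutive rows with speed = 0. I did check some similar answers, but could not find a good solution.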

1 answer:

Answer 0 (score: 2)

IIUC, try:


c = df['speed'].eq(0)  # condition: the car is stopped
# seconds until the next record, as calculated in your question
s = (df['time'].shift(-1) - df['time']).dt.seconds
# c.ne(c.shift()).cumsum() gives each run of consecutive equal values in c its own label;
# group by that label, sum s within each run, then replace with 0 where c isn't met
s.groupby(c.ne(c.shift()).cumsum()).transform('sum').where(c, 0)  # .astype(int).tolist()
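
To make the grouping step easier to follow, here is a self-contained sketch of the same approach, run on the data from the question with the intermediate run labels kept in a variable. The names `run_id` and `stop_time` are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1] * 10,
    'speed': [10, 0, 0, 20, 20, 15, 0, 0, 0, 10],
    'time': pd.to_datetime([
        '2020-01-17 18:43:29', '2020-01-17 18:43:48', '2020-01-17 18:44:09',
        '2020-01-17 18:44:28', '2020-01-17 18:44:48', '2020-01-17 18:46:05',
        '2020-01-17 18:47:15', '2020-01-17 18:47:24', '2020-01-17 18:53:07',
        '2020-01-17 18:58:36',
    ]),
})

c = df['speed'].eq(0)                               # True where the car is stopped
s = (df['time'].shift(-1) - df['time']).dt.seconds  # seconds until the next record

# A new run starts wherever c differs from the previous row;
# the cumulative sum of those change points labels each run.
run_id = c.ne(c.shift()).cumsum()

# Sum the gaps within each run, then keep the sum only on stopped rows.
stop_time = s.groupby(run_id).transform('sum').where(c, 0).astype(int)

print(run_id.tolist())     # [1, 2, 2, 3, 3, 3, 4, 4, 4, 5]
print(stop_time.tolist())  # [0, 40, 40, 0, 0, 0, 681, 681, 681, 0]
```

The last row has no following record, so its gap is NaN. `sum` skips it, and since that row is not a stop, `where` sets it to 0 anyway, which is why the cast to `int` is safe here. If the frame held several cars, the shift and the grouping would presumably also need to be done per 'id' so that runs do not span two cars. That is an assumption beyond the single-car data shown here.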