Question

I have a DataFrame that records the driving speed of cars. 'id' is the car's ID. The DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({'id':[1,1,1,1,1,1,1,1,1,1],
                   'speed':[10,0,0,20,20,15,0,0,0,10],
                   'time':['2020-01-17 18:43:29',
                             '2020-01-17 18:43:48',
                             '2020-01-17 18:44:09',
                             '2020-01-17 18:44:28',
                             '2020-01-17 18:44:48',
                             '2020-01-17 18:46:05',
                             '2020-01-17 18:47:15',
                             '2020-01-17 18:47:24',
                             '2020-01-17 18:53:07',
                             '2020-01-17 18:58:36']})
df['time']=pd.to_datetime(df['time'])

I want to estimate the stop time (speed = 0). So I first did this:

df['time_diff']=(df['time'].shift(-1)-df['time']).dt.seconds

Now I want to sum the 'time_diff' column over the rows where speed = 0. The result should look like this:

[0, 40, 40, 0, 0, 0, 681, 681, 681, 0]

The key point is that the sum has to be taken over each consecutive run of speed = 0. I checked some similar answers but could not find a good solution.

Answer 1

IIUC, try:

c = df['speed'].eq(0)  # condition: the car is stopped
# time to the next row, calculated as in your question
s = (df['time'].shift(-1)-df['time']).dt.seconds
# c.ne(c.shift()).cumsum() gives each run of consecutive equal values in c its own label;
# group by that label, sum within each run, then replace with 0 where c isn't met
s.groupby((c.ne(c.shift()).cumsum())).transform('sum').where(c,0)#.astype(int).tolist()
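
To make the grouping step easier to follow, here is a minimal step-by-step sketch of the same idea on the question's data. The names `is_stopped`, `run_id`, `gap` and `stop_time` are just illustrative, and I have swapped `.dt.seconds` for `.dt.total_seconds()`, which is my own assumption about what is wanted: `.dt.seconds` returns only the seconds component of a timedelta, so it would drop whole days if two rows were ever more than a day apart.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1] * 10,
    'speed': [10, 0, 0, 20, 20, 15, 0, 0, 0, 10],
    'time': pd.to_datetime([
        '2020-01-17 18:43:29', '2020-01-17 18:43:48',
        '2020-01-17 18:44:09', '2020-01-17 18:44:28',
        '2020-01-17 18:44:48', '2020-01-17 18:46:05',
        '2020-01-17 18:47:15', '2020-01-17 18:47:24',
        '2020-01-17 18:53:07', '2020-01-17 18:58:36',
    ]),
})

# True on rows where the car is stopped
is_stopped = df['speed'].eq(0)

# A new run starts wherever the flag differs from the previous row;
# the cumulative sum of those change points labels each run.
run_id = is_stopped.ne(is_stopped.shift()).cumsum()

# Seconds until the next row (NaN on the last row).
# Assumption: total_seconds() is used instead of .dt.seconds.
gap = (df['time'].shift(-1) - df['time']).dt.total_seconds()

# Sum the gaps within each run, then keep the sum only on stopped rows.
stop_time = gap.groupby(run_id).transform('sum').where(is_stopped, 0)
df['stop_time'] = stop_time.astype(int)

print(run_id.tolist())           # [1, 2, 2, 3, 3, 3, 4, 4, 4, 5]
print(df['stop_time'].tolist())  # [0, 40, 40, 0, 0, 0, 681, 681, 681, 0]
```

Note that this treats the whole frame as one car, which holds for the sample since every row has id 1.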
