I have a DataFrame that looks like this:
Timestamp Speed
2014-10-10 00:10:10 112
2014-10-10 00:10:13 34
2014-10-10 00:10:17 0
2014-10-10 00:10:20 0
2014-10-10 00:10:45 0
2014-10-10 00:10:56 3
2014-10-10 00:11:06 0
2014-10-10 00:11:09 0
2014-10-10 00:11:14 11
I want to group runs of consecutive equal values (in this case 0) and get output like:
start_time end_time number
2014-10-10 00:10:17 2014-10-10 00:10:45 3
2014-10-10 00:11:06 2014-10-10 00:11:09 2
Answer 0 (score: 1)
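You can use .groupby() with a key that labels each run of consecutive equal values. First compare each value with the previous one (df["Speed"] != df["Speed"].shift()), then take the cumulative sum of that boolean so every run gets its own number, and finally check whether the Speed in each block is 0. There may be a better way to assemble the final DataFrame, but I just appended each result to a list and rebuilt the frame at the end.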
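To make the run-labelling step concrete, here is a minimal sketch that applies it to the Speed values from the question on their own (a plain Series, so no timestamps are involved). Each row in an unbroken run of equal values ends up with the same label, and that label is what the groupby key uses.

```python
import pandas as pd

# Speed values from the question, in order
speed = pd.Series([112, 34, 0, 0, 0, 3, 0, 0, 11])

# True wherever a value differs from the one before it; the first row is
# always True because shift() puts NaN there
changed = speed != speed.shift()

# The cumulative sum turns each change point into a new run number, so all
# rows of one run of equal values share a label
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 2, 3, 3, 3, 4, 5, 5, 6]
```

The two runs of zeros get labels 3 and 5, which is why grouping on this key and keeping only the groups whose Speed is 0 yields exactly the two blocks asked for.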
Your table didn't parse cleanly with pd.read_clipboard(), so my frame only has the times (used as the index), but it should work the same on your real data.
In [113]: df
Out[113]:
Speed
Timestamp
00:10:10 112
00:10:13 34
00:10:17 0
00:10:20 0
00:10:45 0
00:10:56 3
00:11:06 0
00:11:09 0
00:11:14 11
In [114]: l = []
In [115]: for k, v in df.groupby((df["Speed"] != df["Speed"].shift()).cumsum()):
     ...:     if v["Speed"].iloc[0] == 0:
     ...:         l.append({'start_time': v.index.min(), 'end_time': v.index.max(), 'number': len(v)})
     ...: pd.DataFrame(l, columns=['start_time', 'end_time', 'number'])
     ...:
Out[115]:
start_time end_time number
0 00:10:17 00:10:45 3
1 00:11:06 00:11:09 2
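If you want to skip the Python loop and the intermediate list, the same run key can be combined with a filter and a single named aggregation. This is a sketch, not the answer's original code, and it assumes Timestamp is an ordinary datetime column (as in the question) rather than the index. Because the rows are in time order, 'first' and 'last' give the same values as the index min()/max() used above.

```python
import pandas as pd

# Sample data from the question; Timestamp is a regular column here
df = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2014-10-10 00:10:10', '2014-10-10 00:10:13', '2014-10-10 00:10:17',
        '2014-10-10 00:10:20', '2014-10-10 00:10:45', '2014-10-10 00:10:56',
        '2014-10-10 00:11:06', '2014-10-10 00:11:09', '2014-10-10 00:11:14',
    ]),
    'Speed': [112, 34, 0, 0, 0, 3, 0, 0, 11],
})

# One label per run of equal consecutive Speed values
df['run'] = (df['Speed'] != df['Speed'].shift()).cumsum()

# Keep only the zero rows, then summarise each run in one pass
# (named aggregation needs pandas >= 0.25)
result = (
    df[df['Speed'] == 0]
    .groupby('run')['Timestamp']
    .agg(start_time='first', end_time='last', number='size')
    .reset_index(drop=True)
)
print(result)
```

Filtering before grouping means the non-zero runs never reach the aggregation, so no per-group `if` is needed. Like the loop version, a lone 0 would still appear as a run with number 1.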
Answer 1 (score: 0)
Here is an implementation without a Python loop:
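import numpy as np
import pandas as pd

# 1 where Speed is 0 and a neighbour is also 0, i.e. inside a run of at least two zeros
# (a single isolated 0 is not counted)
s = (((df['Speed'] == 0) & (df['Speed'].shift(1) == 0)) | ((df['Speed'] == 0) & (df['Speed'].shift(-1) == 0))) * 1
# +1 marks the start of each run; fillna(s) also catches a run that begins on the first row
s1 = s.diff().fillna(s)
group_labels = s1[s1 == 1].cumsum()
s_nan = s.replace(1, np.nan)
df_copy = df.copy()
# Label every row with its run number, NaN outside runs (groupby drops NaN keys)
df_copy['label'] = s_nan.combine_first(group_labels).ffill().replace(0, np.nan)
df_copy = df_copy.groupby('label')['Timestamp'].agg(start_time='first', end_time='last', number='size')
df_copy = df_copy.reset_index(drop=True)
df_copy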
start_time end_time number
0 2014-10-10 00:10:17 2014-10-10 00:10:45 3
1 2014-10-10 00:11:06 2014-10-10 00:11:09 2