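I have a DataFrame with a timestamp index (tens of thousands of rows) and a list of timestamps corresponding to certain events. I need to flag every row in the DataFrame that falls within the n minutes before any event, so I wrote the following code: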
for timestamp in events:
    df.loc[timestamp - timespan : timestamp, 'is_before_event'] = True
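This turned out to be slow, so I tried first building an index of all the elements that have to change and then doing a single assignment for all of them: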
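temp_index = pd.DatetimeIndex([])  # empty starting index (initialization assumed; not shown in the original snippet)
for timestamp in events:
    temp_index = temp_index.append(df.loc[timestamp - timespan : timestamp].index)
df.loc[df.index.isin(temp_index), 'is_before_event'] = True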
This code runs at least 100 times faster than my first attempt.
Why is that, and what is the right way to do the assignment in a case like this?
Answer 0 (score: 2)
I think you can assign the boolean mask directly to the column, without loc, if you need both True and False values.
You also need numpy.concatenate, together with numpy.unique, to join all the indexes and remove duplicates.
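temp_index = []
for timestamp in events:
    # collect the index labels of each window that ends at an event
    temp_index.append(df.loc[timestamp - timespan : timestamp].index)
# one vectorized assignment; isin yields True or False for every row
df['is_before_event'] = df.index.isin(np.concatenate(temp_index))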
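As a rough check of the point about assigning the mask without loc, the sketch below runs both approaches on a small made-up frame (the dates, the two-minute `timespan`, and the names `looped`, `single`, and `windows` are assumptions for illustration only). It suggests that the per-event loc assignment leaves the untouched rows as missing values, while the isin mask gives a plain boolean column flagging the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: ten one-minute rows and two overlapping event windows.
rng = pd.date_range("2017-04-03", periods=10, freq="min")
events = pd.to_datetime(["2017-04-03 00:03:00", "2017-04-03 00:04:30"])
timespan = pd.Timedelta(minutes=2)

# Approach 1: one .loc assignment per event (rows never touched stay missing).
looped = pd.DataFrame({"a": range(10)}, index=rng)
for timestamp in events:
    looped.loc[timestamp - timespan : timestamp, "is_before_event"] = True

# Approach 2: collect the window labels, then assign a boolean mask once.
single = pd.DataFrame({"a": range(10)}, index=rng)
windows = [single.loc[ts - timespan : ts].index for ts in events]
single["is_before_event"] = single.index.isin(np.concatenate(windows))

# Both approaches flag the same rows; only the second yields a bool column.
same_rows = looped["is_before_event"].notna().equals(single["is_before_event"])
print(same_rows, single["is_before_event"].dtype)
```

If the first form is kept, the column would need to be initialized to False beforehand to avoid the missing values.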
Sample (the list comprehension is equivalent to the loop in the solution above):
import numpy as np
import pandas as pd

# 20 rows at one-minute frequency
rng = pd.date_range('2017-04-03', periods=20, freq='min')
df = pd.DataFrame({'a': range(20)}, index=rng)
events = pd.to_datetime(['2017-04-03 00:03:00', '2017-04-03 00:09:45'])
t = pd.Timedelta('00:03:00')
# index labels of each three-minute window ending at an event
temp_index = [df.loc[timestamp - t : timestamp].index for timestamp in events]
idx = np.unique(np.concatenate(temp_index))
df['is_before_event'] = df.index.isin(idx)
print(df)
                      a  is_before_event
2017-04-03 00:00:00   0             True
2017-04-03 00:01:00   1             True
2017-04-03 00:02:00   2             True
2017-04-03 00:03:00   3             True
2017-04-03 00:04:00   4            False
2017-04-03 00:05:00   5            False
2017-04-03 00:06:00   6            False
2017-04-03 00:07:00   7             True
2017-04-03 00:08:00   8             True
2017-04-03 00:09:00   9             True
2017-04-03 00:10:00  10            False
2017-04-03 00:11:00  11            False
2017-04-03 00:12:00  12            False
2017-04-03 00:13:00  13            False
2017-04-03 00:14:00  14            False
2017-04-03 00:15:00  15            False
2017-04-03 00:16:00  16            False
2017-04-03 00:17:00  17            False
2017-04-03 00:18:00  18            False
2017-04-03 00:19:00  19            False
A similar solution:
temp_index = [df.loc[timestamp - t : timestamp].index for timestamp in events]
idx = np.unique(np.concatenate(temp_index))
# initialize every row to False, then flag only the collected labels
df['is_before_event'] = False
df.loc[idx, 'is_before_event'] = True
print(df)
                      a  is_before_event
2017-04-03 00:00:00   0             True
2017-04-03 00:01:00   1             True
2017-04-03 00:02:00   2             True
2017-04-03 00:03:00   3             True
2017-04-03 00:04:00   4            False
2017-04-03 00:05:00   5            False
2017-04-03 00:06:00   6            False
2017-04-03 00:07:00   7             True
2017-04-03 00:08:00   8             True
2017-04-03 00:09:00   9             True
2017-04-03 00:10:00  10            False
2017-04-03 00:11:00  11            False
2017-04-03 00:12:00  12            False
2017-04-03 00:13:00  13            False
2017-04-03 00:14:00  14            False
2017-04-03 00:15:00  15            False
2017-04-03 00:16:00  16            False
2017-04-03 00:17:00  17            False
2017-04-03 00:18:00  18            False
2017-04-03 00:19:00  19            False
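If the Python-level loop over events ever becomes the bottleneck, one possible alternative (not part of the answer above, and only a sketch) is to look up, for every row, the nearest event at or after it with numpy.searchsorted and compare the gap against the window. The helper name `flag_before_events` is hypothetical; the sample data mirrors the example above.

```python
import numpy as np
import pandas as pd

def flag_before_events(index, events, timespan):
    """Hypothetical helper: True where a row lies within `timespan` before (or at) some event."""
    ev = np.sort(pd.DatetimeIndex(events).values)
    ts = index.values
    if len(ev) == 0:
        return np.zeros(len(ts), dtype=bool)
    # Position of the first event that is at or after each row's timestamp.
    pos = np.searchsorted(ev, ts, side="left")
    has_next = pos < len(ev)
    # Clip so rows after the last event still index safely; has_next masks them out.
    next_event = ev[np.minimum(pos, len(ev) - 1)]
    return has_next & ((next_event - ts) <= timespan.to_timedelta64())

rng = pd.date_range("2017-04-03", periods=20, freq="min")
df = pd.DataFrame({"a": range(20)}, index=rng)
events = pd.to_datetime(["2017-04-03 00:03:00", "2017-04-03 00:09:45"])
t = pd.Timedelta("00:03:00")

df["is_before_event"] = flag_before_events(df.index, events, t)

# Cross-check against the slicing approach used in the answer.
windows = [df.loc[ts - t : ts].index for ts in events]
expected = df.index.isin(np.concatenate(windows))
print((df["is_before_event"].to_numpy() == expected).all())
```

Checking only the nearest following event is sufficient here because any later event is even further away from the row, so it cannot fall inside the window if the nearest one does not.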