Question

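Is there a simple, efficient way to find specific events in a data series? By events I mean specific conditions in the data, such as spikes, values going above or below a threshold, or two data series crossing each other.

I basically have two goals: 1) compare the data around an event with the data around the next/previous event, to analyze how they compare and how adjustments affect the events. 2) copy the key data for all events in the data into a new dataframe for statistical analysis.

The way I picture it, I want to loop over the events and get each event's index value so that I can work with the data around it.

Obviously I could loop over all the data, but I suspect there should be a more efficient way. Any pointers on how best to approach this?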

Answer 1

I would do something like the following:

# Let's use numpy (you can do the same with pandas or any other algebra package)
import numpy as np

# Just generate some data for the example
data = np.array([1, 2, 3, 3, 2, 1])

# Let's say we are looking for periods where data is greater than 2.
# First, we mark all those points
indicators = (data > 2).astype(int)  # now we have [0 0 1 1 0 0]

# We take the difference so that it is non-zero wherever the indicator changes,
# i.e. where data crosses the threshold (+1 at a start, -1 at an end).
# Note that we prepend a 0 so the result has the same length as data.
indicators_diff = np.concatenate([[0], indicators[1:] - indicators[:-1]])

# Now let's find those indices
diff_locations = np.where(indicators_diff != 0)[0]

# This gives all the places where the difference is non-zero.
# Those are the indices of the start and end of each event:
# [event1_start, event1_end, event2_start, ....]
# So we split them by taking the even/odd positions of the resulting vector
events_starts_list = diff_locations[::2].tolist()
events_ends_list = diff_locations[1::2].tolist()

# And now we can also gather each event's data by iterating over the events.
event_data_list = []

for event_start, event_end in zip(events_starts_list, events_ends_list):
    event_data_list.append(data[event_start:event_end])

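Because this code relies on numpy's backend, which is written in C, to run most of the loops, it runs very fast. I use it all the time to solve this kind of problem quickly.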
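For the second goal in the question (copying key data per event into a new dataframe), the same idea can be carried over to pandas. This is only a minimal sketch under assumptions: the series `s`, the threshold of 2, and the column names `start`, `end`, `peak` and `mean` are all made up for illustration, and `end` here is the last position inside the event (inclusive), unlike the exclusive end index in the numpy code.

```python
import pandas as pd

# Hypothetical example data; the threshold of 2 mirrors the numpy example.
s = pd.Series([1, 2, 3, 3, 2, 1, 4, 5, 1])

above = s > 2
# An event starts wherever `above` is True and the previous point was not.
starts = above & ~above.shift(1, fill_value=False)
# Counting the starts cumulatively gives every point of an event the same id.
event_id = starts.cumsum()[above]

# Keep only the points that belong to an event, tagged with their event id.
events = pd.DataFrame({"value": s[above], "event": event_id})
events["position"] = events.index

# One row per event: first/last position (inclusive), peak and mean value.
summary = events.groupby("event").agg(
    start=("position", "first"),
    end=("position", "last"),
    peak=("value", "max"),
    mean=("value", "mean"),
)
print(summary)
```

Each row of `summary` describes one event, so the frame can be fed directly into whatever statistics are needed.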

Good luck!

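Edit: added some comments for clarity. Note: you may also want to handle special cases, such as the final event running to the end of the data. In that situation the diff_locations variable can contain an odd number of elements. If it is odd, just pick an index (for example, the last one) and append it to this list before splitting it into events_starts_list and events_ends_list.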
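One possible way to sidestep the odd-length case altogether is to pad the indicator array with a zero on both sides before differencing, so that an event touching either edge of the data still produces both a start and an end. This is a different technique from appending an index by hand, and the function name `find_events` is a hypothetical helper, not part of numpy. A minimal sketch:

```python
import numpy as np

def find_events(data, threshold):
    """Return (start, end) index pairs of runs where data > threshold.

    `end` is exclusive, so data[start:end] is the event itself.
    Hypothetical helper for illustration only.
    """
    indicators = (np.asarray(data) > threshold).astype(int)
    # Pad with a zero on each side so runs touching the first or last
    # sample still yield both a rising (+1) and a falling (-1) edge.
    padded = np.concatenate([[0], indicators, [0]])
    diff = np.diff(padded)
    starts = np.where(diff == 1)[0]
    ends = np.where(diff == -1)[0]
    return list(zip(starts.tolist(), ends.tolist()))

# The data both begins and ends inside an event.
print(find_events([3, 3, 1, 2, 3, 3], 2))  # [(0, 2), (4, 6)]
```

Because every rising edge now has a matching falling edge, the starts and ends always pair up and no parity check is needed.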

Python: How to find event occurrences in data?
