Question

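Suppose I have a time series represented as a numpy array, where I get one data point every 3 seconds. It looks something like this (but with many more data points):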

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])

I want to find a threshold (x) that a data point exceeds, on average, once every y seconds.

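Maybe my question is easier to understand this way: say I've collected some data on how many ants leave their mound every 3 seconds. Using this data, I want to create a threshold (x) so that in the future, if the number of ants leaving at one time exceeds x, my beeper goes off. Now this is the key part: I want my beeper to go off roughly once every 4 seconds. I'd like to use Python to determine what x should be for a given amount of time y, based on the data I've already collected.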

Is there a way to do this in Python?

Answer 1

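I think it is easiest to first think about this in terms of statistics. What I think you are really saying is that you want to calculate the 100*(1-m/n)th percentile, that is, the value that the data falls below 1-m/n of the time, where m is your sampling period and n is your desired interval. In your example, that is the 100*(1-3/4)th percentile, or the 25th percentile. In other words, you want the value that is exceeded 75% of the time.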

To calculate that on your data, use scipy.stats.scoreatpercentile. For your case you can do something like:

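>>> import numpy as np
>>> import scipy.stats
>>> z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
>>> m = 3.  # sampling period in seconds
>>> n = 4.  # desired interval between triggers in seconds
>>> x = scipy.stats.scoreatpercentile(z, 100*(1-m/n))
>>> print(x)
1.05
>>> print((z>x).sum()/len(z))  # test, should be about 0.75
0.7142857142857143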
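
If you would rather not depend on SciPy, the same calculation can be sketched with np.percentile. The helper name exceedance_threshold and its parameter names are my own, chosen only for illustration, and this is a minimal sketch rather than a tested utility:

```python
import numpy as np

def exceedance_threshold(samples, sample_period, interval):
    """Return the value exceeded by roughly sample_period / interval of the samples."""
    if not 0 < sample_period <= interval:
        raise ValueError("interval must be at least as long as sample_period")
    # Percentile below which (1 - sample_period / interval) of the data falls
    percentile = 100.0 * (1.0 - sample_period / interval)
    return np.percentile(samples, percentile)

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
x = exceedance_threshold(z, sample_period=3, interval=4)
fraction_above = np.mean(z > x)  # share of samples that would trigger
print(x, fraction_above)
```

The guard on sample_period reflects the fact that you cannot trigger more often than you sample.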

Of course, this estimate will be better if you have many values.
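
To get a feel for how much the sample size matters, here is a rough sketch that estimates the threshold from samples of different sizes and then checks how often independent "future" data exceeds it. The exponential distribution is only a stand-in I picked for illustration; your real data will be distributed differently:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
m, n = 3.0, 4.0                 # sampling period and desired interval, in seconds

# Synthetic "future" measurements (an assumption, not real ant counts)
future = rng.exponential(scale=2.0, size=200_000)

results = {}
for size in (14, 1_000, 100_000):
    # Synthetic "collected" data used to estimate the threshold
    samples = rng.exponential(scale=2.0, size=size)
    x = np.percentile(samples, 100 * (1 - m / n))
    # Fraction of future samples that would set off the beeper
    results[size] = np.mean(future > x)
    print(size, results[size])
```

With only a handful of points the trigger rate on new data can land well away from the 0.75 target, while large samples should bring it close.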

Edit: Originally I had the percentile backwards. It should be 1-m/n, but I originally had just m/n.

Answer 2

Assuming one-second resolution for the trigger is acceptable...

import numpy as np
z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3

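Divide each sample by its period (in seconds) and create an array of one-second data. This assumes each sample is spread evenly (a linear distribution?) over its period.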

y = np.array([[n]*period for n in z / period])
y = y.flatten()
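
The same expansion can probably be written more directly with np.repeat. This is just an equivalent sketch of the step above, reusing the same z and period:

```python
import numpy as np

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3

# Spread each 3-second reading evenly over its three one-second slots
y = np.repeat(z / period, period)

print(len(y))                         # one entry per second of data
print(np.isclose(y.sum(), z.sum()))   # the overall total is preserved
```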

Reshape the data into four-second periods (lossy)

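# Drop the trailing seconds that do not fill a whole four-second block
h = len(y) % 4
x = y[:len(y) - h]
# One row per four-second block
w = x.reshape((-1, 4))

Find the sum of each four-second period, then take the minimum of those sums

v = w.sum(axis = -1)
# use the min value of these sums
threshold = v.min() # about 1.53

This gives a total threshold for non-overlapping four-second blocks, but it only generates 8 triggers for z, which represents 42 seconds of data.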

Use overlapping rolling windows and find the minimum of the four-second window sums (lossless)

def rolling(a, window, step = 1):
    """
    Return a strided view of overlapping windows over the 1-D array a.

    Examples
    --------
    >>> a = np.arange(10)
    >>> print(rolling(a, 3))
    [[0 1 2]
     [1 2 3]
     [2 3 4]
     [3 4 5]
     [4 5 6]
     [5 6 7]
     [6 7 8]
     [7 8 9]]
    >>> print(rolling(a, 4))
    [[0 1 2 3]
     [1 2 3 4]
     [2 3 4 5]
     [3 4 5 6]
     [4 5 6 7]
     [5 6 7 8]
     [6 7 8 9]]
    >>> print(rolling(a, 4, 2))
    [[0 1 2 3]
     [2 3 4 5]
     [4 5 6 7]
     [6 7 8 9]]

    from http://stackoverflow.com/a/12498122/2823755
    """
    # Integer division: the shape passed to as_strided must contain ints
    shape = ( (a.size-window)//step + 1   , window)
    strides = (a.itemsize*step, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

t = rolling(y, 4)
s = t.sum(axis = -1)
threshold = s.min() # 1.3999999

This produces 8 triggers for z.
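
On newer NumPy (1.20 or later, as far as I know) the hand-rolled strided helper can likely be replaced by numpy.lib.stride_tricks.sliding_window_view, which returns a read-only view and checks the shapes for you. A minimal sketch, rebuilding the same one-second array y so it runs on its own:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3
y = np.repeat(z / period, period)    # one-second data, same values as y above

# Every overlapping four-second window, one per row
windows = sliding_window_view(y, 4)  # shape (len(y) - 3, 4)
sums = windows.sum(axis=-1)
threshold = sums.min()
print(windows.shape, threshold)
```

The minimum here should agree with the s.min() value computed with rolling() above, up to floating-point rounding.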

Get time frequency of numbers in array in Python?
