Question

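I am looking for an efficient way to detect plateaus in otherwise very noisy data. The plateaus are always relatively wide. A simple example of what this data looks like: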


Note that there may be multiple plateaus (all of which should be detected), and they can have different values.

I tried scipy.signal.argrelextrema, but it does not seem to do what I want:


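I do not need the exact interval of each plateau. A rough estimate is enough, as long as the estimated range is greater than or equal to the actual plateau range. It should, however, be reasonably efficient.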

Answer 1

You can try scipy.signal.find_peaks. Here is an example:

import numpy
from scipy.signal import find_peaks

test = numpy.random.uniform(0.9, 1.0, 100)
test[10 : 20] = 0
peaks, peak_plateaus = find_peaks(- test, plateau_size = 1)

find_peaks only finds peaks, but negating the array lets it find valleys as well. You can then do the following:

for i in range(len(peak_plateaus['plateau_sizes'])):
    if peak_plateaus['plateau_sizes'][i] > 1:
        print('a plateau of size %d is found' % peak_plateaus['plateau_sizes'][i])
        print('its left index is %d and right index is %d' % (peak_plateaus['left_edges'][i], peak_plateaus['right_edges'][i]))

This prints:

a plateau of size 10 is found
its left index is 10 and right index is 19
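
Since the plateaus in the question can sit above or below the surrounding data, the same call can be run on both the array and its negation. The following is only a sketch of that idea; the helper name `flat_extrema` and its `min_size` parameter are made up for illustration. Note that find_peaks treats a run as flat only when neighbouring samples are exactly equal, so a plateau that itself carries noise would need smoothing or rounding first.

```python
import numpy as np
from scipy.signal import find_peaks

def flat_extrema(x, min_size=2):
    """Return sorted (left, right) index pairs of flat peaks and flat valleys."""
    x = np.asarray(x, dtype=float)
    found = []
    for sign in (1.0, -1.0):  # +1 looks for flat peaks, -1 for flat valleys
        _, props = find_peaks(sign * x, plateau_size=min_size)
        found.extend(zip(props["left_edges"].tolist(),
                         props["right_edges"].tolist()))
    return sorted(found)

signal = np.array([1.0, 1.5, 3.0, 3.0, 3.0, 1.2, 1.4, 0.0, 0.0, 1.3, 1.1])
print(flat_extrema(signal))  # flat peak at indices 2..4, flat valley at 7..8
```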

Answer 2

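This is really just a "dumb" machine learning task. You will need to write a custom function to screen for plateaus. A plateau has two key characteristics: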

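1. It is a run of consecutive points with the same (or very nearly the same) value.
2. Its first and last points deviate strongly from the forward and backward moving averages, respectively. (If you expect additive noise, try to quantify this in terms of the standard deviation; for geometric (multiplicative) noise you also have to take the magnitude of the signal into account.)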

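A simple loop should then be enough to compute the forward moving average, the standard deviation of the points in that forward window, the backward moving average, and the standard deviation of the points in that backward window.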
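
As a rough illustration of those four quantities, here is a vectorised sketch rather than an explicit loop. The function name `moving_stats` and the window size are assumptions, and it expects the signal to be longer than the window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_stats(x, window=5):
    """Backward and forward moving mean/std around each sample (NaN where undefined)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    win = sliding_window_view(x, window)          # win[k] == x[k : k + window]
    mean, std = win.mean(axis=1), win.std(axis=1)

    back_mean, back_std = np.full(n, np.nan), np.full(n, np.nan)
    fwd_mean, fwd_std = np.full(n, np.nan), np.full(n, np.nan)
    # Backward stats at i summarise x[i - window : i], the samples just before i.
    back_mean[window:], back_std[window:] = mean[:n - window], std[:n - window]
    # Forward stats at i summarise x[i + 1 : i + 1 + window], the samples just after i.
    fwd_mean[:n - window], fwd_std[:n - window] = mean[1:], std[1:]
    return back_mean, back_std, fwd_mean, fwd_std

x = np.array([1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
back_mean, back_std, fwd_mean, fwd_std = moving_stats(x, window=3)
# Index 6 is the first plateau sample: far from what came before, equal to what follows.
print(x[6] - back_mean[6], x[6] - fwd_mean[6])
```

A point whose distance from the backward mean is many times the backward standard deviation is a candidate plateau edge.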

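1. Read until you find a point that lies well outside the regular noise (compare against the variance). Start buffering indices into a list from there.
2. Keep reading and buffering indices into that list while the points have the same value (or nearly the same, if your plateaus may be a little rough; use some tolerance plus the plateau's standard deviation, or just a fixed tolerance if you expect all plateaus to behave similarly).
3. If the variance of the points in the buffer gets too large, the run is too rough to be a plateau; throw the buffer away and restart scanning from the current position.
4. If the latest value differs strongly from the previous one (on the order of the change that triggered the buffering) and in the opposite direction of the original jump, cap the buffer there; you have a plateau.
5. Now do whatever you like with the points at those indices, for example delete them or replace them with a linear interpolation between the two boundary points.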
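
The steps above can be sketched roughly as follows. This is only one possible reading of them, not tested on real data: the names `scan_plateaus` and `bridge`, all thresholds, and the assumption of additive noise with a known standard deviation `noise_std` are illustrative choices. After a plateau the scan skips `window` samples so the trailing average is not contaminated, which means a second plateau starting immediately afterwards would be missed.

```python
import numpy as np

def scan_plateaus(x, noise_std, jump=5.0, tol=3.0, window=10, min_len=5):
    """Single forward scan; returns inclusive (left, right) index pairs."""
    x = np.asarray(x, dtype=float)
    n, plateaus, i = x.size, [], window
    while i < n:
        step = x[i] - x[i - window:i].mean()      # deviation from trailing average
        if abs(step) <= jump * noise_std:
            i += 1                                # ordinary noise, keep reading
            continue
        start, level, j = i, x[i], i + 1          # step 1: start buffering
        while j < n and abs(x[j] - level) <= tol * noise_std:
            level = x[start:j + 1].mean()         # step 2: refine the plateau level
            j += 1
        smooth = x[start:j].std() <= tol * noise_std            # step 3
        returns = j < n and np.sign(x[j] - level) == -np.sign(step)  # step 4
        if smooth and returns and (j - start) >= min_len:
            plateaus.append((start, j - 1))
            i = j + window                        # let the trailing window refill
        else:
            i = start + 1                         # discard buffer, resume scanning
    return plateaus

def bridge(x, plateaus):
    """Step 5: replace each plateau by a line between its two boundary points."""
    y = np.array(x, dtype=float)
    for left, right in plateaus:
        idx = np.arange(left, right + 1)
        y[idx] = np.interp(idx, [left - 1, right + 1], [y[left - 1], y[right + 1]])
    return y

# Deterministic toy signal: baseline near 1 with a small wiggle, plateau at 0.
data = 1.0 + 0.01 * np.sin(np.arange(60))
data[20:30] = 0.0
found = scan_plateaus(data, noise_std=0.01)
print(found)
repaired = bridge(data, found)
```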

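I could generate some noise and give you some sample code, but this is really something you will have to adapt to your application. (For example, this method has a shortcoming: a plateau that captures a point in the middle of the "cliff edge" may leave that point behind when the rest of the plateau is removed. If that worries you, you will have to do a bit more exploration after you have identified the plateau.) You should be able to do this in a single pass over the data, but it may be wise to first gather some statistics on the whole set so that you can tune the thresholds intelligently.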
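
One way to gather such a statistic up front, assuming roughly Gaussian additive noise, is a robust noise estimate from the first differences of the signal: the median absolute deviation largely ignores the few large jumps at plateau edges. The function name `estimate_noise_std` and the synthetic data are illustrative only.

```python
import numpy as np

def estimate_noise_std(x):
    """Robust noise estimate from first differences (assumes Gaussian additive noise)."""
    d = np.diff(np.asarray(x, dtype=float))
    mad = np.median(np.abs(d - np.median(d)))
    # 1.4826 * MAD estimates the std of d; differencing inflates the noise std by sqrt(2).
    return 1.4826 * mad / np.sqrt(2.0)

rng = np.random.default_rng(0)
signal = 1.0 + rng.normal(0.0, 0.05, 500)
signal[200:300] -= 1.0                 # a wide plateau that still carries noise
sigma = estimate_noise_std(signal)
print(sigma, signal.std())             # the plain std is inflated by the plateau
```

The estimate can then be used to scale the jump and tolerance thresholds of the scan.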

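If you have an exact definition of what constitutes a plateau, you can make this less hands-on and less ML-looking, but as long as you are trying to identify a fuzzy pattern, you will have to take a statistics-based approach.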

Answer 3

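I had a similar problem and found a simple heuristic solution, shown below. I find plateaus as ranges where the signal has a constant gradient. You could change the code to also check that the gradient is (close to) 0.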

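I apply a moving average (uniform_filter1d) to filter out the noise. The first and second derivatives of the signal are also computed numerically, so I am not sure it meets the efficiency requirement. But it worked very well for my signal and may be a good starting point for others.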

def find_plateaus(F, min_length=200, tolerance = 0.75, smoothing=25):
    '''
    Finds plateaus of signal using second derivative of F.

    Parameters
    ----------
    F : Signal.
    min_length: Minimum length of plateau.
    tolerance: Number between 0 and 1 indicating how tolerant
        the requirement of constant slope of the plateau is.
    smoothing: Size of uniform filter 1D applied to F and its derivatives.
    
    Returns
    -------
    plateaus: array of plateau left and right edges pairs
    dF: (smoothed) derivative of F
    d2F: (smoothed) Second Derivative of F
    '''
    import numpy as np
    from scipy.ndimage import uniform_filter1d  # scipy.ndimage.filters is a deprecated namespace
    
    # calculate smooth gradients
    smoothF = uniform_filter1d(F, size = smoothing)
    dF = uniform_filter1d(np.gradient(smoothF),size = smoothing)
    d2F = uniform_filter1d(np.gradient(dF),size = smoothing)
    
    def zero_runs(x):
        '''
        Helper function for finding sequences of 0s in a signal
        https://stackoverflow.com/questions/24885092/finding-the-consecutive-zeros-in-a-numpy-array/24892274#24892274
        '''
        iszero = np.concatenate(([0], np.equal(x, 0).view(np.int8), [0]))
        absdiff = np.abs(np.diff(iszero))
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        return ranges
    
    # Find ranges where second derivative is zero
    # Values under eps are assumed to be zero.
    eps = np.quantile(abs(d2F),tolerance) 
    smalld2F = (abs(d2F) <= eps)
    
    # Find runs where the mask "smalld2F" is True (i.e. ranges where d2F stays close to zero).
    # zero_runs looks for zeros, so pass the inverted mask; runs where d2F stays large are ignored.
    p = zero_runs(~smalld2F)
    
    # np.diff(p) gives the length of each range found.
    # only accept plateaus of min_length
    plateaus = p[(np.diff(p) > min_length).flatten()]
    
    return (plateaus, dF, d2F)
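
Following the suggestion above to check whether the gradient is close to 0, a stripped-down variant might look like the sketch below. It is not the function above: the name `flat_segments`, the `slope_tol` threshold, and the test signal are assumptions, and an absolute slope threshold has to be chosen to suit the scale of your data.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def flat_segments(signal, min_length=20, smoothing=5, slope_tol=1e-3):
    """Return [start, stop) index pairs where the smoothed slope stays near zero."""
    smooth = uniform_filter1d(np.asarray(signal, dtype=float), size=smoothing)
    flat = np.abs(np.gradient(smooth)) <= slope_tol
    # Pad with False so every flat run has both a rising and a falling edge.
    padded = np.concatenate(([False], flat, [False])).astype(np.int8)
    runs = np.flatnonzero(np.diff(padded)).reshape(-1, 2)
    return runs[(runs[:, 1] - runs[:, 0]) >= min_length]

# Ramp up, stay flat at 1.0, ramp down again.
ramp_up = np.linspace(0.0, 1.0, 50, endpoint=False)
ramp_down = np.linspace(1.0, 0.0, 50)
trapezoid = np.concatenate((ramp_up, np.ones(50), ramp_down))
segments = flat_segments(trapezoid)
print(segments)
```

The smoothing shortens each detected segment by a few samples at both ends, so widen the result if the estimate must cover the whole plateau.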

Finding plateaus in a NumPy array
