Question

我有1D数据代表一些强度值。我想检测这些数据中的组件数量（具有相似强度的点簇，或者从该数据创建的直方图中的“峰值”数量）。

这种方法：1D multiple peak detection?对我来说不是很有用，因为一个“峰值”可以包含更多的局部最大值（见下图）。

enter image description here

原因，我可以使用统计方法，例如，我可以尝试拟合1,2,3，...... n峰的数据，然后计算BIC，AIC或其他适合每个人。最后使用elbow method确定群集数量。但是，我想尽可能快地检测出近似的峰值数量，并且拟合高斯混合物是非常耗时的过程。

我的方法

所以我提出了以下方法（在C ++中）。它需要直方图箱高度（y）并搜索y值开始下降的指数。然后过滤低于y容差（yt）的值。最后，使用x tolerance（xt）接近其他索引也会被过滤：

Indices StatUtils::findLocalMaximas(const Points1D &y, int xt, int yt) {

  // Result indices
  Indices indices;

  // Find all local maximas
  int imax = 0;
  double max = y[0];
  bool inc = true;
  bool dec = false;
  for (int i = 1; i < y.size(); i++) {    

    // Changed from decline to increase, reset maximum
    if (dec && y[i - 1] < y[i]) {
      max = std::numeric_limits<double>::min();
      dec = false;
      inc = true;
    }

    // Changed from increase to decline, save index of maximum
    if (inc && y[i - 1] > y[i]) {
       indices.append(imax);
       dec = true;
       inc = false;
    }

    // Update maximum
    if (y[i] > max) {
       max = y[i];
       imax = i;
    }
  }

  // If peak size is too small, ignore it
  int i = 0;
  while (indices.count() >= 1 && i < indices.count()) {
    if (y[indices.at(i)] < yt) {
      indices.removeAt(i);
    } else {
      i++;
    }
  }

  // If two peaks are near to each other, take only the largest one
  i = 1;
  while (indices.count() >= 2 && i < indices.count()) {
    int index1 = indices.at(i - 1);
    int index2 = indices.at(i);
    if (abs(index1 - index2) < xt) {
      indices.removeAt(y[index1] < y[index2] ? i-1 : i);
    } else {
      i++;
    }
  }
  return indices;
}

方法问题

此解决方案的问题在于强烈依赖于那些公差值（xt和yt）。因此，我必须获得有关峰值之间允许的最小距离的信息。此外，我的数据中有一些孤立的异常值高于那些较小峰值的最大值。

您能否提出一些其他方法来确定数据的峰值数量，与附图中的数据类似。

Answer 1

您可以使用我的approximate Gaussian mixtures方法：

这是一种强大的统计方法
它不依赖于绝对阈值;它只有两个参数，即相对（标准化）数量，易于控制，相同的值适用于不同的数据集
与肘法和大多数统计方法不同，它在单个EM（期望最大化）运行中动态估计模式的数量。它从每个数据点开始作为独立模式，并在每次迭代时删除“重叠”模式。
它很快，因为它在每次迭代时使用近似最近邻（ANN）搜索，并且其更新仅考虑k个最近邻居，而不是所有数据点。

有一个在线Matlab demo，因此您可以轻松地在小型数据集上进行试验。在我们的C ++实现中，我们使用FLANN进行大规模的最近邻搜索。不幸的是，这个实现并不公开，但如果你有兴趣，我可以给你一些版本。

直方图中的峰数

1 个答案: