Question

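I have a dataset of acceleration data that is affected by electronic spikes.

I am looking for a good way to filter out or reduce these spikes, because I need to compute a rolling-window FFT and other statistical measures, such as kurtosis and skewness, on this data. I cannot simply delete the outliers or replace them with NaN. The sampling rate is 2000 Hz.

So far I have tried the following in MATLAB 2012b:

Wavelet denoising (Haar wavelet)
Median filter
Despike and interpolate method

Can you suggest a suitable way to handle this data?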

Answer 1

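I would suggest some local smoothing: define thresholds, then replace every value below the lower threshold or above the upper threshold with the average of its two neighbours.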

Af = data.example1;
% Thresholds
Tl = -0.6;
To = 0.6;

peaks = find( Af < Tl | Af > To);
Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;

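The problem with this approach is that your outliers are sometimes up to 6 samples wide, so you need to smooth in several steps using a while loop: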

Af = data.example1;
% Thresholds
Tl = -0.6;
To = 0.6;

% initialisation
peaks = find( Af < Tl | Af > To);
counter = 0;

while ~isempty(peaks)
    peaks = find( Af < Tl | Af > To);
    Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;
    counter=counter+1;
end

After 6 iterations you get the following result: [figure]
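One caveat with the loop above: `peaks-1` and `peaks+1` fall outside the array when a spike touches the first or last sample. As a possible alternative (not the method used in this answer), here is a minimal sketch that fills each run of out-of-threshold samples by linear interpolation in a single pass. The test signal, the spike positions and the name `Af_clean` are assumptions made up for illustration; only `Tl` and `To` are taken from the code above.

```matlab
% Hypothetical test signal: low-amplitude sine with two injected spikes
Af = 0.1 * sin(2*pi*(1:200)'/50);
Af(1)     = 3;                   % spike touching the start of the record
Af(80:85) = [2 -2 2 -2 2 -2]';   % spike that is 6 samples wide

% Thresholds (same values as above, assumed to suit this synthetic signal)
Tl = -0.6;
To = 0.6;

bad  = Af < Tl | Af > To;        % logical mask of spike samples
good = find(~bad);               % indices of trusted samples
idx  = (1:numel(Af))';

% Linear interpolation across each run of spike samples
Af_clean = Af;
Af_clean(bad) = interp1(good, Af(good), idx(bad), 'linear');

% Spikes before the first or after the last good sample cannot be
% interpolated (interp1 returns NaN there), so copy the nearest good value
Af_clean(idx < good(1))   = Af(good(1));
Af_clean(idx > good(end)) = Af(good(end));
```

Because the replacement values come only from samples that passed the threshold test, no iteration count is needed, and the good samples are left exactly as they were.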

Answer 2

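I have had good results on similar problems with the despiking file from the MATLAB Central File Exchange, although I see you have already tried it.

Another approach I have taken is to treat the spikes as statistical outliers and remove them with this function, which uses Rosner's many-outlier test. (The NIST site is down for obvious reasons, so here is the Google cached version.)

Edited to add: I was mistaken. My despiking algorithm does not come from the File Exchange function I linked above. It was actually taken from a journal article (the code is listed in the paper's supplementary information, but the authors did not post it to the File Exchange). The paper is: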

Practical Methods for Noise Removal: Applications to Spikes, Nonstationary Quasi-Periodic Noise, and Baseline Drift

Delphine Feuerstein, Kim H. Parker, and Martyn G. Boutelle

Anal. Chem., 2009, 81 (12), pp 4987–4994

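Since the copyright is held by the American Chemical Society and the authors, I cannot copy the code here, but if you have access to a university library account you can download a copy. If you do not, I have left the link to the File Exchange version, but I have not used it, so I cannot vouch for its effectiveness.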

Answer 3

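A moderator merged this question with this question, which is why it looks a bit messy here. This answer addresses the additional issues raised in the second question!

The following is not an entirely clean solution. The code is adapted from my previous answer, but I added an exception for your case so that you do not have to manually delete values at the beginning and/or end of your data. It only discards those invalid values, which should not cause any problems.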

Af = csvread('example_data_with_peak.txt', 5, 0); % start reading at row offset 5, column offset 0

% Thresholds
Tl = -0.04;
To = 0.04;

% initialisation
peaks = find( Af < Tl | Af > To);
counter = 0;

while ~isempty(peaks)
    peaks = find( Af < Tl | Af > To);
    try
        Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;
    catch
        if peaks(1) == 1
            Af(1) = 0;
        else
            Af(end) = 0;
        end
    end   
    counter=counter+1;
end

figure(2);
plot(Af)


To determine the threshold you could use something like the following, but it is also rather brute-force:

thresh = 15*mean(abs(findpeaks(Af)));
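The one-liner above relies on `findpeaks` from the Signal Processing Toolbox, and its mean is itself pulled upward by the spikes. As a hedged alternative (a different technique from the one in this answer), here is a minimal sketch that derives the thresholds from the median and the median absolute deviation (MAD), which are much less sensitive to a few extreme samples. The test signal and the multiplier `k = 6` are assumptions chosen for illustration.

```matlab
% Hypothetical test signal: small sine wave with four injected spikes
Af = 0.01 * sin(2*pi*(1:1000)'/37);
Af([100 400 401 900]) = [0.5 -0.4 0.45 0.6];

k = 6;                                % assumed multiplier, tune for real data

med   = median(Af);                   % robust estimate of the signal centre
madv  = median(abs(Af - med));        % median absolute deviation
sigma = 1.4826 * madv;                % MAD rescaled to match a Gaussian std

% Symmetric thresholds around the median
Tl = med - k * sigma;
To = med + k * sigma;

peaks = find(Af < Tl | Af > To);      % indices flagged as spikes
```

On this synthetic signal only the four injected samples are flagged. On real accelerometer data the multiplier has to be chosen by inspecting the result, just like the factor 15 above.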

Answer 4

For anyone else who may need it, this is what I ended up using. Here is the data file: data file link

Thanks to @thewaywewalk

Matlab filter electrical spikes in accelerometric data

clear all, clc, clf, tic
aa = csvread('/tmp/example_data_with_peak.txt', 5, 0); % skips the first 5 rows, which are text and zeros
figure(1);
plot(aa)
Af = aa;
% Thresholds: 10 times the mean absolute amplitude
% (no trailing semicolon, so the values are printed)
Tl = -mean(abs(aa))*10
To = mean(abs(aa))*10

% initialisation
peaks = find( Af < Tl | Af > To);

counter = 0;

while ~isempty(peaks)
    peaks = find( Af < Tl | Af > To);
    try
        Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;
    catch
        if peaks(1) == 1
            Af(1) = 0;
        else
            Af(end) = 0;
        end
    end   
    counter=counter+1;
end
counter
figure(2);
plot(Af)

Here are the before and after images.

Before and After

Answer 5

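I have found that for the specific problem of single-point spikes (which occur in CCD detectors when a cosmic ray discharges a single CCD cell during the exposure), the following algorithm works well: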

  N = length(y);
  z = y;  % start from a copy so that points that are not spikes carry over
  for i = 3:N-2
    % calculate the means of the two nearest neighbours and of the two next-nearest neighbours
    y1 = (y(i-1)+y(i+1))/2;
    y2 = (y(i-2)+y(i+2))/2;
    % if those two means are close, but the current point is far off, it's a spike
    if ( abs(y2-y(i)) > cutoff && abs(y1-y2) < cutoff )
       z(i) = y2;
    end
  end

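Choosing a good value for cutoff is a separate question; I tend to set it to a fixed value based on the typical dark counts of the CCD. One can also use different levels for what counts as "close" and what counts as "far", for example: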

    if ( abs(y2-y(i)) > cutoff_far && abs(y1-y2) < cutoff_close )

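Other criteria are possible as well, for example requiring that the difference between the two means be X times smaller than the distance of the suspected spike from them: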

    if ( abs(y2-y(i)) > 10*abs(y1-y2) )

Peaks that are broader than a single-point spike are left untouched by this procedure.
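To make that behaviour concrete, here is a minimal self-contained sketch of the two-cutoff variant on a synthetic trace. The data and the values of `cutoff_far` and `cutoff_close` are assumptions chosen for illustration. Whether a broader peak survives depends on those cutoffs and on how steep its flanks are.

```matlab
% Hypothetical test trace: flat baseline, one single-point spike, one broader peak
y = zeros(1, 30);
y(10)    = 50;                 % single-point spike
y(19:23) = [5 20 40 20 5];     % genuine peak, five samples wide

cutoff_far   = 12;             % assumed: how far off a point must be to count as a spike
cutoff_close = 5;              % assumed: how close the two neighbour means must be

z = y;                         % output starts as a copy of the input
N = length(y);
for i = 3:N-2
    y1 = (y(i-1) + y(i+1)) / 2;    % mean of the nearest neighbours
    y2 = (y(i-2) + y(i+2)) / 2;    % mean of the next-nearest neighbours
    if abs(y2 - y(i)) > cutoff_far && abs(y1 - y2) < cutoff_close
        z(i) = y2;                 % replace the spike by the outer mean
    end
end
```

With these values only sample 10 is changed (set to 0), while the five-sample peak is copied through unmodified.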

An example of de-spiked Raman spectrum using a CCD detector
