Question

How can I get rid of spikes in a discrete dataset, but in a "smoother" way?

Take this example:

[image]

There are two spikes at around 20000, but the next one, at around 600, should also be treated as a spike.

I managed to bring the very high values down to zero with:

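# `insertions` is the list of raw data values (not shown in the question)
a = 2
b = 5
beta_dist = RealDistribution('beta', [a, b])  # Beta(2, 5) distribution
f(x) = x / 19968  # scale into [0, 1]; 19968 is presumably the largest value in the data
normalized_insertions = [f(i) for i in insertions]

# Pair each normalized value with the Beta(2, 5) density at that point
# (in Sage, distribution_function is the PDF; the CDF is cum_distribution_function)
insertions_pairs = [(i, beta_dist.distribution_function(i)) for i in normalized_insertions]
plot_b = beta_dist.plot()  # the Beta(2, 5) curve, for comparison

show(list_plot(insertions_pairs) + plot_b)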
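To experiment with this transform outside Sage, it can be reproduced in plain Python. This is a minimal sketch, not the poster's code, and it assumes shape parameters a, b >= 1. Note that Sage's `RealDistribution.distribution_function` evaluates the density (PDF), not the CDF (that is `cum_distribution_function`), so the sketch uses the Beta density. The names `beta_pdf` and `beta_reweight` and the `scale` parameter are hypothetical.

```python
from math import gamma


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of the Beta(a, b) distribution; 0 outside [0, 1].

    Assumes a >= 1 and b >= 1 so the density is finite at the endpoints.
    """
    if x < 0.0 or x > 1.0:
        return 0.0
    norm = gamma(a + b) / (gamma(a) * gamma(b))  # 1 / B(a, b); equals 30 for a=2, b=5
    return norm * x ** (a - 1) * (1.0 - x) ** (b - 1)


def beta_reweight(values, scale, a=2.0, b=5.0):
    """Hypothetical helper: divide each value by `scale` (mapping into [0, 1])
    and pair it with the Beta(a, b) density there, like `insertions_pairs`."""
    return [(v / scale, beta_pdf(v / scale, a, b)) for v in values]
```

With a = 2, b = 5 the density is zero at x = 1, which is why the top spike is flattened. However, the density peaks at x = (a - 1)/(a + b - 2) = 0.2 and is still increasing near x = 600/19968 ≈ 0.03. As a result, a moderate spike like 600 ends up higher than a baseline around 100, matching the behavior described in the question.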

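I don't know how to bring down the lower spikes. The values should top out at around 100. Maybe the beta distribution's parameters need more tweaking?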

Currently, it looks like this: [image]

If possible, please use Sage in the explanation.

Answer 1

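You should probably look at the Kalman filter. It estimates how much your data deviates and smooths it toward a Gaussian mean. As a result, the 20k values will have hardly any effect. The 600 values will have a larger effect, but they will still be heavily outweighed by the consistency of the rest of the data. If you like the math: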
http://www.cs.berkeley.edu/~pabbeel/cs287-fa11/slides/Smoother_KalmanSmoother--DRAFT.pdf
Otherwise, perhaps this:
http://interactive-matter.eu/blog/2009/12/18/filtering-sensor-data-with-a-kalman-filter/
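A minimal sketch of the idea, assuming the simplest model: the underlying level is a random walk observed with Gaussian noise. The function name `kalman_1d`, the parameters `q` (process-noise variance) and `r` (measurement-noise variance), and their default values are illustrative assumptions, not taken from the linked material. In practice they would be tuned to the data. It is plain Python, which should also run in Sage since Sage code is Python with a preparser.

```python
def kalman_1d(measurements, q=1e-3, r=1.0, x0=None, p0=1.0):
    """Scalar Kalman filter with a random-walk (constant-level) state model.

    measurements: non-empty sequence of numbers
    q: process-noise variance, i.e. how fast the true level may drift (assumed)
    r: measurement-noise variance, i.e. how noisy each sample is (assumed)
    x0, p0: initial estimate (defaults to the first sample) and its variance
    Returns the filtered estimate after each measurement.
    """
    x = measurements[0] if x0 is None else x0
    p = p0
    filtered = []
    for z in measurements:
        # Predict: the level is expected to stay the same, but uncertainty grows by q.
        p = p + q
        # Update: the gain k weighs the new measurement against the prediction.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        filtered.append(x)
    return filtered
```

Applied to the question's data, this would look like `smoothed = kalman_1d(insertions, q=1e-3, r=1.0)`. Each measurement moves the estimate by the gain k times its distance from the current estimate, so a spike is damped rather than discarded. A smaller ratio q/r gives a smaller steady-state gain (roughly sqrt(q/r) when q is much smaller than r) and stronger smoothing. The cost is a slower response to genuine changes in level.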
