Let's say I have some randomly distributed data which looks like this:
I want to replace each data point y[x_i] with a Gaussian of fixed width and add them all together. It should give me:
My code for this is very primitive and slow:
import numpy as np

def gaussian(x, mu, sig):
    # normalized Gaussian PDF centered at mu with standard deviation sig
    return 1/(sig*np.sqrt(2*np.pi))*np.exp(-np.power(x - mu, 2.) / (2 * np.power(sig, 2.)))

def gaussian_smoothing(x, y, sig=0.5, n=1000):
    # uniform output grid, padded by 10 sigma beyond the data range
    x_new = np.linspace(x.min()-10*sig, x.max()+10*sig, n)
    y_new = np.zeros(x_new.shape)
    # add one Gaussian per data point, scaled by its y value
    for _x, _y in zip(x, y):
        y_new += _y*gaussian(x_new, _x, sig)
    return x_new, y_new
For large datasets, performing this smoothing takes a very long time. I was looking at np.convolve, but it seems to apply only to evenly spaced data, where the x step has to be the same for the data and for the Gaussian. What is the fastest way to perform this operation?
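A common way around the even-spacing limitation is to bin the irregular samples onto a uniform grid first and then smooth that grid with a single Gaussian convolution. The sketch below is a suggestion rather than a tested solution: fast_gaussian_smoothing is a hypothetical helper, and it assumes that snapping each x_i to the nearest grid point is acceptable.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def fast_gaussian_smoothing(x, y, sig=0.5, n=1000):
    # uniform grid with the same 10-sigma padding as the loop version
    x_new = np.linspace(x.min()-10*sig, x.max()+10*sig, n)
    dx = x_new[1] - x_new[0]
    # deposit the irregular samples onto the grid, weighting each bin by y
    y_grid, _ = np.histogram(x, bins=n, range=(x_new[0]-dx/2, x_new[-1]+dx/2), weights=y)
    # one convolution with a Gaussian kernel; sigma is converted from data
    # units to grid steps, and dividing by dx restores the PDF normalization
    y_new = gaussian_filter1d(y_grid, sigma=sig/dx) / dx
    return x_new, y_new

This replaces the O(len(x) * n) Python loop with one histogram pass and one 1-D convolution, which scales much better for large datasets.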
Answer 0 (score: 0):
You could also try using sklearn to estimate the data as a mixture of Gaussians with a smaller number of components, fitted with the EM algorithm:
import matplotlib.pyplot as plt
from numpy.random import choice
from sklearn import mixture
import scipy.stats
import numpy
# generate some data
x = numpy.array([1.,1.1,1.6,2.,2.1,2.2,2.9,3.,8.,62.,62.2,63.,63.4,64.5,65.,67.,69.])
# generate random weights for it
y = numpy.random.rand(x.shape[0])
# normalize weights to 1
y /= y.sum()
# resample to 5000 samples with equal weights according to the original weights
x_rsmp = numpy.array([choice(x, p=y) for _ in range(5000)])
x_rsmp.sort()
x_rsmp = x_rsmp.reshape(-1,1)
# define number of components - this must be user selected or estimated
n_comp = 2
# fit the mixture
gmm = mixture.GaussianMixture(n_components=n_comp, covariance_type='full')
gmm.fit(x_rsmp)
# plot it
fig = plt.figure()
ax = fig.add_subplot(111)
x_gauss = numpy.linspace(-10,100,1000)
for n_c in range(n_comp):
    # covariances_ holds variances, so take the square root to get the
    # standard deviation; scale each component by its mixture weight
    sigma = numpy.sqrt(gmm.covariances_[n_c, 0, 0])
    norm_pdf = gmm.weights_[n_c]*scipy.stats.norm.pdf(x_gauss, gmm.means_[n_c,0], sigma)
    ax.plot(x_gauss, norm_pdf, label='gauss %d' % (n_c+1))
ax.stem(x,y,'gray')
plt.legend()
plt.show()
It produces n_comp Gaussian components, with means gmm.means_ and covariances gmm.covariances_.
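If you want the combined density of the whole mixture rather than the individual components, GaussianMixture.score_samples returns the log-density of the fitted model. A short follow-up sketch, assuming the gmm, x_gauss, and ax objects from the code above:

# evaluate and plot the full mixture density on the same grid
log_dens = gmm.score_samples(x_gauss.reshape(-1, 1))
ax.plot(x_gauss, numpy.exp(log_dens), label='mixture')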