Question

My code here draws from two Gaussian distributions with an equal number of points.

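Eventually I want to simulate noise, but first I'm trying to understand this: if I have two Gaussians whose means are very far apart, I expect curve_fit to return their average mean. It doesn't do that.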

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import gauss  # the asker's own helper module (draw_1dGauss, gaussFun)

N_tot = 1000
# Draw from the major gaussian. Note the number N. It is
# the main parameter in obtaining your estimators.
mean = 0; sigma = 1; var = sigma**2; N = 100
A = 1/np.sqrt(2*np.pi*var)
points = gauss.draw_1dGauss(mean, var, N)

# Now draw from a minor gaussian. Note Np
meanp = 10; sigmap = 1; varp = sigmap**2; Np = N_tot-N
pointsp = gauss.draw_1dGauss(meanp, varp, Np)
Ap = 1/np.sqrt(2*np.pi*varp)

# Now implement the sum of the draws by concatenating the two arrays.
points_tot = np.concatenate([points, pointsp])
bins_tot = len(points_tot)//5   # np.histogram needs an integer bin count
hist_tot, bin_edges_tot = np.histogram(points_tot, bins_tot, density=True)
bin_centres_tot = (bin_edges_tot[:-1] + bin_edges_tot[1:])/2.0

# Initial guess
p0 = [A, mean, sigma]

# Result of the fit
coeff, var_matrix = curve_fit(gauss.gaussFun, bin_centres_tot, hist_tot, p0=p0)

# Get the fitted curve
hist_fit = gauss.gaussFun(bin_centres_tot, *coeff)
plt.figure(5); plt.title('Gaussian Estimate')
plt.suptitle('Gaussian Parameters: Mu = '+ str(coeff[1]) +' , Sigma = ' + str(coeff[2]) + ', Amplitude = ' + str(coeff[0]))
plt.plot(bin_centres_tot, hist_fit)
plt.draw()

# Error on the estimates: square roots of the covariance diagonal
error_parameters = np.sqrt(np.diag(var_matrix))

The returned parameters are still centred at 0, and I don't know why. They should be centred at 10.

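Edit: I changed the integer-division part, but it still doesn't return the right values. I should get a mean of about 10, because most of my points come from that distribution (i.e. the minor one).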
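The integer-division change matters for a separate reason from the fit itself: in Python 3, `/` always returns a float, and recent NumPy versions reject a float bin count in `np.histogram`. A minimal sketch, using random normal data as a stand-in for the question's `points_tot`:

```python
import numpy as np

# Stand-in for the question's 1000 pooled points
points_tot = np.random.default_rng(0).normal(size=1000)

bins_float = len(points_tot) / 5    # 200.0: true division always gives a float
bins_int = len(points_tot) // 5     # 200: floor division keeps an int

hist, edges = np.histogram(points_tot, bins=bins_int, density=True)

try:
    np.histogram(points_tot, bins=bins_float)
    float_bins_rejected = False
except TypeError:
    # NumPy requires an integer (or a string / array of edges) for bins
    float_bins_rejected = True
```

This only makes the histogram call valid; it does not change which peak the least-squares fit ends up on.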

Answer 1

You have discovered that least-squares optimization converges to the larger of the two peaks.

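A least-squares optimum does not find the "average" of the two component distributions; it only minimizes the squared error. That minimum is usually reached by fitting the largest peak.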
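If the quantity you actually want is the average of all the drawn points, it can be computed directly instead of fitted. A minimal sketch, assuming the question's setup (100 points from N(0, 1) and 900 from N(10, 1)) and using NumPy's random generator in place of the asker's `gauss.draw_1dGauss`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Same proportions as the question: 100 points at mean 0, 900 at mean 10
points = rng.normal(loc=0.0, scale=1.0, size=100)
pointsp = rng.normal(loc=10.0, scale=1.0, size=900)
points_tot = np.concatenate([points, pointsp])

# The pooled mean is the weighted mean of the component means:
# (100 * 0 + 900 * 10) / 1000 = 9
expected_mean = (100 * 0.0 + 900 * 10.0) / 1000
sample_mean = points_tot.mean()  # close to 9, with no curve fitting involved
```

A single-Gaussian least-squares fit answers a different question (which bell curve best matches the histogram shape), so its mean need not equal this number.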

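When the distributions are unequal (90% of the samples come from the larger of the two peaks), the error terms on the main peak wipe out the local minimum at the smaller peak and the minimum between the peaks.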

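Only when the peaks are nearly equal in size can you make the fit converge to the centre point; otherwise you should expect least squares to find the "strongest" peak, provided it doesn't get stuck in a local minimum.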
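The local-minimum caveat depends on where the optimizer starts, i.e. on `p0`. The sketch below assumes `gauss.gaussFun` is a Gaussian with a free peak height `A` (that module is not shown, so `gauss_fun` here is a hypothetical stand-in), recreates the question's data with NumPy's generator, and fits the same histogram twice: once starting at the question's mean of 0 and once at 10.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss_fun(x, A, mean, sigma):
    # Hypothetical stand-in for gauss.gaussFun: Gaussian with peak height A
    return A * np.exp(-(x - mean) ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
# 100 points at mean 0 and 900 at mean 10, as in the question
points_tot = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(10.0, 1.0, 900)])
hist, edges = np.histogram(points_tot, bins=40, density=True)
centres = (edges[:-1] + edges[1:]) / 2.0

A0 = 1.0 / np.sqrt(2.0 * np.pi)
# Start on the small peak (the question's p0) ...
coeff_from_0, _ = curve_fit(gauss_fun, centres, hist, p0=[A0, 0.0, 1.0])
# ... and on the large peak
coeff_from_10, _ = curve_fit(gauss_fun, centres, hist, p0=[A0, 10.0, 1.0])
```

Starting from 0, the other peak is about ten standard deviations away, where the model and its derivatives are effectively zero, so nothing pulls the optimizer toward 10 and it settles on the small peak by lowering `A`. Starting near 10, it settles on the large peak instead.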

With the following pieces added, I could run your code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

bin_centres = bin_centres_tot

def draw_1dGauss(mean, var, N):
    # N samples from a normal distribution with the given mean and variance
    return norm.rvs(loc=mean, scale=np.sqrt(var), size=N)

def gaussFun(bin_centres, *coeff):
    # Unit-area Gaussian, matching the density=True histogram.
    # Note: the amplitude A is unpacked but not used.
    A, mean, sigma = coeff[0], coeff[1], coeff[2]
    return np.exp(-(bin_centres - mean)**2 / 2. / sigma**2) / sigma / np.sqrt(2*np.pi)

plt.hist(points_tot, density=True, bins=40)

Curve_Fit not returning expected values
