Question

Test code for this kind of data:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.linspace(0,1,20)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0])

n = np.size(x)
mean = sum(x*y)/n
sigma = np.sqrt(sum(y*(x-mean)**2)/n)

def gaus(x,a,x0,sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y,p0=[max(y),mean,sigma])

plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()

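I need to fit a Gaussian distribution to a large amount of data that looks like the y array given above.

The standard Gaussian fitting routine from scipy.optimize produces this kind of fit:

I have tried many different initial values, but I cannot get a sensible result.

Does anyone know how to fit this kind of data to a Gaussian?

Thanks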

Answer 1

Don't use the generic "a" parameter; use the proper normal distribution equation instead:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.linspace(0,1,20)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0])

n = np.size(x)
mean = sum(x*y)/n
sigma = np.sqrt(sum(y*(x-mean)**2)/n)

def gaus(x, x0, sigma):
    return 1/np.sqrt(2 * np.pi * sigma**2)*np.exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y,p0=[mean,sigma])

plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
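
One consequence of dropping a is worth spelling out: in the normalized form, the peak height is no longer free but is tied to sigma through 1/(sqrt(2*pi)*sigma). The following is only a minimal sketch of that relationship; the names target_height and sigma_needed are illustrative and not part of the answer above.

```python
import numpy as np

def gaus(x, x0, sigma):
    # Normalized normal pdf, same form as in the answer above
    return 1 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-(x - x0)**2 / (2 * sigma**2))

target_height = 10.0  # height of the single spike in y (illustrative name)

# The pdf peaks at x0 with value 1/(sqrt(2*pi)*sigma); solve that for sigma
sigma_needed = 1 / (target_height * np.sqrt(2 * np.pi))

# Evaluate the pdf at its own center to confirm the peak height
peak = gaus(0.5, 0.5, sigma_needed)
print(sigma_needed, peak)
```

So a curve of this form can only reach a height of 10 with a sigma of roughly 0.04, which is comparable to the grid spacing of about 0.053 in the question's x array.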

Answer 2

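The problem

Your fundamental problem is that you have a severely underdetermined fitting problem. Think of it this way: you have three unknowns but only one data point. This is analogous to solving for x, y, z when you have only one equation. Because the height of the Gaussian can vary along with its width, there are infinitely many distributions, all with different widths, that satisfy your fit constraints.

More directly, your a and sigma parameters can both change the maximum height of the distribution, which is essentially what determines the quality of the fit (at least once the distribution is centered and fairly narrow). As a result, the fitting routines in Scipy cannot figure out which of the two to change at any given step.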
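
To make the "infinitely many distributions" point concrete, here is a small sketch (not part of the original answer) that evaluates the question's gaus function for a few hand-picked widths. The sigma values are arbitrary illustrative choices, all well below the grid spacing.

```python
import numpy as np

def gaus(x, a, x0, sigma):
    # Same unnormalized Gaussian as in the question
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

x = np.linspace(0, 1, 20)
y = np.zeros(20)
y[10] = 10

x0 = x[np.argmax(y)]           # center the curve on the spike
sigmas = [0.001, 0.005, 0.01]  # illustrative widths, all narrower than the grid spacing

# Sum of squared residuals for each candidate width, with a fixed at 10
residuals = [((gaus(x, 10, x0, s) - y)**2).sum() for s in sigmas]

for s, r in zip(sigmas, residuals):
    print(f"sigma={s}: residual={r:.3e}")
```

Each of these widths reproduces the data to within a negligible residual, so the data alone gives the optimizer no reason to prefer one over another.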

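The fix

The simplest way to fix your problem is to lock down one of your parameters. You don't need to change the equation, but you do need to make at least one of a, x0, or sigma a constant. The best parameter to fix is probably x0, since it is trivial to determine the mean/median/mode of your data: just take the x coordinate of the one data point that is non-zero in y. You will also need to be a bit smarter about how you set your initial guesses. Here is what that looks like: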

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.linspace(0,1,20)
xdiff = x[1] - x[0]
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# the mean/median/mode all occur at the x coordinate of the one datapoint that is non-zero in y
mean = x[np.argmax(y)]
# sigma should be tiny, since we want a narrow distribution
sigma = xdiff
# the scaling factor should be roughly equal to the "height" of the one datapoint
a = y.max()

def gaus(x,a,sigma):
    return a*np.exp(-(x-mean)**2/(2*sigma**2))

bounds = ((1, .015), (20, 1))
popt,pcov = curve_fit(gaus, x, y, p0=[a, sigma], maxfev=20000, bounds=bounds)
residual = ((gaus(x,*popt) - y)**2).sum()

plt.figure(figsize=(8,6))

plt.plot(x,y,'b+:',label='data')

xdist = np.linspace(x.min(), x.max(), 1000)
plt.plot(xdist,gaus(xdist,*popt),'C0', label='fit distribution')

plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.text(.1,6,"residual: %.6e" % residual)

plt.legend()
plt.show()

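Output: (plot of the data, the fitted points, and the fitted distribution, annotated with the residual)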

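A better fix

You don't need to fit anything at all to get the Gaussian you want. Instead, you can compute the required parameters with simple closed-form expressions, as in the fitonegauss function in the following code: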

import numpy as np
import matplotlib.pyplot as plt

def gauss(x, a, mean, sigma):
    return a*np.exp(-(x - mean)**2/(2*sigma**2))

def fitonegauss(x, y, fwhm=None):
    if fwhm is None:
        # determine full width at half maximum from the spacing between the x points
        fwhm = (x[1] - x[0])

    # the mean/median/mode all occur at the x coordinate of the one datapoint that is non-zero in y
    mean = x[np.argmax(y)]

    # solve for sigma in terms of the desired full width at half maximum
    sigma = fwhm/(2*np.sqrt(2*np.log(2)))

    # gauss() is not normalized, so its peak value is simply a;
    # match it to the height of the one non-zero datapoint
    a = y.max()

    return a, mean, sigma

N = 20
x = np.linspace(0,1,N)
y = np.zeros(N)
y[N//2] = 10

popt = fitonegauss(x, y)

plt.figure(figsize=(8,6))
plt.plot(x,y,'b+:',label='data')

xdist = np.linspace(x.min(), x.max(), 1000)
plt.plot(xdist,gauss(xdist,*popt),'C0', label='fit distribution')

residual = ((gauss(x,*popt) - y)**2).sum()
plt.plot(x, gauss(x,*popt),'ro:',label='fit')
plt.text(.1,6,"residual: %.6e" % residual)

plt.legend()
plt.show()

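Output: (plot of the data, the fitted points, and the fitted distribution, annotated with the residual)

This approach has several advantages. It is far more computationally efficient than any fit, it will not fail in most cases, and it gives you much more control over the actual width of the resulting distribution.

The fitonegauss function is set up so that you can directly set the full width at half maximum of the fitted distribution. If you leave it unset, the code automatically guesses it from the spacing of the x data. This seems to produce reasonable results for your application.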
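
As a quick sanity check of the FWHM-to-sigma conversion used in fitonegauss, the sketch below confirms that the curve drops to half its peak at a distance of fwhm/2 from the mean. The values of a, mean, and fwhm are arbitrary illustrative choices, not taken from the data.

```python
import numpy as np

def gauss(x, a, mean, sigma):
    # Same unnormalized Gaussian as in the code above
    return a * np.exp(-(x - mean)**2 / (2 * sigma**2))

a, mean = 10.0, 0.5  # illustrative peak height and center
fwhm = 0.05          # illustrative full width at half maximum

# Same conversion as in fitonegauss: sigma = fwhm / (2*sqrt(2*ln 2))
sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))

# Half a width to either side of the mean, the curve should equal a/2
left = gauss(mean - fwhm / 2, a, mean, sigma)
right = gauss(mean + fwhm / 2, a, mean, sigma)
print(left, right)
```

Both values come out at half the peak height, which is what makes fwhm a convenient knob for controlling the width directly.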

Fitting a Gaussian to data that is zero everywhere except for a spike at the center
