Question

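I am trying to fit a histogram of some data using scipy.optimize.curve_fit. If I want to include errors in y, I can do so by applying weights to the fit. But how do I include errors in x (i.e. the error introduced by binning, in the case of a histogram)?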

My question also applies to errors in x when doing a linear regression with curve_fit or polyfit; I know how to add errors in y, but not in x.

Here is an example (partly taken from the matplotlib documentation):

import numpy as np
import pylab as P
from scipy.optimize import curve_fit

# create the data histogram
mu, sigma = 200, 25
x = mu + sigma*P.randn(10000)

# define fit function
def gauss(x, *p):
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2*sigma**2))

# the histogram of the data
n, bins, patches = P.hist(x, 50, histtype='step')
sigma_n = np.sqrt(n)  # Adding Poisson errors in y
bin_centres = (bins[:-1] + bins[1:])/2
sigma_x = (bins[1] - bins[0])/np.sqrt(12)  # Binning error in x
P.setp(patches, 'facecolor', 'g', 'alpha', 0.75)

# fitting and plotting
p0 = [700, 200, 25]
popt, pcov = curve_fit(gauss, bin_centres, n, p0=p0, sigma=sigma_n, absolute_sigma=True)
x = np.arange(100, 300, 0.5)
fit = gauss(x, *popt)
P.plot(x, fit, 'r--')

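Now, this fit (when it does not fail) does take the y errors sigma_n into account, but I have not found a way to make it consider sigma_x. I scanned a couple of threads on the scipy mailing list and found out how to use the absolute_sigma value, and I found a post on Stackoverflow about asymmetrical errors, but nothing about errors in both directions. Is it possible to achieve this?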

Answer 1

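scipy.optimize.curve_fit uses standard non-linear least squares optimization and therefore only minimizes the deviation in the response variable. If you want errors in the independent variable to be taken into account, you can try scipy.odr, which uses orthogonal distance regression. As its name suggests, it minimizes the deviations in both the independent and the dependent variable.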
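To make the difference concrete, the two objectives can be written side by side. This is a sketch under stated assumptions: uncorrelated errors, weights taken as inverse variances, and the symbols $\sigma_{x,i}$, $\sigma_{y,i}$ and $\delta_i$ introduced here for illustration only (they do not appear in the original post).

```latex
% Ordinary weighted least squares (what curve_fit minimizes):
\min_{\beta} \sum_i \frac{\bigl(y_i - f(x_i;\beta)\bigr)^2}{\sigma_{y,i}^2}

% Orthogonal distance regression: each x_i may shift by \delta_i,
% and that shift is penalized according to the x uncertainty:
\min_{\beta,\,\delta} \sum_i \left[
  \frac{\bigl(y_i - f(x_i + \delta_i;\beta)\bigr)^2}{\sigma_{y,i}^2}
  + \frac{\delta_i^2}{\sigma_{x,i}^2}
\right]
```

As $\sigma_{x,i} \to 0$ the shifts $\delta_i$ are forced to zero and the second objective reduces to the first.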

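Have a look at the sample below. The fit_type parameter determines whether scipy.odr performs a full ODR (fit_type=0) or an ordinary least squares optimization (fit_type=2).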

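EDIT

Although the example worked, it did not make much sense, because the y data was calculated on the noisy x data, which simply resulted in an unequally spaced independent variable. I updated the sample, which now also shows how to use RealData, which allows specifying the standard errors of the data instead of weights.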
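To illustrate the distinction between standard errors and weights, here is a minimal sketch, assuming the usual inverse-variance convention (weight = 1/sigma**2). The straight-line model `line`, the synthetic data and the seed are hypothetical and chosen only for illustration; under that assumption both fits should yield the same parameters.

```python
import numpy as np
from scipy.odr import ODR, Model, Data, RealData


def line(beta, x):
    # Hypothetical straight-line model; scipy.odr passes beta first, then x.
    return beta[0] + beta[1] * x


# Synthetic data with noise in both x and y (illustrative values only).
rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 40)
x = x_true + rng.normal(scale=0.3, size=x_true.size)
y = 1.5 + 0.8 * x_true + rng.normal(scale=0.1, size=x_true.size)

sx, sy = 0.3, 0.1  # standard errors in x and y

# RealData takes the standard errors directly.
fit_real = ODR(RealData(x, y, sx=sx, sy=sy), Model(line), beta0=[1.0, 1.0]).run()

# Data takes weights instead: wd weights x, we weights y.
fit_data = ODR(
    Data(x, y, wd=1.0 / sx**2, we=1.0 / sy**2), Model(line), beta0=[1.0, 1.0]
).run()

print("RealData fit:", fit_real.beta)
print("Data fit:    ", fit_data.beta)
```

RealData is usually the more convenient choice when the uncertainties are already known as standard deviations, as with the Poisson and binning errors in the question.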

from scipy.odr import ODR, Model, RealData
import numpy as np
from pylab import *

# model function: scipy.odr passes the parameters first, then x
def func(beta, x):
    y = beta[0] + beta[1]*x + beta[2]*x**3
    return y

# generate data
x = np.linspace(-3, 2, 100)
y = func([-2.3, 7.0, -4.0], x)

# add some noise
x += np.random.normal(scale=0.3, size=100)
y += np.random.normal(scale=0.1, size=100)

# standard errors: 0.3 in x, 0.1 in y
data = RealData(x, y, 0.3, 0.1)
model = Model(func)

odr = ODR(data, model, [1, 0, 0])

# ordinary least squares fit
odr.set_job(fit_type=2)
output = odr.run()

xn = np.linspace(-3, 2, 50)
yn = func(output.beta, xn)

plot(x, y, 'ro')
plot(xn, yn, 'k-', label='leastsq')

# full orthogonal distance regression
odr.set_job(fit_type=0)
output = odr.run()
yn = func(output.beta, xn)
plot(xn, yn, 'g-', label='odr')
legend(loc=0)
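The same approach might be applied to the histogram from the question. The following is a sketch, not a definitive implementation: it reuses the question's Gaussian model, Poisson errors sqrt(n) in y and the binning error (bin width)/sqrt(12) in x. Dropping empty bins and using a fixed random seed are assumptions added here, since a standard error of zero cannot be turned into a weight.

```python
import numpy as np
from scipy.odr import ODR, Model, RealData


def gauss(beta, x):
    # scipy.odr calls the model as f(beta, x), unlike curve_fit's f(x, *p).
    A, mu, sigma = beta
    return A * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))


# Create the data and histogram it (seed fixed only for reproducibility).
rng = np.random.default_rng(42)
samples = 200 + 25 * rng.standard_normal(10000)
n, bins = np.histogram(samples, bins=50)
bin_centres = (bins[:-1] + bins[1:]) / 2

# Assumption: skip empty bins, because sqrt(0) = 0 is not a usable error.
mask = n > 0
xc = bin_centres[mask]
counts = n[mask].astype(float)

sigma_n = np.sqrt(counts)                                        # Poisson error in y
sigma_x = np.full(xc.shape, (bins[1] - bins[0]) / np.sqrt(12))   # binning error in x

data = RealData(xc, counts, sx=sigma_x, sy=sigma_n)
odr = ODR(data, Model(gauss), beta0=[700.0, 200.0, 25.0])
odr.set_job(fit_type=0)  # full ODR, so sigma_x is taken into account
output = odr.run()

A_fit, mu_fit, sigma_fit = output.beta
print("A, mu, sigma:", output.beta)
print("standard errors:", output.sd_beta)
```

Switching to `fit_type=2` in this sketch ignores sigma_x and gives an ordinary weighted least squares fit, which can be used to see how much the x errors change the result.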

Correct fitting with scipy curve_fit including errors in x?
