我正在尝试使用自定义概率密度函数拟合一些实验值的分布。显然,结果函数的积分应始终等于1,但简单的scipy.optimize.curve_fit(function,dataBincenters,dataCounts)的结果永远不会满足这个条件。 解决这个问题的最佳方法是什么?
答案 0 :(得分:16)
您可以定义自己的残差函数,包括惩罚参数,如下面的代码中详细说明的那样,事先知道沿着区间的积分必须是2.
。如果你在没有受到惩罚的情况下进行测试,你会看到你得到的是传统的curve_fit
:
import matplotlib.pyplot as plt
import scipy
from scipy.optimize import curve_fit, minimize, leastsq
from scipy.integrate import quad
from scipy import pi, sin
x = scipy.linspace(0, pi, 100)
y = scipy.sin(x) + (0. + scipy.rand(len(x))*0.4)
def func1(x, a0, a1, a2, a3):
return a0 + a1*x + a2*x**2 + a3*x**3
# here you include the penalization factor
def residuals(p,x,y):
integral = quad( func1, 0, pi, args=(p[0],p[1],p[2],p[3]))[0]
penalization = abs(2.-integral)*10000
return y - func1(x, p[0],p[1],p[2],p[3]) - penalization
popt1, pcov1 = curve_fit( func1, x, y )
popt2, pcov2 = leastsq(func=residuals, x0=(1.,1.,1.,1.), args=(x,y))
y_fit1 = func1(x, *popt1)
y_fit2 = func1(x, *popt2)
plt.scatter(x,y, marker='.')
plt.plot(x,y_fit1, color='g', label='curve_fit')
plt.plot(x,y_fit2, color='y', label='constrained')
plt.legend(); plt.xlim(-0.1,3.5); plt.ylim(0,1.4)
print 'Exact integral:',quad(sin ,0,pi)[0]
print 'Approx integral1:',quad(func1,0,pi,args=(popt1[0],popt1[1],
popt1[2],popt1[3]))[0]
print 'Approx integral2:',quad(func1,0,pi,args=(popt2[0],popt2[1],
popt2[2],popt2[3]))[0]
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.60068579748
#Approx integral2: 2.00001911981
其他相关问题:
答案 1 :(得分:7)
这是一个几乎相同的代码段,仅使用curve_fit
。
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
import scipy.integrate as integr
x = np.linspace(0, np.pi, 100)
y = np.sin(x) + (0. + np.random.rand(len(x))*0.4)
def Func(x, a0, a1, a2, a3):
return a0 + a1*x + a2*x**2 + a3*x**3
# modified function definition with Penalization
def FuncPen(x, a0, a1, a2, a3):
integral = integr.quad( Func, 0, np.pi, args=(a0,a1,a2,a3))[0]
penalization = abs(2.-integral)*10000
return a0 + a1*x + a2*x**2 + a3*x**3 + penalization
popt1, pcov1 = opt.curve_fit( Func, x, y )
popt2, pcov2 = opt.curve_fit( FuncPen, x, y )
y_fit1 = Func(x, *popt1)
y_fit2 = Func(x, *popt2)
plt.scatter(x,y, marker='.')
plt.plot(x,y_fit2, color='y', label='constrained')
plt.plot(x,y_fit1, color='g', label='curve_fit')
plt.legend(); plt.xlim(-0.1,3.5); plt.ylim(0,1.4)
print 'Exact integral:',integr.quad(np.sin ,0,np.pi)[0]
print 'Approx integral1:',integr.quad(Func,0,np.pi,args=(popt1[0],popt1[1],
popt1[2],popt1[3]))[0]
print 'Approx integral2:',integr.quad(Func,0,np.pi,args=(popt2[0],popt2[1],
popt2[2],popt2[3]))[0]
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.66485028754
#Approx integral2: 2.00002116217
答案 2 :(得分:1)
如果您能够提前规范化概率拟合函数,那么您可以使用此信息来约束您的拟合。一个非常简单的例子就是将Gaussian拟合到数据中。如果一个人适合下面的三参数(A,mu,sigma)高斯,那么它通常会被非标准化:
但是,如果一个人在A:
上强制执行规范化条件
然后高斯只是两个参数并自动标准化。
答案 3 :(得分:1)
下面的示例是添加任何约束的更通用的方法:
from scipy.optimize import minimize
from scipy.integrate import quad
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, np.pi, 100)
y = np.sin(x) + (0. + np.random.rand(len(x))*0.4)
def func_to_fit(x, params):
return params[0] + params[1] * x + params[2] * x ** 2 + params[3] * x ** 3
def constr_fun(params):
intgrl, _ = quad(func_to_fit, 0, np.pi, args=(params,))
return intgrl - 2
def func_to_minimise(params, x, y):
y_pred = func_to_fit(x, params)
return np.sum((y_pred - y) ** 2)
# Do the parameter fitting
#without constraints
res1 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y))
params1 = res1.x
# with constraints
cons = {'type': 'eq', 'fun': constr_fun}
res2 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y), constraints=cons)
params2 = res2.x
y_fit1 = func_to_fit(x, params1)
y_fit2 = func_to_fit(x, params2)
plt.scatter(x,y, marker='.')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.legend(); plt.xlim(-0.1,3.5); plt.ylim(0,1.4)
plt.show()
print(f"Constrant violation: {constr_fun(params1)}")
违反约束:-2.9179325622408214e-10
答案 4 :(得分:0)
您可以确保通过数值积分对拟合的概率分布进行归一化。例如,假设您有数据x
和y
,并且已为参数分布unnormalised_function(x, a, b)
和a
定义了b
,则其概率分布为在x1
到x2
的间隔中定义(可以是无限的):
from scipy.optimize import curve_fit
from scipy.integrate import quad
# Define a numerically normalised function
def normalised_function(x, a, b):
normalisation, _ = quad(lambda x: unnormalised_function(x, a, b), x1, x2)
return unnormalised_function(x, a, b)/normalisation
# Do the parameter fitting
fitted_parameters, _ = curve_fit(normalised_function, x, y)