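(Sorry if "cost function" is the wrong term.)
I'm trying to fit some mass spectrometry data. I know what most of the peaks are, but not all of them, so when I'm done I expect a positive residual. It looks like most of the scipy fitting algorithms just try to minimize the RMS residual, so I don't see a way to penalize negative residuals more heavily than positive ones.
For clarity, here is some simple code: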
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt


def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2.0 / (2.0 * sigma**2))


def get_y_with_noise(x, peaks):
    """Return the input spectrum with some lazy shot-like noise."""
    n = len(x)
    y = np.zeros(n)
    for peak in peaks:
        y += gaussian(x, *peak)
    # Add some noise, scaled by sqrt(y) like shot noise.
    y += np.random.randn(n) * 0.05 * np.sqrt(y)
    # Counts can't be negative.
    y[y < 0.0] = 0.0
    return y


def fit_peaks_and_plot(x, peaks):
    """Generate noisy data from some peaks, fit a single Gaussian, and plot it."""
    y = get_y_with_noise(x, peaks)
    # Make a really good guess of the starting params (the first peak).
    p0 = peaks[0]
    # Fit the data.
    popt, pcov = optimize.curve_fit(gaussian, x, y, p0=p0)
    # Plot the data, the fit, and the residuals. Looks like it works great.
    plt.figure()
    plt.plot(x, y)
    plt.plot(x, gaussian(x, *popt))
    plt.plot(x, y - gaussian(x, *popt))
    plt.show()


# Define our data range.
x = np.arange(-10.0, 10.0, 0.01)
# Some peaks that separate nicely.
peaks_1 = [[1.0, 0.0, 1.0], [0.5, 5.0, 1.0]]
# Some peaks that are too close for comfort.
peaks_2 = [[1.0, 0.0, 1.0], [0.5, 2.0, 1.0]]
# Fit both sets of peaks.
fit_peaks_and_plot(x, peaks_1)
fit_peaks_and_plot(x, peaks_2)
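The first set of peaks separates nicely. The second set of peaks overlaps, so we end up fitting a non-Gaussian shape with a Gaussian, which leaves a significant negative residual.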
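To put a number on "significant negative residual" instead of eyeballing the plots, here is a minimal sketch that drops the noise and integrates only the part of the residual where the single-Gaussian fit overshoots the data. The helper name `negative_residual_area` is hypothetical, and using noiseless data is an assumption made so the comparison between the two peak sets isn't muddied by noise.

```python
import numpy as np
from scipy import optimize


def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2.0 / (2.0 * sigma**2))


def negative_residual_area(x, peaks):
    """Hypothetical helper: fit one Gaussian to a noiseless sum of peaks and
    return the area of the region where the fit lies above the data."""
    y = sum(gaussian(x, *peak) for peak in peaks)
    popt, _ = optimize.curve_fit(gaussian, x, y, p0=peaks[0])
    residual = y - gaussian(x, *popt)
    dx = x[1] - x[0]
    # Keep only the negative part of the residual (data below the fit).
    return -np.sum(np.minimum(residual, 0.0)) * dx


x = np.arange(-10.0, 10.0, 0.01)
peaks_1 = [[1.0, 0.0, 1.0], [0.5, 5.0, 1.0]]
peaks_2 = [[1.0, 0.0, 1.0], [0.5, 2.0, 1.0]]
area_1 = negative_residual_area(x, peaks_1)
area_2 = negative_residual_area(x, peaks_2)
print(area_1, area_2)
```

The overlapping set should report a noticeably larger overshoot area than the well-separated one.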
I'd like to modify the cost function to penalize negative residuals more heavily than positive ones.
I believe that in my example, curve_fit is trying to minimize:
np.sum( ((f(xdata, *popt) - ydata) / sigma)**2 )
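One way to check that this is the objective: write the weighted residual vector explicitly and hand it to `scipy.optimize.least_squares`, which minimizes `0.5 * sum(residuals**2)`. It should land on the same parameters as `curve_fit`. This is a sketch; the test data (one Gaussian plus a small sine ripple so the minimum isn't exactly zero) and the name `weighted_residuals` are illustrative assumptions.

```python
import numpy as np
from scipy import optimize


def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2.0 / (2.0 * sigma**2))


# Illustrative data: a Gaussian plus a small deterministic ripple (assumption).
x = np.arange(-10.0, 10.0, 0.01)
y = gaussian(x, 1.0, 0.0, 1.0) + 0.01 * np.sin(3.0 * x)
sigma_y = np.ones_like(x)  # per-point uncertainties, all 1 here
p0 = [0.9, 0.1, 1.1]

popt_cf, _ = optimize.curve_fit(gaussian, x, y, p0=p0, sigma=sigma_y)


def weighted_residuals(params):
    # The vector whose squared sum curve_fit minimizes.
    return (gaussian(x, *params) - y) / sigma_y


res = optimize.least_squares(weighted_residuals, p0)
popt_ls = res.x

chi2 = np.sum(weighted_residuals(popt_cf)**2)
print(popt_cf, popt_ls, chi2)
```

`res.cost` is half of `chi2`, because `least_squares` includes the factor of 0.5 in its cost.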
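As a toy model, you could try to minimize (with the residual taken as data minus fit, so that a negative residual means the fit overshoots the data):
weighted_res = (ydata - f(xdata, *popt)) / sigma
weighted_res[weighted_res < 0.0] *= 10.0
np.sum(weighted_res**2)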
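The toy model above can be wrapped as a self-contained scalar function. This is a minimal sketch: `asymmetric_cost` and its `penalty` argument are hypothetical names, and `f` is assumed to follow curve_fit's `f(x, *params)` convention. Note that scaling a residual by 10 scales its squared contribution by 100.

```python
import numpy as np


def asymmetric_cost(params, f, xdata, ydata, sigma, penalty=10.0):
    """Sum of squared weighted residuals, where negative residuals
    (data below the model) are multiplied by `penalty` before squaring."""
    weighted_res = (ydata - f(xdata, *params)) / sigma
    weighted_res = np.where(weighted_res < 0.0, penalty * weighted_res, weighted_res)
    return np.sum(weighted_res**2)


def constant(x, c):
    # Trivial model used only to check the arithmetic.
    return np.full_like(x, c, dtype=float)


xdata = np.array([0.0, 1.0])
ydata = np.array([1.0, 0.0])
sigma = np.ones(2)
# Residuals are [0.5, -0.5]; the negative one becomes -5.0, so 0.25 + 25.0.
print(asymmetric_cost([0.5], constant, xdata, ydata, sigma))  # → 25.25
```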
Obviously I could define a function that returns weighted_res and try to fit it to zero, but that seems like a very roundabout way of doing it.
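For what it's worth, returning a residual vector is exactly the interface `scipy.optimize.least_squares` expects: it squares and sums whatever you return. So a sketch of the asymmetric fit might look like the following. The name `asymmetric_residuals`, the penalty of 10, and the noiseless version of `peaks_2` are assumptions for illustration, not a recommended setting.

```python
import numpy as np
from scipy import optimize


def gaussian(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2.0 / (2.0 * sigma**2))


def asymmetric_residuals(params, f, xdata, ydata, sigma, penalty=10.0):
    """Residual vector for least_squares: negative residuals
    (data below the fit) are scaled up by `penalty`."""
    res = (ydata - f(xdata, *params)) / sigma
    return np.where(res < 0.0, penalty * res, res)


# Noiseless version of the overlapping peaks from the question.
x = np.arange(-10.0, 10.0, 0.01)
y = gaussian(x, 1.0, 0.0, 1.0) + gaussian(x, 0.5, 2.0, 1.0)
sigma = np.ones_like(x)
p0 = [1.0, 0.0, 1.0]

# Ordinary least squares, for comparison.
popt_plain, _ = optimize.curve_fit(gaussian, x, y, p0=p0)

# least_squares minimizes 0.5 * sum(residuals**2), so the reweighted
# residual vector gives the asymmetric cost directly.
result = optimize.least_squares(asymmetric_residuals, p0, args=(gaussian, x, y, sigma))
popt_asym = result.x

min_res_plain = np.min(y - gaussian(x, *popt_plain))
min_res_asym = np.min(y - gaussian(x, *popt_asym))
print(min_res_plain, min_res_asym)
```

With the penalty, the fitted Gaussian should sit closer to underneath the data, so its most negative residual is smaller in magnitude than the plain fit's. Passing the squared sum of the same residuals to `scipy.optimize.minimize` would also work, but `least_squares` can exploit the sum-of-squares structure.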