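I am trying to create a function that takes a set of observed and expected data points, determines the best function to use for calibration, and applies that calibration to the whole dataset (the data points are a subset of it). However, I want to make sure that the polynomial fitted by scipy.optimize.curve_fit or numpy.polyfit is monotonic (its first derivative does not change sign).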
To do this, I currently have the following test code:
#! /usr/bin/env python
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
import math
import sys
import random
random.seed(28)
#############
# Test Data #
#############
measuredMaximaMZ = [100.0, 201.0, 222.0, 401.0, 500.0]
presentCalibrants = [100.5, 200.5, 300.5, 400.5, 500.5]
Fake_Raw_X = np.linspace(100,500,1000)
Fake_Raw_Y = random.sample(range(100000), 1000)
#############
# Functions #
#############
def powerLaw(x,a,b,c):
    # Crude way to keep b within [0, 2]: add a large penalty outside that range
    penalty = 0
    if b > 2.:
        penalty = abs(b-1.)*10000
    if b < 0.:
        penalty = abs(2.-b)*10000
    return a*x**b + c + penalty
class powerLawCall:
    # Callable wrapper so a fitted power law can be used like a poly1d (x is unused)
    def __init__(self,x,a,b,c):
        self.a = a
        self.b = b
        self.c = c
        self.f = lambda x: a*x**b+c
    def __call__(self,x):
        return self.f(x)
    def describe(self):
        return str(self.a)+"*X^"+str(self.b)+"+"+str(self.c)
def performCalibration(measured, expected):
    RMS = sys.maxsize
    func = None
    # Power Law
    z = curve_fit(powerLaw, measured, expected)
    RMS_buffer = []
    for index, i in enumerate(measured):
        RMS_buffer.append((powerLaw(i,*z[0])-expected[index])**2)
    RMS_buffer = np.mean(RMS_buffer)
    RMS_buffer = math.sqrt(RMS_buffer)
    if RMS_buffer < RMS:
        RMS = RMS_buffer
        func = powerLawCall(0,*z[0])
    # Polynomials of degree 1 to len(expected)-1; keep the lowest RMS
    for i in range(1,len(expected)):
        RMS_buffer = []
        z = np.polyfit(measured, expected, i)
        f = np.poly1d(z)
        for index, j in enumerate(measured):
            RMS_buffer.append((f(j) - expected[index])**2)
        RMS_buffer = np.mean(RMS_buffer)
        RMS_buffer = math.sqrt(RMS_buffer)
        if RMS_buffer < RMS:
            RMS = RMS_buffer
            func = f
    return func
#############
# Test Main #
#############
f = performCalibration(measuredMaximaMZ, presentCalibrants)
if isinstance(f,powerLawCall):
    label = f.describe()
else:
    label = ""
    # poly1d iterates over its coefficients (highest power first); len(f) is the degree
    for index,i in enumerate(f):
        if index < len(f):
            label += "{0:.2e}".format(i)+"x^"+str(len(f)-index)+" + "
        else:
            label += "{0:.2e}".format(i)
# Write results
with open("UPC Results.txt",'w') as fw:
    fw.write("Function: "+str(label)+"\n")
    fw.write("Expected\tOriginal\tCalibrated\n")
    for index, i in enumerate(measuredMaximaMZ):
        fw.write(str(presentCalibrants[index])+"\t"+str(i)+"\t"+str(f(i))+"\n")
# Plotting
X_data = []
Y_data = []
for index, i in enumerate(measuredMaximaMZ):
    X_data.append(presentCalibrants[index])
    Y_data.append(f(i))
newX = np.linspace(X_data[0],X_data[-1],1000)
newY = f(newX)
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(211)
plt.scatter(X_data,measuredMaximaMZ,c='b',label='Raw',marker='s',alpha=0.5)
plt.scatter(X_data,Y_data,c='r',label='Calibrated',marker='s',alpha=0.5)
plt.plot(newX,newY,label="Fit, Function: "+str(label))
plt.legend(loc='best')
plt.title("UPC Test")
plt.xlabel("Expected X")
plt.ylabel("Observed X")
ax = fig.add_subplot(212)
plt.plot(Fake_Raw_X,Fake_Raw_Y,c='b')
plt.plot(f(Fake_Raw_X),Fake_Raw_Y,c='r')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
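For the test data listed above (first plot in the image), this produces the following non-monotonic calibration curve (and data points):
The problem lies in the post-calibration X region around 300-450, as shown in the second plot of this image (red is the calibrated data; in that range, a single X value has more than one Y value):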
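One possible workaround, sketched under the assumption that only the polynomial candidates matter: before accepting a degree in the performCalibration loop, check whether its derivative changes sign between the smallest and largest measured value, and skip that degree if it does. The helper name is_monotonic_on is hypothetical; it only uses numpy.polyder and poly1d.roots. The slope can only change sign at a real root of the derivative, so the sketch evaluates the slope once between each pair of consecutive roots inside the interval (a touching double root, as in x^3, is therefore not rejected).

```python
import numpy as np


def is_monotonic_on(poly, lo, hi, tol=1e-9):
    """Hypothetical helper: True if poly' keeps one sign on [lo, hi]."""
    deriv = np.polyder(np.poly1d(poly))
    roots = deriv.roots
    # Only real roots strictly inside (lo, hi) can split the interval.
    real = roots[np.abs(roots.imag) < tol].real
    inside = np.sort(real[(real > lo) & (real < hi)])
    knots = np.concatenate(([lo], inside, [hi]))
    # Evaluate the slope once per sub-interval, at its midpoint.
    mids = 0.5 * (knots[:-1] + knots[1:])
    slopes = deriv(mids)
    return bool(np.all(slopes >= 0) or np.all(slopes <= 0))


measured = np.array([100.0, 201.0, 222.0, 401.0, 500.0])
expected = np.array([100.5, 200.5, 300.5, 400.5, 500.5])

# Same degree loop as the question, but non-monotonic fits are skipped.
best, best_rms = None, np.inf
for degree in range(1, len(expected)):
    poly = np.poly1d(np.polyfit(measured, expected, degree))
    if not is_monotonic_on(poly, measured.min(), measured.max()):
        continue
    rms = np.sqrt(np.mean((poly(measured) - expected) ** 2))
    if rms < best_rms:
        best, best_rms = poly, rms
```

This only chooses among the fits polyfit already produces; it does not force any particular degree to be monotonic.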
Update
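Since then, I have found out from this question how to use the bounds argument of curve_fit to put bounds on the function's parameters. The specific question, then, is whether there is a way to restrict curve_fit or polyfit to monotonic (polynomial) functions only, i.e. functions whose derivative has the same sign over the whole specified region.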
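A sketch of two ways to build the constraint into the fit itself, assuming the calibration should be non-decreasing (larger measured X maps to larger expected X). For the power law, curve_fit's bounds can replace the hand-written penalty: with a >= 0 and 0 <= b <= 2, the slope a*b*x**(b-1) cannot be negative for x > 0. polyfit has no constraint option, so for polynomials the sketch swaps in scipy.optimize.minimize with method="SLSQP", minimizing the same squared error subject to p'(x) >= 0 at n_check evenly spaced points. fit_monotone_poly and n_check are hypothetical names, and monotonicity is only enforced at those sample points, not proven between them.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize

measured = np.array([100.0, 201.0, 222.0, 401.0, 500.0])
expected = np.array([100.5, 200.5, 300.5, 400.5, 500.5])


def power_law(x, a, b, c):
    # Same model as powerLaw in the question, without the penalty term.
    return a * np.power(x, b) + c


# Bounds a >= 0, 0 <= b <= 2 keep the fitted power law non-decreasing for x > 0.
(a, b, c), _ = curve_fit(
    power_law, measured, expected, p0=[1.0, 1.0, 0.0],
    bounds=([0.0, 0.0, -np.inf], [np.inf, 2.0, np.inf]), max_nfev=10000,
)


def fit_monotone_poly(x, y, degree, lo, hi, n_check=200):
    """Hypothetical helper: least squares with p'(x) >= 0 at n_check points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Work in t = (x - center) / half, which maps [lo, hi] onto [-1, 1].
    center, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    t = (x - center) / half
    grid = np.linspace(-1.0, 1.0, n_check)
    V = np.vander(t, degree + 1)  # columns t**degree ... t**0, polyfit order
    # D @ coeffs is dp/dt at every grid point (same sign as dp/dx).
    D = np.zeros((n_check, degree + 1))
    for j, p in enumerate(range(degree, -1, -1)):
        if p > 0:
            D[:, j] = p * grid ** (p - 1)

    def sse(coeffs):
        r = V @ coeffs - y
        return r @ r

    def sse_grad(coeffs):
        return 2.0 * V.T @ (V @ coeffs - y)

    res = minimize(
        sse, np.polyfit(t, y, degree), jac=sse_grad, method="SLSQP",
        constraints=[{"type": "ineq",
                      "fun": lambda coeffs: D @ coeffs,
                      "jac": lambda coeffs: D}],
        options={"maxiter": 500},
    )
    # Substitute t = (x - center) / half to get an ordinary poly1d in x.
    return np.poly1d(res.x)(np.poly1d([1.0 / half, -center / half]))


calib = fit_monotone_poly(measured, expected, 3, measured.min(), measured.max())
```

The returned object is a plain numpy.poly1d, so it could stand in for np.poly1d(np.polyfit(...)) in the question's labeling and plotting code.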
Answer (score: 1)
Monotone interpolators are available as PchipInterpolator and Akima1DInterpolator. Dierckx's FITPACK has some constrained spline fitting routines, but they are not exposed to Python.
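A minimal sketch of the interpolation route applied to the question's calibrants, using only PchipInterpolator (SciPy documents it as preserving the monotonicity of the data it interpolates). Unlike polyfit, it passes exactly through every calibrant rather than least-squares fitting them. The sketch only evaluates inside [100, 500], the same range as Fake_Raw_X; values outside the calibrant range would depend on how the end pieces extrapolate, which this sketch does not check.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

measured = np.array([100.0, 201.0, 222.0, 401.0, 500.0])
expected = np.array([100.5, 200.5, 300.5, 400.5, 500.5])

# Both arrays increase, so the PCHIP calibration curve is non-decreasing.
calibrate = PchipInterpolator(measured, expected)

# Apply the calibration to the full raw X axis (inside the calibrant range).
raw_x = np.linspace(100, 500, 1000)
calibrated_x = calibrate(raw_x)
```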