Restricting curve_fit or polyfit to monotonic functions

Asked: 2018-07-09 11:35:54

Tags: python scipy curve-fitting polynomials

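I am trying to write a function that takes a set of observed and expected data points, determines the best function to use for calibration, and applies that calibration to the whole data set (of which the data points are a subset). However, I want to make sure that the function fitted by scipy.optimize.curve_fit or numpy.polyfit is monotonic (its first derivative does not change sign).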

To that end, I currently have the following test code:

#! /usr/bin/env python
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
import math
import sys
import random
random.seed(28)

#############
# Test Data #
#############
measuredMaximaMZ = [100.0, 201.0, 222.0, 401.0, 500.0]
presentCalibrants = [100.5, 200.5, 300.5, 400.5, 500.5]

Fake_Raw_X = np.linspace(100,500,1000)
Fake_Raw_Y = random.sample(range(100000), 1000)

#############
# Functions #
#############
def powerLaw(x,a,b,c):
    # Crude soft constraint: add a large penalty when the exponent b leaves [0, 2]
    penalty = 0
    if b > 2.:
        penalty = abs(b-1.)*10000
    if b < 0.:
        penalty = abs(2.-b)*10000
    return a*x**b + c + penalty

class powerLawCall:
    def __init__(self,x,a,b,c):
        self.a = a
        self.b = b
        self.c = c
        self.f = lambda x: a*x**b+c
    def __call__(self,x):
        return self.f(x)
    def describe(self):
        return str(self.a)+"*X^"+str(self.b)+"+"+str(self.c)

def performCalibration(measured, expected):
    RMS = float("inf")
    func = None

    # Power Law
    z = curve_fit(powerLaw, measured, expected)
    RMS_buffer = []
    for index, i in enumerate(measured):
        RMS_buffer.append((powerLaw(i,*z[0])-expected[index])**2)
    RMS_buffer = np.mean(RMS_buffer)
    RMS_buffer = math.sqrt(RMS_buffer)
    if RMS_buffer < RMS:
        RMS = RMS_buffer
        func = powerLawCall(0,*z[0])

    # Polynomials between 1 and len(expected)
    for i in range(1,len(expected)):
        RMS_buffer = []
        z = np.polyfit(measured, expected, i)
        f = np.poly1d(z)
        for index, j in enumerate(measured):
            RMS_buffer.append((f(j) - expected[index])**2)
        RMS_buffer = np.mean(RMS_buffer)
        RMS_buffer = math.sqrt(RMS_buffer)
        if RMS_buffer < RMS:
            RMS = RMS_buffer
            func = f
    return func

############# 
# Test Main #
#############
f = performCalibration(measuredMaximaMZ, presentCalibrants)

if isinstance(f,powerLawCall):
    label = f.describe()
else:
    label = ""
    for index,i in enumerate(f):
        if index < len(f):
            label += "{0:.2e}".format(i)+"x^"+str(len(f)-index)+" + "
        else:
            label += "{0:.2e}".format(i)

# Write results
with open("UPC Results.txt",'w') as fw:
    fw.write("Function: "+str(label)+"\n")
    fw.write("Expected\tOriginal\tCalibrated\n")
    for index, i in enumerate(measuredMaximaMZ):
        fw.write(str(presentCalibrants[index])+"\t"+str(i)+"\t"+str(f(i))+"\n")

# Plotting
X_data = []
Y_data = []
for index, i in enumerate(measuredMaximaMZ):
    X_data.append(presentCalibrants[index])
    Y_data.append(f(i))
newX = np.linspace(X_data[0],X_data[-1],1000)
newY = f(newX)

fig =  plt.figure(figsize=(8,6))
ax = fig.add_subplot(211)
plt.scatter(X_data,measuredMaximaMZ,c='b',label='Raw',marker='s',alpha=0.5)
plt.scatter(X_data,Y_data,c='r',label='Calibrated',marker='s',alpha=0.5)
plt.plot(newX,newY,label="Fit, Function: "+str(label))
plt.legend(loc='best')
plt.title("UPC Test")
plt.xlabel("Expected X")
plt.ylabel("Observed X")

ax = fig.add_subplot(212)
plt.plot(Fake_Raw_X,Fake_Raw_Y,c='b')
plt.plot(f(Fake_Raw_X),Fake_Raw_Y,c='r')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

For the listed test data, this produces the following non-monotonic calibration curve and data points (the first plot in the image):

[image]

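The problem lies in the post-calibration X region around 300-450, as shown in the second plot of this image (red is the post-calibration trace; in that range a single X value has multiple Y values):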

[image]
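One way to confirm the problem numerically is to sample the first derivative of each fitted polynomial over the calibration range and see whether it changes sign. The sketch below is only an illustration: the helper name is_monotonic_on and the grid size are assumptions that do not appear in the script above, and a sampled check can miss a sign change that is narrower than the grid spacing.

```python
import numpy as np

# Same test data as in the question
measured = [100.0, 201.0, 222.0, 401.0, 500.0]
expected = [100.5, 200.5, 300.5, 400.5, 500.5]

def is_monotonic_on(poly, lo, hi, n=2001):
    """Hypothetical helper: True if poly'(x) keeps one sign on a grid over [lo, hi]."""
    slopes = poly.deriv()(np.linspace(lo, hi, n))
    return bool(np.all(slopes >= 0) or np.all(slopes <= 0))

# Check every polynomial degree that performCalibration would try
for degree in range(1, len(expected)):
    fit = np.poly1d(np.polyfit(measured, expected, degree))
    print(degree, is_monotonic_on(fit, min(measured), max(measured)))
```

For this data the degree-4 polynomial passes through all five points, so it has the lowest RMS and gets selected, yet it fails the check because it overshoots between 222 and 401.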

Update

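I have since found out, following this question, how to use the bounds argument of curve_fit to constrain the parameters of the function. The specific question therefore becomes: is there a way to restrict curve_fit or polyfit to monotonic (polynomial) functions only, that is, functions whose derivative keeps the same sign over the whole specified region?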
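For the power-law model, parameter bounds are already enough to guarantee monotonicity on positive x, because the derivative a*b*x**(b-1) has the sign of a*b. The following is a minimal sketch under that assumption; the name power_law, the starting guess p0 and the max_nfev value are illustrative choices, not taken from the script above.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b, c):
    # Plain power law, without the penalty term used in the question
    return a * x**b + c

measured = np.array([100.0, 201.0, 222.0, 401.0, 500.0])
expected = np.array([100.5, 200.5, 300.5, 400.5, 500.5])

# a >= 0 and 0 <= b <= 2 give a*b >= 0, so the fit is non-decreasing for x > 0.
# c is left unbounded.
lower = [0.0, 0.0, -np.inf]
upper = [np.inf, 2.0, np.inf]

popt, pcov = curve_fit(
    power_law, measured, expected,
    p0=[1.0, 1.0, 0.0],      # assumed starting guess, must lie inside the bounds
    bounds=(lower, upper),   # switches curve_fit to a bounded solver
    max_nfev=5000,           # generous evaluation budget (assumption)
)
a, b, c = popt

# Sanity check: the fitted curve never decreases over the calibration range
grid = np.linspace(100.0, 500.0, 1000)
print(popt, np.all(np.diff(power_law(grid, *popt)) >= 0))
```

The range chosen for b mirrors the range that the penalty in powerLaw tries to enforce. This trick does not carry over to np.polyfit, since no simple box on polynomial coefficients is equivalent to a sign constraint on the derivative.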

1 answer:

Answer 0 (score: 1)

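Monotone interpolators are available in scipy.interpolate as PchipInterpolator and Akima1DInterpolator (of the two, PchipInterpolator is the one documented to preserve the monotonicity of the data). Dierckx's FITPACK has some splines that respect constraints, but these are not exposed to Python.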
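As a rough illustration of this suggestion, the sketch below builds a PchipInterpolator from the question's test data and applies it to a dense grid. The variable names are assumptions. Note that this is interpolation rather than a least-squares fit: the curve passes exactly through every calibrant, and values outside the calibrated range are extrapolated with no monotonicity guarantee.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Test data from the question; x values must be strictly increasing
measured = np.array([100.0, 201.0, 222.0, 401.0, 500.0])
expected = np.array([100.5, 200.5, 300.5, 400.5, 500.5])

# Piecewise cubic Hermite interpolant that preserves the monotonicity of the data
calibrate = PchipInterpolator(measured, expected)

raw_x = np.linspace(100.0, 500.0, 1000)
calibrated_x = calibrate(raw_x)

# The calibrated axis never folds back on itself inside the calibrated range
print(np.all(np.diff(calibrated_x) >= 0))
```

The resulting object is callable like the poly1d returned by performCalibration, so it could be used as a drop-in replacement for f when plotting or writing results.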