带有两个变量的python中的非线性曲线拟合

时间:2018-12-03 00:35:05

标签: python data-science model-fitting

我试图定义一个适合以下形式的输入x和y数据的函数:

def nlvh(x,y, xi, yi, H,C):

    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)

x和y数据是长度相同的一维numpy数组。我想对数据进行切片,以便可以选择x和y的前5个点,通过在模型中优化C和H来拟合它们,然后向前移动一个点并重复。我有一些代码可以对同一数据进行线性拟合:

for i in np.arange(len(x)):
    xdata = x[i:i + window]
    ydata = y[i:i+window]
    a[i], b[i] = np.polyfit(xdata, ydata,1)
    xdata_avg[i] = np.mean(xdata)
    if i == (lenx - window):
        break

,但是对上面定义的方程式做同样的事情似乎有些棘手。 x和y分别显示为自变量和因变量,但是还有xo和yo参数,它们是每个窗口中x和y的第一个值。

我想要的最终结果是两个带有H [i]和C [i]的新数组,其中i指定每个后续窗口。有人对我如何入门有什么见识吗?

2 个答案:

答案 0 :(得分:0)

您可以从 scipy.optimize 使用curve_fit。它将使用非线性最小二乘法来拟合函数(H, C, xi, yi)的参数nlvhxy的给定输入数据。

尝试以下代码。在下面提到的代码中,H_arrC_arr是numpy数组,当函数H拟合到以下窗口时,它们分别包含Cnlvh的拟合参数xdataydata的5个连续点(xdataydata是我为xy选择的数组。您可以选择其他数组。)

from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit

def nlvh(x, H, C, xi, yi):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)

xdata = np.arange(1,21) #Choose an array for x

#Find an array yy for chosen values of parameters (H, C, xi, yi)
yy = nlvh(xdata, H=1.0, C=1.0, xi=1.0, yi=1.0)  

print yy
>>>[ 0. -0.08337108 -0.13214004 -0.16674217 -0.19358166 -0.21551112 -0.23405222 -0.25011325 -0.26428008 -0.27695274 -0.28841656 -0.2988822 -0.30850967 -0.3174233  -0.3257217  -0.33348433 -0.3407762 -0.34765116 -0.35415432 -0.36032382]

#Add noise to the initally chosen array yy.
y_noise = 0.2 * np.random.normal(size=xdata.size)
ydata = yy + y_noise    

print ydata

>>>[-0.1404996  -0.04353953  0.35002257  0.12939468 -0.34259184 -0.2906065 -0.37508709 -0.41583238 -0.511851   -0.39465581 -0.32631751 -0.34403938 -0.592997   -0.34312689 -0.4838437  -0.19311436 -0.20962735 -0.31134191-0.09487793 -0.55578775]

H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]

    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]

    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)

    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]] 

    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]]   

H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.

以下是您为H_arrC_arr选择的值的xdataydata的输出。

print H_arr

>>>[ -11.5317468   -18.44101926   20.30837781   31.47360697  -14.45018355 24.17226837   39.96761325   15.28776756 -113.15255865   15.71324201 51.56631241  159.38292301  -28.2429133   -60.97509922  -89.48216973]

print C_arr

>>>[0.70339652 0.34734507 0.2664654  0.2062776  0.30740565 0.19066498 0.1812445  0.30169133 0.11654544 0.21882872 0.11852967 0.09968506 0.2288574  0.128909   0.11658227]

答案 1 :(得分:0)

在对我以前的回答发表评论后(您建议您将xiyi设为每个“切片” xy中的初始值数组),我要添加另一个答案。此答案引入了对函数nlvh的更改,并完全实现了您想要的功能。就像我之前的答案一样,我们将使用 scipy.optimize 中的curve_fit

在下面提到的代码中,我正在使用python的globals()函数来定义xiyi。对于每个切片的xy数组,xiyi存储各自切片的数组的第一个值。这是经过修改的代码:

from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit

def nlvh(x, H, C):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)

xdata = np.arange(1,21) #Choose an array for x.

#Choose an array for y.
ydata = np.array([-0.1404996,  -0.04353953,  0.35002257,  0.12939468, -0.34259184, -0.2906065,
     -0.37508709, -0.41583238, -0.511851,   -0.39465581, -0.32631751, -0.34403938,
     -0.592997,   -0.34312689, -0.4838437,  -0.19311436, -0.20962735, -0.31134191, 
     -0.09487793, -0.55578775])


H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]
    globals()['xi'] = xnew[0]

    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]
    globals()['yi'] = ynew[0]  

    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)

    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]] 

    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]] 

H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.

您对H_arrC_arr的输出现在如下:

print H_arr 
>>>[1.0, 1.0, -23.041138662879327, -34.58915200575536, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

print C_arr
>>>[1.0, 1.0, -8.795855063863234, -9.271561975595562, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

以下是针对上面选择的数据(xdataydata)获得的图。

Plot of parameter H Plot of parameter C